Estimating 3D hand poses from single RGB images is essential for many real-world applications. However, the unpredictable presence of one or two hands, the similarity and mutual occlusion between hands, and the lack of depth cues make the task challenging. To address these challenges, we propose a framework for two-hand instance segmentation and pose estimation based on attention-induced separation. The framework first extracts hand-joint heatmaps from the image, which serve as spatial attention and are fused with the input image along the channel dimension to perform hand instance segmentation. Subsequently, the hand-joint heatmaps and hand masks are combined to provide denser spatial attention and are again fused with the input image along the channel dimension to separate the hands. Finally, the resulting five-channel image is used for single-hand pose estimation. We extend the canonical 21-joint hand model to a 128-joint one to provide more effective hand-joint heatmap attention. Moreover, we exploit prior knowledge embedded in the hand skeleton to help generate biomechanically feasible hand poses. Experimental results show that our framework outperforms state-of-the-art methods in generalization for both single- and two-hand pose estimation.
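The channel-wise fusion described above can be sketched as follows; this is a minimal illustration, assuming a CHW tensor layout and single-channel attention maps (array names and shapes are hypothetical, not the authors' implementation):

```python
import numpy as np

H, W = 256, 256
# Hypothetical inputs: an RGB image (3 channels), an aggregated
# hand-joint heatmap (1 channel), and a hand instance mask (1 channel).
image = np.random.rand(3, H, W).astype(np.float32)
heatmap = np.random.rand(1, H, W).astype(np.float32)
mask = np.random.rand(1, H, W).astype(np.float32)

# Fuse along the channel dimension: the result is the five-channel
# image mentioned in the text (3 RGB + heatmap + mask channels).
fused = np.concatenate([image, heatmap, mask], axis=0)
print(fused.shape)  # (5, 256, 256)
```

In practice the heatmap and mask act as spatial attention: they tell the downstream pose estimator which pixels belong to the hand of interest.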