Abstract
Virtual Reality (VR) and Augmented Reality (AR) applications are
becoming increasingly prevalent. However, reconstructing realistic 3D
hands from a single RGB image, especially when two hands are
interacting, remains a major challenge due to severe mutual occlusion
and the enormous diversity of hand poses. In this paper, we propose a Disturbing
Graph Contrastive Learning strategy for two-hand 3D reconstruction. The
strategy uses a graph disturbance network to generate graph feature
pairs that enhance the consistency of the two-hand pose features. A
contrastive learning module leverages high-quality generated features
to obtain a strong feature representation. We further propose a
similarity-based discrimination method that separates positive and
negative features to accelerate model convergence. Additionally, a multi-term loss is
designed to balance the relationship among hand pose, visual scale,
and viewpoint position. Our model achieves state-of-the-art
results on the InterHand2.6M benchmark. Ablation studies show the
model’s strong ability to correct implausible hand movements. In
subjective assessments, our Graph Disturbance Learning method
significantly improves the reconstruction of realistic 3D hands,
especially when two hands are interacting.