Imaging photoplethysmography (iPPG) is a contactless approach for extracting the blood volume pulsation (BVP). BVP extraction is enabled by analyzing the small intensity changes that result from fluctuations in light absorption in the upper skin layers. Local and global dynamic effects, e.g. shadows caused by inhomogeneous illumination or head movements, impede iPPG-based BVP extraction. Eliminating these effects requires accurate skin segmentation and weighting, a step that has received insufficient attention, particularly in state-of-the-art (SOTA) deep learning-based approaches. We therefore propose DeepPerfusion, a two-branched deep learning architecture that combines skin segmentation and BVP extraction in one model and is thus better able to account for the aforementioned effects than SOTA approaches. We evaluated the mean absolute error (MAE) of heart rate extraction and the signal-to-noise ratio (SNR) on 156 subjects from three publicly available datasets and compared DeepPerfusion with nine SOTA approaches that underwent the same training and evaluation pipeline. In the median over the subjects of each dataset, DeepPerfusion consistently achieved an MAE below 1 beat per minute and thus significantly outperformed all SOTA approaches by up to 49 %. Furthermore, DeepPerfusion achieved a high SNR of at least 5.81 dB, about two to three times higher than the best SOTA approaches. In contrast to SOTA approaches, DeepPerfusion's performance was consistent, robust and highly precise. This demonstrates DeepPerfusion's ability to perform high-precision BVP extraction, which will open up new diagnostic applications for iPPG in the future. Note: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.