Blood oxygen saturation (SpO2) is an essential indicator of respiratory function and has received increasing attention during the COVID-19 pandemic. Clinical findings show that COVID-19 patients can have significantly low SpO2 before any obvious symptoms appear. The prevalence of cameras has motivated researchers to investigate methods for monitoring SpO2 from videos. Most prior smartphone-based schemes are contact-based: they require a fingertip to cover the phone’s camera and the nearby light source so that the camera captures light re-emitted from the illuminated tissue. In this paper, we propose the first convolutional-neural-network-based noncontact SpO2 estimation scheme using smartphone cameras. The scheme analyzes videos of a participant’s hand for physiological sensing, which is convenient and comfortable, protects the participant’s privacy, and allows face masks to be kept on. We design our neural network architectures based on optophysiological models for SpO2 measurement and demonstrate their explainability by visualizing the weights for channel combination. Our proposed models outperform the state-of-the-art model designed for contact-based SpO2 measurement, showing the potential of the proposed method to contribute to public health. We also analyze the impact of skin type and the side of the hand on SpO2 estimation performance.