Xiang Zhang

Vital sign (breathing and heartbeat) monitoring is essential for patient care and sleep disorder prevention. Most current solutions are based on wearable sensors or cameras; however, the former can affect sleep quality, while the latter often raise privacy concerns. To address these shortcomings, we propose Wital, a contactless vital sign monitoring system based on low-cost, widely available commercial off-the-shelf (COTS) Wi-Fi devices. Two challenges must be overcome. First, the torso deformations caused by breathing and heartbeats are weak; how can such deformations be effectively captured? Second, movements such as turning over affect the accuracy of vital sign monitoring; how can their detrimental effects be avoided? For the former, we propose a non-line-of-sight (NLOS) sensing model that uses Ricean K theory to characterize the relationship between the energy ratio of line-of-sight (LOS) to NLOS signals and vital sign monitoring capability, and we use this model to guide the system design so that the deformations caused by breathing and heartbeats are better captured. For the latter, we propose a motion segmentation method based on motion regularity detection that accurately distinguishes respiration from other motions, and we remove periods containing movements such as turning over to eliminate their detrimental effects. We have implemented and validated Wital on low-cost COTS devices, and the experimental results demonstrate its effectiveness in monitoring vital signs.
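The following Python sketch is illustrative only and is not Wital's actual implementation. Under simplifying assumptions, it shows two of the ideas mentioned above: a moment-based estimate of the Ricean K-factor (the LOS-to-NLOS energy ratio) from the amplitude of one Wi-Fi CSI subcarrier, and a crude regularity check that keeps only windows whose spectrum is dominated by a peak in the human breathing band, discarding windows affected by larger motions such as turning over. The sampling rate, breathing band, and thresholds are assumptions, not values from the paper.

```python
"""Illustrative sketch only: NOT Wital's actual implementation."""
import numpy as np

def ricean_k_factor(amplitude: np.ndarray) -> float:
    """Moment-based K-factor estimate from envelope samples.

    K = sqrt(2*mu2^2 - mu4) / (mu2 - sqrt(2*mu2^2 - mu4)),
    with mu2 = E[r^2], mu4 = E[r^4].
    """
    mu2 = np.mean(amplitude ** 2)
    mu4 = np.mean(amplitude ** 4)
    d = 2.0 * mu2 ** 2 - mu4
    if d <= 0:                      # no dominant (LOS) component detected
        return 0.0
    s = np.sqrt(d)
    return float(s / (mu2 - s)) if mu2 > s else float("inf")

def breathing_rate_if_regular(amplitude: np.ndarray, fs: float,
                              band=(0.1, 0.5), min_peak_share=0.4):
    """Return breaths-per-minute if the window looks periodic, else None.

    A window is treated as 'regular' when the strongest in-band FFT peak
    carries at least `min_peak_share` of the DC-removed spectral energy;
    otherwise the window is assumed to contain other motions (e.g.,
    turning over) and is dropped. Both values are assumed thresholds.
    """
    x = amplitude - np.mean(amplitude)
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    if spec.sum() == 0 or not in_band.any():
        return None
    peak_idx = np.argmax(np.where(in_band, spec, 0.0))
    if spec[peak_idx] / spec[1:].sum() < min_peak_share:
        return None                 # irregular window: discard
    return 60.0 * freqs[peak_idx]

if __name__ == "__main__":
    fs = 20.0                                   # assumed CSI sampling rate (Hz)
    t = np.arange(0, 60, 1.0 / fs)
    # synthetic CSI amplitude: strong static path + small breathing ripple
    amp = 1.0 + 0.05 * np.sin(2 * np.pi * 0.25 * t) + 0.02 * np.random.randn(t.size)
    print("K-factor:", ricean_k_factor(amp))
    print("breaths/min:", breathing_rate_if_regular(amp, fs))
```

In this toy setup, a large K-factor corresponds to a strong static (LOS-like) component relative to the dynamic reflections, and the regularity gate rejects windows in which body movement rather than breathing dominates the spectrum.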
Facial expression recognition (FER) is a challenging task in computer vision because of the data uncertainty rooted in the ambiguity of facial expressions. As a complement to existing FER studies, which suppress such uncertainty at the data or feature level, we propose a simple yet efficient coarse-to-fine learning strategy at the task level, inspired by the way humans perceive emotions. Specifically, a child quickly learns whether its behavior is allowed by reading adults' facial expressions for a coarse attitude such as positive or negative, and then adjusts accordingly by further interpreting the fine sentiment, such as happy or angry. Following the same divide-and-conquer paradigm, we decompose FER into two correlated, easier sub-tasks: i) coarse classification for attitude analysis and ii) fine recognition for sentiment interpretation, and we build a multi-branch deep network (CFNet) to tackle them jointly. The key idea is to aggregate the discrete universal facial expressions into several coarse groups reflecting attitude tendency, based on their empirical projections in the continuous Valence-Arousal (VA) emotion space. However, the coarse classification sub-task is inherently harder because its intra-class variations are significantly larger than those of the fine recognition sub-task. This discord can lead to immature learning and degrade the overall performance. To overcome this issue, CFNet leverages a synchronization mechanism that controls the learning process via knowledge sharing between the two sub-tasks. In addition, a novel center loss is introduced to enhance the discriminative power of the network, extracting compact intra-class representations while preserving intrinsic inter-class relationships. Experiments on three benchmark datasets show that our method achieves state-of-the-art performance, demonstrating its superiority. The code is available at https://github.com/codpub/CFNet.
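As a rough illustration of the coarse-to-fine idea (not the authors' CFNet), the Python sketch below maps the seven universal expressions to three coarse attitude groups and trains a shared backbone with a coarse head and a fine head jointly, using cross-entropy on both branches plus a simple center loss on the fine features. The grouping by valence sign, the backbone, the dimensions, and the loss weights are all assumptions; the paper's VA-based grouping, synchronization mechanism, and specific center-loss formulation are omitted.

```python
"""Illustrative sketch only: not the authors' CFNet implementation."""
import torch
import torch.nn as nn
import torch.nn.functional as F

FINE_CLASSES = ["happy", "surprise", "neutral", "sad", "fear", "disgust", "angry"]
# assumed coarse grouping by valence sign: 0=positive, 1=neutral, 2=negative
FINE_TO_COARSE = torch.tensor([0, 0, 1, 2, 2, 2, 2])

class CenterLoss(nn.Module):
    """Pull each feature toward the (learnable) center of its fine class."""
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        return ((feats - self.centers[labels]) ** 2).sum(dim=1).mean()

class CoarseToFineNet(nn.Module):
    """Shared backbone with a coarse (attitude) and a fine (expression) head."""
    def __init__(self, in_dim: int = 512, feat_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.coarse_head = nn.Linear(feat_dim, 3)
        self.fine_head = nn.Linear(feat_dim, len(FINE_CLASSES))

    def forward(self, x):
        feats = self.backbone(x)
        return feats, self.coarse_head(feats), self.fine_head(feats)

def joint_loss(feats, coarse_logits, fine_logits, fine_labels,
               center_loss: CenterLoss, lam_coarse=0.5, lam_center=0.01):
    """Cross-entropy on both branches plus a center-loss regularizer."""
    coarse_labels = FINE_TO_COARSE[fine_labels]
    return (F.cross_entropy(fine_logits, fine_labels)
            + lam_coarse * F.cross_entropy(coarse_logits, coarse_labels)
            + lam_center * center_loss(feats, fine_labels))

if __name__ == "__main__":
    model = CoarseToFineNet()
    center = CenterLoss(len(FINE_CLASSES), feat_dim=128)
    x = torch.randn(8, 512)                        # e.g., pooled CNN features
    y = torch.randint(0, len(FINE_CLASSES), (8,))
    feats, c_logits, f_logits = model(x)
    loss = joint_loss(feats, c_logits, f_logits, y, center)
    loss.backward()
    print("joint loss:", float(loss))
```

The point of the sketch is only the structure: the fine labels induce the coarse labels, both branches are supervised from shared features, and the center loss encourages compact per-class feature clusters.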