Melisa Saygin

and 3 more

Speech production interferes with accurate measurement of cardiac vagal activity during acute stress, attenuating the expected drop in heart rate variability in the respiratory frequency band. Speech also induces sympathetic changes similar to those induced by psychological stress. In the laboratory, confounding of physiological stress reactivity by speech can be controlled experimentally. In ambulatory assessments, however, speech episodes must be detected to separate the physiological effects of psychosocial stress from those of speech. Using supervised machine learning (https://osf.io/bk9nf), we trained and tested speech classification models on data obtained from 56 participants (aged 18-39). Participants wore privacy-secure wearables measuring thoracoabdominal respiratory inductance plethysmography (RIP, from a single-band and a dual-band set-up), thoracic impedance pneumography, and a unit positioned on the upper sternum containing triaxial accelerometers and gyroscopes. Following an 80/20 train-test split, nested cross-validation was run on the training set with the machine learning algorithms XGBoost, Gradient Boosting, Random Forest, and Logistic Regression to obtain unbiased estimates of generalization performance. Speech classification by the best model per measurement method was then validated on the unseen test set. Speech versus no-speech classification performance (AUC) in both nested cross-validation and test-set prediction was excellent for thorax-abdomen RIP (nested cross-validation: 96.6%, test-set prediction: 98.5%), thorax-only RIP (97.5%, 99.1%), impedance (97.0%, 97.8%), and accelerometry (99.3%, 99.6%). The sternal accelerometer outperformed the other methods. These novel open-access models, which leverage privacy-secure biosignals, will enable researchers to detect speech and control for its confounding effects in ambulatory recordings, thereby enhancing the trustworthiness of psychophysiological findings.
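
The evaluation scheme described above (an 80/20 split, nested cross-validation on the training portion, then a single check on the held-out test set) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' pipeline: the data are synthetic, a simple k-nearest-neighbour scorer stands in for XGBoost, Gradient Boosting, Random Forest, and Logistic Regression, and every name in it (`auc`, `knn_scores`, `folds`, `select_k`, `nested_cv`, the grid of `k` values, the fold counts) is a hypothetical choice made for illustration only.

```python
import random
from statistics import mean

def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def knn_scores(train, test_x, k):
    """Score each test point by the share of label-1 rows among its k nearest
    training rows (a stand-in model, NOT one of the algorithms in the text)."""
    out = []
    for x in test_x:
        nearest = sorted(
            train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
        )[:k]
        out.append(mean(y for _, y in nearest))
    return out

def folds(rows, n_folds):
    """Yield (train, validation) pairs; fold i holds rows with index % n_folds == i."""
    for i in range(n_folds):
        yield ([r for j, r in enumerate(rows) if j % n_folds != i],
               [r for j, r in enumerate(rows) if j % n_folds == i])

def cv_auc(rows, k, n_folds):
    """Mean validation AUC of the k-NN scorer across the folds of `rows`."""
    return mean(
        auc([y for _, y in val], knn_scores(tr, [x for x, _ in val], k))
        for tr, val in folds(rows, n_folds)
    )

def select_k(rows, grid, n_folds):
    """Hyperparameter search: pick the k with the best cross-validated AUC."""
    return max(grid, key=lambda k: cv_auc(rows, k, n_folds))

def nested_cv(rows, grid, outer=5, inner=3):
    """Outer loop estimates performance; inner loop tunes k on outer-train only."""
    outer_scores = []
    for tr, val in folds(rows, outer):
        best = select_k(tr, grid, inner)  # tuning never sees the outer validation fold
        outer_scores.append(
            auc([y for _, y in val], knn_scores(tr, [x for x, _ in val], best))
        )
    return mean(outer_scores)

# Synthetic two-feature data (assumption): label 1 plays the role of "speech".
random.seed(0)
rows = [((random.gauss(2.0 * (i % 2), 1.0), random.gauss(2.0 * (i % 2), 1.0)), i % 2)
        for i in range(200)]
random.shuffle(rows)

split = int(0.8 * len(rows))           # 80/20 train-test split
train, test = rows[:split], rows[split:]
grid = [1, 5, 15]                      # hypothetical hyperparameter grid

cv_estimate = nested_cv(train, grid)   # performance estimate from the training set only
best_k = select_k(train, grid, n_folds=5)
test_auc = auc([y for _, y in test], knn_scores(train, [x for x, _ in test], best_k))
print(f"nested CV AUC: {cv_estimate:.3f}, test-set AUC: {test_auc:.3f}, k={best_k}")
```

The point of the nesting is that hyperparameters are chosen inside each outer training fold, so the outer validation folds and the final test set never influence model selection. With scikit-learn, the same structure is commonly obtained by passing a `GridSearchCV` estimator to `cross_val_score`. The sketch shuffles individual rows; whether the original study split by participant or by observation is not stated in the text above, so no such grouping is modelled here.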