Figure 4. REAL captures audio from the human throat. a) REAL captures audio from the speaker's throat in a noisy environment. b) STFT spectrograms of two audio recordings captured with a microphone (on REAL) and with REAL, respectively. c) Block diagram of the training and inference pipeline of the neural network model used to recover REAL audio. d) Spectrograms of the speaker's microphone signal (ground truth) and of the audio recovered by the model. e) The audio quality metric SDR for the testing dataset and for the sentence (‘He came to the point’), with the corresponding spectrograms at different training iterations. f) Histograms of the audio quality metrics, showing the improvement from the original audio to the recovered audio.