K. M. NAIMUL HASSAN

Sound event detection (SED) in medical environments can support a range of healthcare tasks. Given the success of transformer encoder architectures in SED, they are a promising choice for detecting audio events in hospital settings. However, detecting medical audio events with transformers poses two main difficulties. First, medical audio data is extremely scarce, which makes it difficult to train a transformer model effectively. Second, the SED model must be computationally efficient so that it can be deployed in medical environments with limited resources. Transformers, however, are computationally expensive because of the self-attention mechanism, whose cost grows quadratically with the length of the input sequence. To address these challenges, this paper introduces the Audio Spectrogram Fourier Network (ASFNet), a novel attention-free transformer encoder designed specifically for SED in medical environments. ASFNet replaces the attention operation with a simplified Fast Fourier Transform (FFT). With this approach, ASFNet outperforms the compared methods, achieving the highest average mAP of 0.474, a 16.76% relative improvement. ASFNet reaches this performance with fewer model parameters and a smaller model size, making it an efficient and effective solution for medical audio event detection.
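The abstract does not say exactly how the "simplified" FFT is applied, so the following is only a minimal NumPy sketch of the general idea of an attention-free encoder block, assuming an FNet-style mixing step: a 2D FFT over the token and hidden axes, keeping only the real part. Treating spectrogram patches as tokens, the sizes used, the post-norm layout, and the ReLU feed-forward are all assumptions made for illustration, and the function names are hypothetical. The mixing step has no learned weights and costs O(n log n) in the number of tokens n, versus O(n^2) for self-attention; `attention_param_count` shows how many parameters a standard attention sublayer of the same width would add.

```python
import numpy as np

def fourier_mixing(x: np.ndarray) -> np.ndarray:
    """Attention-free token mixing (FNet-style assumption, not ASFNet's exact layer).

    x has shape (num_tokens, hidden_dim). A 2D FFT is taken over the token
    and hidden axes and only the real part is kept, so every output token
    depends on every input token without any learned weights.
    """
    return np.fft.fft2(x, axes=(-2, -1)).real

def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Normalize each token over its hidden dimension (no learned scale/shift here).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def feed_forward(x, w1, b1, w2, b2):
    # Position-wise MLP; ReLU is used instead of GELU to keep the sketch short.
    return np.maximum(0.0, x @ w1 + b1) @ w2 + b2

def fourier_encoder_block(x, ff_params):
    # Sublayer 1: Fourier mixing stands in for multi-head self-attention.
    x = layer_norm(x + fourier_mixing(x))
    # Sublayer 2: the usual feed-forward sublayer with a residual connection.
    return layer_norm(x + feed_forward(x, *ff_params))

def attention_param_count(d: int) -> int:
    # Q, K, V and output projections of standard attention: each d x d plus a bias of size d.
    return 4 * d * d + 4 * d

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: 64 spectrogram patches, hidden width 32, feed-forward width 128.
    num_tokens, hidden, ff = 64, 32, 128
    x = rng.standard_normal((num_tokens, hidden))
    ff_params = (
        0.02 * rng.standard_normal((hidden, ff)), np.zeros(ff),
        0.02 * rng.standard_normal((ff, hidden)), np.zeros(hidden),
    )
    y = fourier_encoder_block(x, ff_params)
    print(y.shape)                        # (64, 32)
    print(attention_param_count(hidden))  # 4224 weights the mixing sublayer avoids
```

The block keeps the input shape, so several such blocks can be stacked exactly like attention-based encoder layers; only the token-mixing sublayer changes.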