Alin Navas et al.

Aims: Several deep learning architectures with comparable performance are available in the literature for predicting cardiac conditions from ECG signals, yet these models differ in computational cost and, potentially, in their focus on clinically relevant features. This study applies a range of explainability methods to identify which model aligns most closely with clinical logic and to determine whether some models focus more on the clinically relevant features of different pathological beats.

Methods and Results: The analysis was performed on the freely available MIT-BIH dataset. After preprocessing, three models with diverse architectures were trained: a multilayer perceptron (MLP), a convolutional neural network (CNN), and a long short-term memory network (LSTM). The beat segments important for each pathological class were identified using four explainability methods: permutation feature importance (PFI), SHAP, LIME, and Grad-CAM. The results were evaluated against a rule-based approach derived from clinical texts.

Conclusions: PFI, despite its simplicity, aligned best with the clinically derived rules. This suggests that global approaches may capture feature importance more effectively than aggregating local explanations. However, PFI failed to generate feature importances for a class with a relatively small sample size, highlighting a limitation for deployment on datasets with unbalanced classes. Surprisingly, the MLP model outperformed the LSTM and CNN in identifying clinically relevant segments; this finding may be specific to single-beat classification.
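To make the central method concrete, the following is a minimal sketch of segment-level permutation feature importance for single-beat ECG classification. It assumes beats are fixed-length 1-D arrays and that the model exposes a scikit-learn-style predict(); the segment count N_SEGMENTS, the macro-F1 scoring choice, and the function name segment_pfi are illustrative assumptions, not the paper's exact protocol.

```python
# Hypothetical sketch: segment-level permutation feature importance (PFI).
# Assumes X has shape (n_samples, beat_length) and `model.predict(X)`
# returns class labels; N_SEGMENTS and the scoring metric are assumptions.
import numpy as np
from sklearn.metrics import f1_score

N_SEGMENTS = 10  # illustrative: split each beat into 10 equal segments

def segment_pfi(model, X, y, n_repeats=5, seed=0):
    """Importance of each beat segment = mean drop in macro-F1 when that
    segment is shuffled across samples, breaking its link to the label."""
    rng = np.random.default_rng(seed)
    baseline = f1_score(y, model.predict(X), average="macro")
    seg_len = X.shape[1] // N_SEGMENTS
    importances = np.zeros(N_SEGMENTS)
    for s in range(N_SEGMENTS):
        lo, hi = s * seg_len, (s + 1) * seg_len
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            # Permute this segment across samples, leave the rest intact.
            Xp[:, lo:hi] = Xp[rng.permutation(len(X)), lo:hi]
            drops.append(baseline - f1_score(y, model.predict(Xp), average="macro"))
        importances[s] = np.mean(drops)
    return importances  # larger drop => segment is more important globally
```

Because the score drop is measured over the whole dataset, this yields a single global ranking of beat segments per model, in contrast to per-sample attributions from SHAP, LIME, or Grad-CAM that must be aggregated afterwards.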