Abstract
Machine learning-based code smell detection has been demonstrated to be
a valuable approach for improving software quality and enabling
developers to identify problematic patterns in code. However, previous
studies have shown that the code smell data sets commonly used to
train these models are heavily imbalanced. While some recent studies
have explored the use of imbalanced learning techniques for code smell
detection, they have evaluated only a limited number of techniques, and
thus their conclusions about the most effective methods may be biased
and inconclusive. To thoroughly evaluate the effect of imbalanced
learning techniques on machine learning-based code smell detection, we
examine 31 imbalanced learning techniques with seven classifiers to
build code smell detection models on four code smell data sets. We
employ four evaluation metrics to assess the detection performance with
the Wilcoxon signed-rank test and Cliff’s δ. The results show
that (1) Not all imbalanced learning techniques significantly improve
detection performance, but deep forest significantly outperforms the
other techniques on all code smell data sets. (2) SMOTE (Synthetic
Minority Over-sampling Technique) is not the most effective technique
for resampling code smell data sets. (3) The best-performing imbalanced
learning techniques and the top-3 data resampling techniques incur
little time cost for code smell detection. Accordingly, we offer two
practical guidelines. First, researchers and practitioners should select
appropriate imbalanced learning techniques (e.g., deep forest) to
mitigate the class imbalance problem, since blindly applying
imbalanced learning techniques can be harmful. Second, data
resampling techniques more effective than SMOTE should be chosen to
preprocess code smell data sets.
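The abstract reports effect sizes with Cliff's δ, which for two samples is the probability that a value from the first exceeds a value from the second minus the reverse probability. As a minimal illustration (not the paper's code, just the standard definition), it can be computed as:

```python
def cliffs_delta(xs, ys):
    """Cliff's delta effect size: P(x > y) - P(x < y) over all pairs.

    Returns a value in [-1, 1]; 0 means the two samples overlap
    completely, +/-1 means complete separation.
    """
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))


# Example: fully separated samples give |delta| = 1.
print(cliffs_delta([4, 5, 6], [1, 2, 3]))  # 1.0
```

A common convention (e.g., Romano et al.'s thresholds) interprets |δ| < 0.147 as negligible, which is why δ is often paired with the Wilcoxon signed-rank test, as in this study.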
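For readers unfamiliar with the resampling technique named above, SMOTE oversamples the minority class by interpolating between a minority sample and one of its k nearest minority-class neighbours. The sketch below is a simplified, dependency-free illustration of that core idea (the function name and signature are our own; it is not the implementation evaluated in this study, for which libraries such as imbalanced-learn are typically used):

```python
import math
import random


def smote(minority, n_synthetic, k=5, rng=None):
    """Generate synthetic minority samples (simplified SMOTE sketch).

    minority: list of equal-length tuples of floats (minority-class points).
    n_synthetic: number of synthetic points to create.
    k: number of nearest minority-class neighbours to interpolate toward.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class, excluding x itself
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # new point lies on the segment between x and its neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


# Example: triple a 3-point minority class to 6 points.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
augmented = points + smote(points, 3)
```

Because every synthetic point is a convex combination of two existing minority points, SMOTE never generates samples outside the minority class's local geometry, which is one reason it can underperform on noisy or overlapping code smell data sets.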