FIGURE 6 Normalized confusion matrices of the best-performing model for classification into pre-defined age groups: tasks A vs. B vs. C vs. D (chart A) and A vs. D (chart B).
3.2 | Classification with variable age group thresholds
Table 3 shows the results of the age threshold tests for each classifier. Age threshold test 1, the classification of subjects into young and old groups using different age thresholds, yielded moderate to good performance (bacc: 63.44–71.05 %). Three of the classifiers found similar best age thresholds (39–41), whereas the overall best-performing model, linear SVM, performed best with an age threshold of 56. However, this threshold produces a clearly uneven class distribution (N_young: 36, N_old: 14). The class imbalance is similar to the one observed in the classification into pre-defined age groups, but more pronounced. By contrast, the second-best-performing classifier, Gaussian SVM, reached its highest performance with an age threshold of 42, where the performance can be considered moderate and the classes are more balanced (N_young: 27, N_old: 23). The best age threshold of 56 matches the threshold used for group D with the pre-defined age groups. Figure 6A shows that in most cases subjects in groups B and C are classified into group A. This suggests that, with the current dataset, the differences in the study population are most pronounced at the threshold of 56. Interestingly, this threshold also agrees with the class distribution in Figure 3A.
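Age threshold test 1 amounts to a one-dimensional search: binarize the subjects' ages at each candidate threshold, score a classifier on the resulting labels, and keep the best threshold. The Python sketch below illustrates that loop under stated assumptions: `score_fn` is a hypothetical callback standing in for training and cross-validating one of the classifiers (the actual features and models are not reproduced here), and a subject whose age equals the threshold is assigned to the old group, a boundary convention the text does not specify. The sweep also returns the class counts because, as the linear SVM result at 56 shows, the best-scoring threshold can produce strongly imbalanced groups.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def binarize_ages(ages: Sequence[float], threshold: float) -> list[str]:
    """Label each subject 'young' or 'old' relative to a single age threshold.

    Assumption: a subject whose age equals the threshold is labelled 'old'.
    """
    return ["old" if age >= threshold else "young" for age in ages]


def sweep_thresholds(
    ages: Sequence[float],
    thresholds: Sequence[float],
    score_fn: Callable[[list[str]], float],
) -> Optional[tuple[float, float, Counter]]:
    """Return (best threshold, its score, class counts), or None if no threshold is usable.

    `score_fn` is a hypothetical callback: given the young/old labels, it should
    train and evaluate a classifier (e.g. cross-validated balanced accuracy).
    Ties keep the first (lowest-index) threshold.
    """
    best: Optional[tuple[float, float, Counter]] = None
    for t in thresholds:
        labels = binarize_ages(ages, t)
        counts = Counter(labels)
        if len(counts) < 2:
            continue  # one group would be empty, so the task is undefined
        score = score_fn(labels)
        if best is None or score > best[1]:
            best = (t, score, counts)
    return best
```

Reporting the counts next to the score makes it easy to spot thresholds, such as 56 here, whose high score may partly reflect class imbalance rather than a clean age-related separation.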
In age threshold test 2, where two age thresholds were selected and the subjects between them were excluded from the ML loop, all classifiers achieved good to excellent performance (bacc: 79.09–81.50 %). The lower threshold ranged from 27 to 37 and the upper threshold from 57 to 62. In age threshold test 3, which used a rejection class, the performance of all classifiers can be considered good (micro-averaged bacc: 72–74.68 %). Polynomial SVM, Gaussian SVM, and random forest performed best with the same age thresholds (44–62), whereas linear SVM used a different range for the rejection class (27–57). As the confusion matrix of the best-performing model, polynomial SVM, in Figure 7 shows, most predictions favoured the largest group, the young subjects, as also observed in the previous tests. A closer examination of the class sizes showed that the largest class had approximately three times as many subjects as each of the other two groups (N_young: 30, N_rejection: 9, N_old: 11). With such an uneven class distribution, a classifier that is uncertain tends to favour the majority class. A more balanced case was found with the linear SVM model, which achieved similar performance (Figure 7D): its rejection-class age thresholds of 27–57 yielded a more balanced class distribution (N_young: 14, N_old: 12, N_rejection: 24), although its predictions still favoured one class over the others.
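The two labelling schemes and the majority-class effect described above can be made concrete with a short Python sketch. It assumes inclusive boundaries (age at or below the lower threshold is young, at or above the upper threshold is old), which the text does not specify, and all function names are hypothetical. Balanced accuracy is computed here in its standard form, the mean per-class recall, which equals the mean diagonal of a row-normalized confusion matrix like those in Figures 6 and 7; the "micro-averaged bacc" reported for test 3 may be computed differently, so this sketch does not claim to reproduce those numbers.

```python
from typing import Optional, Sequence


def two_threshold_labels(
    ages: Sequence[float], low: float, high: float
) -> list[Optional[str]]:
    """Test 2 labels: 'young' at or below `low`, 'old' at or above `high`.

    Subjects strictly between the thresholds get None, i.e. they would be
    dropped before training. The inclusive boundaries are an assumption.
    """
    labels: list[Optional[str]] = []
    for age in ages:
        if age <= low:
            labels.append("young")
        elif age >= high:
            labels.append("old")
        else:
            labels.append(None)
    return labels


def rejection_labels(ages: Sequence[float], low: float, high: float) -> list[str]:
    """Test 3 labels: the in-between subjects form a third 'rejection' class."""
    return ["rejection" if lab is None else lab
            for lab in two_threshold_labels(ages, low, high)]


def normalized_confusion(
    y_true: Sequence[str], y_pred: Sequence[str], classes: Sequence[str]
) -> list[list[float]]:
    """Row-normalized confusion matrix: entry [i][j] is the fraction of
    subjects of true class i that were predicted as class j."""
    pos = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        counts[pos[t]][pos[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([v / total if total else 0.0 for v in row])
    return matrix


def balanced_accuracy(
    y_true: Sequence[str], y_pred: Sequence[str], classes: Sequence[str]
) -> float:
    """Mean per-class recall, i.e. the mean diagonal of the normalized matrix,
    taken over the classes that actually occur in y_true."""
    m = normalized_confusion(y_true, y_pred, classes)
    present = [i for i, c in enumerate(classes) if c in y_true]
    return sum(m[i][i] for i in present) / len(present)


# Hypothetical worst case using the class sizes reported for test 3
classes = ["young", "rejection", "old"]
y_true = ["young"] * 30 + ["rejection"] * 9 + ["old"] * 11
y_pred = ["young"] * len(y_true)  # always predict the majority class
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f}, balanced={balanced_accuracy(y_true, y_pred, classes):.3f}")
# prints: accuracy=0.60, balanced=0.333
```

A classifier that always predicts the majority class would reach 60 % plain accuracy with the 30/9/11 split but only one third balanced accuracy, which is why balanced accuracy is the more informative metric when the classes are this uneven.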
The results of age threshold tests 2 and 3 yield two interesting observations. Firstly, classification performance is higher when the subjects between the young and old age groups are excluded from the ML loop. Secondly, the upper age threshold is fairly constant (57, 62), whereas the lower age threshold varies more (27, 33, 37, 44). Variation in the age thresholds between classifiers is expected, as different classifiers perform differently in these tasks. The variation of the lower threshold between tests for the same classifier could be explained by the homogeneity of the data from the younger subjects, which is further supported by the results of the classification into pre-defined age groups. These findings suggest that the rate of aging differs between subjects, and that age-related differences become more distinct as the aging process progresses.
TABLE 3 Performance of the classifiers in the age threshold tests. The results are balanced accuracies from K-fold (K = 5) cross-validation with the selected age thresholds. The results of the best-performing models are shown in bold.
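Table 3 reports scores from K-fold cross-validation with K = 5. As a minimal sketch of that splitting step, the Python generator below yields train/test index lists for plain, unstratified K-fold. Whether the study shuffled or stratified its folds, and how fold scores were aggregated, is not stated here, so the seeded shuffle and the hypothetical `kfold_indices` name are assumptions.

```python
import random
from typing import Iterator


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists for plain (unstratified) K-fold CV.

    Indices are shuffled once with a seeded RNG; the first n % k folds get one
    extra subject so that every index is tested exactly once.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size
```

With the 50 subjects in the threshold tests (e.g. N_young: 36 + N_old: 14), each of the five folds holds out 10 subjects. Because these splits are unstratified, a small class such as the 14 old subjects may be spread unevenly across folds; a stratified splitter would keep the class proportions roughly constant in every fold.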