FIGURE 6 Normalized confusion matrices of the best performing model
for the classification into pre-defined age groups for the tasks
A vs. B vs. C vs. D (chart A) and A vs. D (chart B).
3.2 | Classification with variable age group thresholds
Table 3 shows the results of the age threshold tests for each
classifier. In age threshold test 1, where the subjects were classified
into young and old groups using different age thresholds, the
classifiers achieved moderate to good performance (bacc: 63.44 – 71.05 %).
Three of the classifiers found similar best age thresholds (39 – 41),
whereas the overall best performing model, linear SVM, performed best
with an age threshold of 56. However, the suggested threshold of 56
produces a clearly uneven distribution of subjects (N_young: 36,
N_old: 14). This class imbalance resembles the one observed in the
classification into pre-defined age groups but is more pronounced. In
contrast, the second-best classifier, Gaussian SVM, reached its highest
performance with an age threshold of 42, where the performance can be
considered moderate and the classes are more balanced (N_young: 27,
N_old: 23). The best age threshold of 56 matches the threshold used for
group D in the pre-defined age groups, and Figure 6A shows that in most
cases subjects in groups B and C are classified into group A. These
results indicate that, with the current dataset, the differences within
the study population are most pronounced at a threshold of 56.
Interestingly, this threshold agrees with the class distribution shown
in Figure 3A.
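To make the procedure of test 1 more concrete, the following is a minimal Python sketch of a single-threshold search. It assumes a stand-in nearest-centroid classifier (not the SVM or random forest models used in the study), stratified 5-fold cross-validation with pooled out-of-fold predictions, and the labelling convention old = age >= threshold. All function names and the synthetic data are hypothetical; balanced accuracy is computed as the mean per-class recall.

```python
import random
from statistics import mean


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return mean(recalls)


def stratified_folds(labels, k=5, seed=0):
    """Deal the shuffled indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    pos = 0
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    return folds


def nearest_centroid_predict(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: assign each sample to the closest class mean."""
    centroids = {}
    for c in sorted(set(y_train)):
        rows = [x for x, y in zip(X_train, y_train) if y == c]
        centroids[c] = [mean(col) for col in zip(*rows)]

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in X_test]


def cv_balanced_accuracy(X, y, k=5, seed=0):
    """Stratified k-fold CV; out-of-fold predictions are pooled, then scored once."""
    y_pred = [None] * len(y)
    for test_idx in stratified_folds(y, k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        preds = nearest_centroid_predict(
            [X[i] for i in train_idx], [y[i] for i in train_idx], [X[i] for i in test_idx]
        )
        for i, p in zip(test_idx, preds):
            y_pred[i] = p
    return balanced_accuracy(y, y_pred)


def search_single_threshold(X, ages, candidates, k=5):
    """Age threshold test 1 (sketch): label 'old' if age >= t, keep the t with the best bacc."""
    scores = {}
    for t in candidates:
        y = ["old" if a >= t else "young" for a in ages]
        if min(y.count("young"), y.count("old")) < k:
            continue  # assumption: skip thresholds leaving fewer than k subjects in a class
        scores[t] = cv_balanced_accuracy(X, y, k)
    best = max(scores, key=scores.get)
    return best, scores


# Usage on synthetic, hypothetical data (not the study's dataset).
rng = random.Random(1)
ages = list(range(20, 70))
X = [[a / 10 + rng.gauss(0, 0.5), rng.gauss(0, 1)] for a in ages]
best, scores = search_single_threshold(X, ages, range(25, 66))
print(best, round(scores[best], 3))
```

Because balanced accuracy averages per-class recall, a classifier that always predicts the majority class scores 0.5 in a two-class task regardless of imbalance, even though with N_young: 36 and N_old: 14 its plain accuracy would be 0.72. Whether the study pooled fold predictions or averaged per-fold scores is not stated; pooling is a choice made here.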
In age threshold test 2, where two age thresholds were selected and
the subjects in between were excluded from the ML loop, all the
classifiers showed good to excellent performance (bacc: 79.09 –
81.50 %). The lower threshold ranged from 27 to 37, while the upper
threshold ranged from 57 to 62. The performance in age threshold test 3,
with a rejection class, can be considered good for all the classifiers
(micro-averaged bacc: 72 – 74.68 %). The polynomial SVM, Gaussian SVM
and random forest classifiers performed best with the same age
thresholds (44 – 62), while linear SVM used a different range for the
rejection class (27 – 57). As the confusion matrix of the best
performing model, polynomial SVM, in Figure 7 shows, most of the
predictions favoured the largest group, the young subjects, as was also
observed in the previous tests. Closer examination of the class sizes
showed that the largest class had approximately three times as many
subjects as each of the other two groups (N_young: 30, N_rejection: 9,
N_old: 11). Such an uneven class distribution can lead the classifier
to favour the majority class when it is uncertain. A more balanced case
was found with the linear SVM model, which achieved similar performance
(Figure 7D): its rejection-class range of 27 – 57 yielded a more
balanced class distribution (N_young: 14, N_old: 12, N_rejection: 24),
although its predictions still favoured one class over the others.
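The majority-class effect can be illustrated with a confusion matrix. The following minimal Python sketch (function names are hypothetical) builds a confusion matrix, row-normalizes it, assuming this is the normalization used in the figures, and computes balanced accuracy as the mean of the normalized diagonal, that is, the mean per-class recall. It uses the polynomial SVM class sizes from test 3 together with a degenerate classifier that always predicts the majority class. The exact averaging behind the micro-averaged bacc reported for test 3 is not specified here, so the sketch should not be read as reproducing those figures.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Counts with rows = true class and columns = predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def row_normalize(cm):
    """Divide each row by its total, so row i holds the fractions of true class i."""
    out = []
    for row in cm:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def accuracy(cm):
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))


def balanced_accuracy(cm):
    """Mean of the diagonal of the row-normalized matrix (mean per-class recall)."""
    norm = row_normalize(cm)
    return sum(norm[i][i] for i in range(len(norm))) / len(norm)


classes = ["young", "rejection", "old"]
y_true = ["young"] * 30 + ["rejection"] * 9 + ["old"] * 11
y_pred = ["young"] * len(y_true)  # degenerate classifier: always the majority class
cm = confusion_matrix(y_true, y_pred, classes)
print(row_normalize(cm)[0])   # [1.0, 0.0, 0.0]
print(accuracy(cm))           # 0.6
print(balanced_accuracy(cm))  # about 0.333, chance level for three classes
```

The constant predictor reaches a plain accuracy of 0.6 but a balanced accuracy of only 1/3. Balanced accuracy therefore cannot be inflated simply by always predicting the majority class, although, as noted above, a trained classifier can still lean towards that class when it is uncertain.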
The results of age threshold tests 2 and 3 present two interesting
observations. Firstly, the classification performance is higher when the
subjects between the young and old age groups are excluded from the ML
loop. Secondly, the upper age threshold is fairly constant (57, 62),
while the lower age threshold varies more (27, 33, 37, 44). Some
variation in the age thresholds between classifiers is expected, as
different classifiers perform differently in the tasks. The variation of
the lower threshold between tests for the same classifiers could be
explained by the homogeneity of the data from the younger subjects,
which is further supported by the results of the classification task
using pre-defined age groups. These findings suggest that the rate of
aging differs between subjects and that age-related differences become
more pronounced as the aging process progresses.
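Tests 2 and 3 differ only in how the subjects between the two thresholds are handled. A minimal Python sketch of the two labelling schemes follows. The function names, the example ages, and the boundary convention (young below the lower threshold, old above the upper threshold, subjects at either threshold counted as in-between) are assumptions, as the text does not state how subjects exactly at a threshold are assigned.

```python
from collections import Counter


def label_two_thresholds(ages, lower, upper, mode="exclude"):
    """Hypothetical labelling for age threshold tests 2 and 3.

    mode="exclude": test 2, in-between subjects get None and are later dropped.
    mode="reject":  test 3, in-between subjects form a third 'rejection' class.
    """
    if mode not in ("exclude", "reject"):
        raise ValueError(f"unknown mode: {mode!r}")
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    labels = []
    for a in ages:
        if a < lower:
            labels.append("young")
        elif a > upper:
            labels.append("old")
        else:
            labels.append("rejection" if mode == "reject" else None)
    return labels


def drop_excluded(X, labels):
    """Keep only labelled subjects (test 2 removes the in-between group)."""
    kept = [(x, y) for x, y in zip(X, labels) if y is not None]
    return [x for x, _ in kept], [y for _, y in kept]


ages = [23, 31, 45, 58, 63, 70]  # hypothetical ages
print(label_two_thresholds(ages, 27, 57, "exclude"))
# ['young', None, None, 'old', 'old', 'old']
print(Counter(label_two_thresholds(ages, 27, 57, "reject")))
```

In the "exclude" mode the classifier never sees the subjects closest to the age boundary, which is one plausible reason why excluding them raises the scores; in the "reject" mode the same subjects become a third class that must also be predicted.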
TABLE 3 Performance of the classifiers in the age threshold
tests. The results are balanced accuracies from K-fold (K = 5)
cross-validation with the selected age thresholds. The results of the
best performing models are in bold.