* Respiratory bandwidth, ** Relative concentration of the chromophores,
*** dPTE features computed for two directions, e.g., HbO → Water and
Water → HbO.
2.3 | Machine learning methods
Random forest ensembles and support vector machines (SVMs) with linear,
polynomial, and Gaussian kernels were selected as the learning algorithms.
Feature selection for the SVM learners was implemented using the minimum
redundancy maximum relevance (MRMR) algorithm via MATLAB's fscmrmr
function. Hyperparameter optimization was conducted using Bayesian
optimization with MATLAB's bayesopt function and 60 iterations.
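The greedy relevance-versus-redundancy trade-off behind MRMR can be sketched as follows. This is a simplified Python analogue, not the paper's implementation: MATLAB's fscmrmr ranks features by mutual information, whereas this sketch substitutes absolute Pearson correlations for both the relevance and redundancy terms, so its rankings will not match fscmrmr exactly.

```python
import numpy as np

def mrmr_rank(X, y, n_select):
    """Greedy MRMR-style ranking: at each step pick the feature with the
    highest relevance to the label minus its mean redundancy (absolute
    correlation) with the features already selected."""
    n_features = X.shape[1]
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])
    selected, remaining = [], list(range(n_features))
    while remaining and len(selected) < n_select:
        if not selected:
            # first pick: most relevant feature overall
            best = remaining[int(np.argmax(relevance[remaining]))]
        else:
            scores = []
            for j in remaining:
                redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, k])[0, 1])
                                      for k in selected])
                scores.append(relevance[j] - redundancy)
            best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

In the paper's pipeline the number of features kept from this ranking is itself a tuned hyperparameter (see the nested CV description below).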
Due to the relatively small sample size used in the study, each test
utilized nested cross-validation (CV) to provide robust and unbiased
performance estimates [35]. The nested CV protocol is illustrated in
Figure 2. The outer loop consisted of stratified Monte Carlo
cross-validation (MCCV) or K-fold CV depending on the test, while the
inner loop used leave-one-out cross-validation (LOOCV). The inner loop
contained hyperparameter tuning for each model, including
hyperparameters for the feature selection, which in the case of MRMR was
the number of selected features based on the MRMR feature ranking. For all
SVM learners, the box constraint was tuned over the default search space,
while for the polynomial SVM the polynomial order was tuned over the
search space [2, 3], and for the Gaussian SVM the kernel scale was tuned
over the default search space. For the random forest, the tuned
hyperparameters were the minimum leaf size and the number of predictors to
sample, both with default search spaces, and the number of trees with a
[5, 1000] search space. The overall CV
performance was recorded using balanced accuracy (bacc). When analysing
the results, the bacc is evaluated as poor (bacc ≤ 60 %), moderate
(60 % < bacc ≤ 70 %), good (70 % < bacc ≤ 80 %), or excellent
(bacc > 80 %).
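The nested CV structure above can be sketched compactly. The sketch below is an illustrative Python analogue, not the paper's MATLAB pipeline: a plain k-nearest-neighbour classifier stands in for the SVM/random-forest learners, the inner LOOCV tunes its single hyperparameter k over a small grid, the outer loop uses simple (un-stratified) Monte Carlo splits for brevity, and all function names are invented for this example.

```python
import numpy as np

def knn_predict(X_tr, y_tr, X_te, k):
    """Plain k-nearest-neighbour majority vote (Euclidean distance)."""
    preds = []
    for x in X_te:
        nearest = y_tr[np.argsort(np.linalg.norm(X_tr - x, axis=1))[:k]]
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls; robust to class imbalance."""
    classes = np.unique(y_true)
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in classes]))

def nested_cv(X, y, k_grid=(1, 3, 5), n_outer=20, test_frac=0.2, seed=0):
    """Outer Monte Carlo splits estimate generalization performance; the
    inner LOOCV selects the hyperparameter k using only the outer
    training data, so the test fold never influences tuning."""
    rng = np.random.default_rng(seed)
    n = len(y)
    baccs = []
    for _ in range(n_outer):
        test = rng.choice(n, size=int(test_frac * n), replace=False)
        train = np.setdiff1d(np.arange(n), test)
        # inner loop: LOOCV accuracy on the training set for each candidate k
        best_k, best_acc = None, -1.0
        for k in k_grid:
            correct = 0
            for i, idx in enumerate(train):
                rest = np.delete(train, i)
                correct += (knn_predict(X[rest], y[rest], X[[idx]], k)[0] == y[idx])
            acc = correct / len(train)
            if acc > best_acc:
                best_k, best_acc = k, acc
        # retrain with the chosen k on the full outer training set
        y_pred = knn_predict(X[train], y[train], X[test], best_k)
        baccs.append(balanced_accuracy(y[test], y_pred))
    return float(np.mean(baccs))
```

The key property this preserves from the protocol in Figure 2 is that hyperparameter selection happens strictly inside the outer training fold.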
2.3.1 | Classification with pre-defined age groups
The first classification test was conducted by dividing the data into
four pre-defined age groups. The age thresholds for each group were
selected by aiming for approximately equal-sized groups while still
covering a wide age distribution. The age group division is illustrated
in Figure 3. Stratified MCCV with a 20–80 % test–train split and 1000
iterations was used for the outer CV.
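Stratified Monte Carlo splitting of this kind can be sketched as below. This is a minimal illustration (the generator name and exact rounding rule are assumptions, not the paper's code): each iteration draws a fresh random 20 % test set within every class, so class proportions are preserved across the 1000 repetitions.

```python
import numpy as np

def stratified_mccv_splits(y, n_iter=1000, test_frac=0.20, seed=0):
    """Yield (train_idx, test_idx) pairs. Each iteration samples
    test_frac of every class without replacement, preserving the
    class balance of the full data set in both folds."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    for _ in range(n_iter):
        test_idx = []
        for c in np.unique(y):
            members = np.flatnonzero(y == c)
            n_test = max(1, round(test_frac * len(members)))
            test_idx.extend(rng.choice(members, size=n_test, replace=False))
        test_idx = np.sort(np.array(test_idx))
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        yield train_idx, test_idx
```

Unlike K-fold CV, successive Monte Carlo test sets may overlap, which is why many iterations (here 1000) are averaged to stabilize the estimate.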