* Respiratory bandwidth, ** Relative concentration of the chromophores, *** dPTE features computed for two directions, e.g., HbO → Water and Water → HbO.
2.3 | Machine learning methods
A random forest ensemble and support vector machines (SVM) with linear, polynomial and Gaussian kernels were selected as learning algorithms. Feature selection for the SVM learners was implemented with the minimum redundancy maximum relevance (MRMR) algorithm via MATLAB's fscmrmr function. Hyperparameter optimization was conducted with Bayesian optimization using MATLAB's bayesopt function with 60 iterations.
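For illustration, the MRMR ranking step can be sketched outside MATLAB. The sketch below is a simplified Python analogue of fscmrmr, not the function itself: it scores candidate features greedily by Pearson correlation (relevance minus mean redundancy, the additive "MID" scheme), whereas fscmrmr ranks by mutual information; the function name and data layout are illustrative assumptions.

```python
import numpy as np

def mrmr_rank(X, y, k):
    """Greedy MRMR-style ranking of k features (k <= n_features).

    Simplified correlation-based stand-in for MATLAB's fscmrmr:
    score(j) = relevance(j) - mean redundancy of j with the
    already-selected features (MID scheme).
    X: (n_samples, n_features) array, y: numeric label vector.
    """
    n_feat = X.shape[1]
    # Relevance: absolute Pearson correlation between each feature and the label.
    rel = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_feat)])
    selected = [int(np.argmax(rel))]          # start from the most relevant feature
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            # Redundancy: mean absolute correlation with already-selected features.
            red = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                           for s in selected])
            score = rel[j] - red
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

In the nested-CV setting described below, only the number of top-ranked features kept from such a ranking is tuned, not the ranking procedure itself.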
Due to the relatively small sample size used in the study, each test utilized nested cross-validation (CV) to provide a robust and unbiased performance estimate [35]. The nested CV protocol is illustrated in Figure 2. The outer loop consisted of stratified Monte Carlo cross-validation (MCCV) or K-fold CV, depending on the test, while the inner loop used leave-one-out cross-validation (LOOCV). The inner loop handled hyperparameter tuning for each model, including the hyperparameters of the feature selection, which in the case of MRMR was the number of selected features based on the MRMR feature ranking. For all SVM learners, the box constraint was tuned with the default search space; for the polynomial SVM, the polynomial order was additionally tuned over the search space [2, 3], and for the Gaussian SVM, the kernel scale was tuned with the default search space. For the random forest, the tuned hyperparameters were the minimum leaf size and the number of predictors to sample, both with default search spaces, and the number of trees, with a [5, 1000] search space. The overall CV performance was recorded as balanced accuracy (bacc). When analysing the results, the bacc is rated as poor (bacc ≤ 60 %), moderate (60 % < bacc ≤ 70 %), good (70 % < bacc ≤ 80 %) or excellent (bacc > 80 %).
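The bacc metric and the inner LOOCV tuning loop can be sketched as follows. This is a minimal Python illustration of the protocol, not the study's MATLAB pipeline: a k-nearest-neighbour learner with candidate k ∈ {1, 3} stands in for the actual SVM/random-forest learners and their Bayesian optimization, and all names are illustrative.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    # Mean of per-class recalls: insensitive to unequal class sizes.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean([np.mean(y_pred[y_true == c] == c)
                          for c in np.unique(y_true)]))

def knn_predict(X_tr, y_tr, X_te, k):
    # Stand-in learner: plain k-nearest-neighbour majority vote.
    d = np.linalg.norm(X_te[:, None, :] - X_tr[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(y_tr[idx]).argmax() for idx in nearest])

def tune_k_by_loocv(X, y, candidates=(1, 3)):
    # Inner LOOCV loop: pick the hyperparameter with the best bacc
    # on the training fold only, mirroring the nested-CV protocol.
    best_k, best_bacc = candidates[0], -1.0
    for k in candidates:
        preds = [knn_predict(np.delete(X, i, axis=0), np.delete(y, i),
                             X[i:i + 1], k)[0] for i in range(len(y))]
        bacc = balanced_accuracy(y, preds)
        if bacc > best_bacc:
            best_k, best_bacc = k, bacc
    return best_k
```

An outer loop would then repeatedly split the data, call tune_k_by_loocv on each training fold only, refit with the chosen hyperparameter and record balanced_accuracy on the held-out fold, so that no test data ever influences the tuning.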
2.3.1 | Classification into pre-defined age groups
The first classification test was conducted by dividing the data into four pre-defined age groups. The age thresholds for each group were selected to yield approximately equal-sized groups while retaining a wide age distribution. The age group division is illustrated in Figure 3. Stratified MCCV with a 20–80 % test–train split and 1000 iterations was used for the outer CV.
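The outer-loop splitting can be illustrated with a short sketch. This is a hypothetical Python helper, not the study's MATLAB code: on each of the iterations it draws a random, class-stratified 20 % test set, so every class contributes proportionally to both folds.

```python
import numpy as np

def stratified_mccv_splits(y, n_iter=1000, test_frac=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs for stratified Monte Carlo CV.

    Each iteration independently draws test_frac of the samples of
    every class (at least one per class) as the test set.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    for _ in range(n_iter):
        test = []
        for c in np.unique(y):
            idx = np.flatnonzero(y == c)
            n_test = max(1, round(test_frac * len(idx)))
            # Sample without replacement within the class.
            test.append(rng.choice(idx, size=n_test, replace=False))
        test = np.concatenate(test)
        train = np.setdiff1d(np.arange(len(y)), test)
        yield train, test
```

Unlike K-fold CV, the test sets of different MCCV iterations may overlap, which is why a large iteration count such as 1000 is used to stabilize the performance estimate.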