Mahlatse Kganyago

and 2 more

Machine learning regression algorithms (MLRAs) can learn complex, non-linear relationships between response and predictor variables. However, studies have shown that feature subset selection is still beneficial, yielding high accuracy and low uncertainty in retrieving biophysical and biochemical variables. Feature subset selection techniques are usually applied to high-dimensional, correlated hyperspectral data and are seldom used with multispectral data; previous studies using multispectral data have mainly used the entire feature space. The advent of quasi-hyperspectral sensors, e.g., Sentinel-2, presents new challenges, because two or more variables may be collinear and affect MLRA performance. This study presents a novel Spectral Triad feature selection (STfs) technique based on music theory and compares it to the entire MSI feature space and to Random Forest-Recursive Feature Elimination (RF-RFE). The optimal subsets were evaluated with Random Forest for retrieving leaf area index (LAI), leaf chlorophyll a + b (LCab) and canopy chlorophyll content (CCC) in a semi-arid agricultural landscape. The results indicated that the proposed STfs algorithm obtained equivalent or better (i.e., by 1–3%) retrieval results for LAI (R²cv: 66%, RMSEcv: 0.53 m² m⁻²), LCab (R²cv: 74%, RMSEcv: 7.09 µg cm⁻²) and CCC (R²cv: 77%, RMSEcv: 33.69 µg cm⁻²), using only 5, 7 and 7 variables, respectively, when compared to RF-RFE and the entire MSI feature space. Overall, the proposed STfs algorithm has great potential to optimise the spectral feature space of quasi-hyperspectral sensors for rapid crop biophysical and biochemical parameter retrieval.

The peer-reviewed version of the article can be accessed here: https://doi.org/10.1080/10106049.2024.2309174
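For readers unfamiliar with the RF-RFE baseline named in the abstract, the following is a minimal sketch of recursive feature elimination wrapped around a Random Forest regressor, scored with cross-validated R² and RMSE. It is not the authors' implementation and it does not reproduce the Spectral Triad method, which the abstract does not describe. The band list, the synthetic data, the helper names `make_synthetic` and `rf_rfe_cv`, and all hyperparameters (50 trees, 5 folds, 5 retained features) are assumptions chosen for illustration; it uses scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import Pipeline

# Assumption: the ten 10 m and 20 m Sentinel-2 MSI bands serve as predictors.
BANDS = ["B2", "B3", "B4", "B5", "B6", "B7", "B8", "B8A", "B11", "B12"]


def make_synthetic(n=120, seed=0):
    """Hypothetical stand-in for field data: only B8 and B4 drive the response."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 0.5, size=(n, len(BANDS)))
    y = 3.0 + 8.0 * X[:, 6] - 6.0 * X[:, 2] + rng.normal(0.0, 0.2, n)
    return X, y


def rf_rfe_cv(X, y, n_keep=5, seed=0):
    """Select n_keep bands with RF-RFE and report cross-validated R2 and RMSE."""
    def forest():
        return RandomForestRegressor(n_estimators=50, random_state=seed)

    # RFE sits inside the pipeline so that selection is repeated within each
    # training fold and the held-out samples never influence which bands remain.
    model = Pipeline([
        ("rfe", RFE(forest(), n_features_to_select=n_keep, step=1)),
        ("rf", forest()),
    ])
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    pred = cross_val_predict(model, X, y, cv=cv)
    r2_cv = r2_score(y, pred)
    rmse_cv = float(np.sqrt(mean_squared_error(y, pred)))

    # Refit on all samples to report one final band subset.
    model.fit(X, y)
    kept = [b for b, keep in zip(BANDS, model.named_steps["rfe"].support_) if keep]
    return kept, r2_cv, rmse_cv


if __name__ == "__main__":
    kept, r2_cv, rmse_cv = rf_rfe_cv(*make_synthetic())
    print("retained bands:", kept)
    print(f"R2cv = {r2_cv:.2f}, RMSEcv = {rmse_cv:.2f}")
```

Here R²cv is computed as the coefficient of determination on the pooled out-of-fold predictions; the paper may define it differently (for example, as a squared correlation), so the numbers from this sketch are not comparable to those reported above.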