2.2.2 Gaussian Process Regression (GPR) model calibration and validation
GPR establishes the relationship between the input features \(x\in\mathbb{R}^{B}\), where \(B\) is the number of features, and the output variable (leaf trait) \(y\in\mathbb{R}\) via a kernel function \(k\), which defines the covariance between pairs of data points. The output variable values \(\mathbf{Y}\) and \(\mathbf{Y}_{*}\) of all training (\(\mathbf{x}\)) and testing (\(\mathbf{x}_{*}\)) data points are assumed to follow a joint multivariate normal distribution (Rasmussen and Williams, 2006):
\(\begin{pmatrix}\mathbf{Y}\\ \mathbf{Y}_{*}\end{pmatrix}\sim\mathcal{N}\left(\mathbf{0},\begin{bmatrix}k\left(\mathbf{x},\mathbf{x}\right)+\sigma_{0}^{2}\mathbf{I}&k\left(\mathbf{x},\mathbf{x}_{*}\right)\\ k\left(\mathbf{x}_{*},\mathbf{x}\right)&k\left(\mathbf{x}_{*},\mathbf{x}_{*}\right)\end{bmatrix}\right)\) (1)
where \(k\left(\mathbf{x},\mathbf{x}_{*}\right)\) denotes the matrix of covariances evaluated at all pairs of training and testing data points; the same applies to the other entries \(k\left(\mathbf{x},\mathbf{x}\right)\), \(k\left(\mathbf{x}_{*},\mathbf{x}\right)\) and \(k\left(\mathbf{x}_{*},\mathbf{x}_{*}\right)\). The observed output variables are assumed to be corrupted by i.i.d. Gaussian noise (\(\mathcal{N}\left(0,\sigma_{0}^{2}\right)\)), and \(\mathbf{I}\) represents the identity matrix.
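For concreteness, a minimal NumPy sketch of assembling the joint covariance matrix in Eq. (1) is given below; the callable kernel(A, B) is a hypothetical stand-in that returns the matrix of covariances between all pairs of rows of A and B (e.g., the kernel of Eq. (3) below):

import numpy as np

def joint_covariance(kernel, X_train, X_test, sigma0_sq):
    """Block covariance matrix of the joint distribution in Eq. (1)."""
    K = kernel(X_train, X_train) + sigma0_sq * np.eye(len(X_train))
    K_s = kernel(X_train, X_test)               # k(x, x_*)
    K_ss = kernel(X_test, X_test)               # k(x_*, x_*)
    # [[k(x, x) + sigma0^2 I,  k(x, x_*)],
    #  [k(x_*, x),             k(x_*, x_*)]]
    return np.block([[K, K_s], [K_s.T, K_ss]])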
The posterior distribution of \(\mathbf{Y}_{*}\) is estimated following Rasmussen and Williams (2006):
\(\mathbf{Y}_{*}\,|\,\mathbf{Y},\mathbf{x},\mathbf{x}_{*}\sim\mathcal{N}\left(y_{*,\mu},y_{*,\mathrm{var}}\right)\) (2)
where the predicted posterior mean \(y_{*,\mu}\) and variance \(y_{*,\mathrm{var}}\) are calculated as \(k\left(\mathbf{x}_{*},\mathbf{x}\right)\left[k\left(\mathbf{x},\mathbf{x}\right)+\sigma_{0}^{2}\mathbf{I}\right]^{-1}\mathbf{Y}\) and \(k\left(\mathbf{x}_{*},\mathbf{x}_{*}\right)-k\left(\mathbf{x}_{*},\mathbf{x}\right)\left[k\left(\mathbf{x},\mathbf{x}\right)+\sigma_{0}^{2}\mathbf{I}\right]^{-1}k\left(\mathbf{x},\mathbf{x}_{*}\right)\), respectively.
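These expressions translate directly to code. The following is a minimal NumPy/SciPy sketch (function and variable names are illustrative, not from the original implementation) that computes the posterior mean and covariance via a Cholesky factorization rather than an explicit matrix inverse, which is the numerically stable route:

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gpr_posterior(K, K_s, K_ss, Y, sigma0_sq):
    """Posterior mean and covariance of Y_* (Eq. 2).

    K    : k(x, x), n x n train covariances
    K_s  : k(x, x_*), n x m train-test covariances
    K_ss : k(x_*, x_*), m x m test covariances
    """
    # Cholesky factorization of k(x, x) + sigma0^2 I
    L = cho_factor(K + sigma0_sq * np.eye(K.shape[0]), lower=True)
    alpha = cho_solve(L, Y)          # [k(x, x) + sigma0^2 I]^{-1} Y
    mean = K_s.T @ alpha             # k(x_*, x) [.]^{-1} Y
    V = cho_solve(L, K_s)            # [.]^{-1} k(x, x_*)
    cov = K_ss - K_s.T @ V           # posterior covariance
    return mean, cov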
The commonly used anisotropic squared exponential kernel function is also adopted here (Verrelst et al., 2013a):
\(k\left(x_{i},x_{j}\right)=\nu\exp\left(-\sum_{b=1}^{B}\frac{\left(x_{i}^{(b)}-x_{j}^{(b)}\right)^{2}}{2\sigma_{b}^{2}}\right)+\sigma_{0}^{2}\delta_{ij}\) (3)
where \(\nu\) is a scaling factor, \(\sigma_{b}\) is the length scale of input feature \(b\), controlling how quickly the covariance decays along that feature, and \(\delta_{ij}\) is the Kronecker delta. The hyperparameters, denoted \(\theta_{k}=\left\{\nu,\sigma_{b},\sigma_{0}\right\}\), are determined by maximizing the log marginal likelihood on the training set (Rasmussen and Williams, 2006):
\(\ell\left(\mathbf{Y}|\mathbf{x},\theta_{k}\right)=-\frac{n}{2}\ln\left(2\pi\right)-\frac{1}{2}\ln\left|k\left(\mathbf{x},\mathbf{x}\right)+\sigma_{0}^{2}\mathbf{I}\right|-\frac{1}{2}\mathbf{Y}^{T}\left(k\left(\mathbf{x},\mathbf{x}\right)+\sigma_{0}^{2}\mathbf{I}\right)^{-1}\mathbf{Y}\) (4)
where \(n\) represents the size of the training dataset.
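A minimal NumPy sketch of Eqs. (3) and (4) follows (function names are illustrative); the negative of log_marginal_likelihood could then be passed to a generic optimizer such as scipy.optimize.minimize to fit \(\theta_{k}\):

import numpy as np

def ard_sq_exp_kernel(X1, X2, nu, length_scales):
    """Anisotropic squared-exponential kernel of Eq. (3); the noise
    term sigma0^2 * delta_ij is added separately below."""
    # Per-feature squared differences scaled by 2 * sigma_b^2
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2
          / (2.0 * length_scales ** 2)).sum(axis=-1)
    return nu * np.exp(-d2)

def log_marginal_likelihood(X, Y, nu, length_scales, sigma0_sq):
    """Eq. (4): log likelihood of the training outputs Y."""
    n = len(Y)
    Ky = ard_sq_exp_kernel(X, X, nu, length_scales) + sigma0_sq * np.eye(n)
    sign, logdet = np.linalg.slogdet(Ky)         # stable log determinant
    return (-0.5 * n * np.log(2.0 * np.pi)
            - 0.5 * logdet
            - 0.5 * Y @ np.linalg.solve(Ky, Y))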
For calibrating and validating the GPR model, the complete dataset was split into a training (75%) and a testing (25%) subset. To reduce the risk of converging to local maxima of the likelihood, the hyperparameter values of the GPR model were averaged over 100 runs; in each run, two-thirds of the training data were randomly sampled from the full training dataset (Verrelst et al., 2013a; Wang et al., 2019).
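This protocol can be sketched with scikit-learn, whose RBF kernel with a per-feature length_scale vector implements the anisotropic kernel of Eq. (3); the synthetic data, the choice of scikit-learn, and the averaging of the log-transformed hyperparameters are illustrative assumptions, not details taken from the original study:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF, WhiteKernel

rng = np.random.default_rng(0)
n, B = 200, 6                                   # synthetic stand-in data
X = rng.random((n, B))
y = X @ rng.random(B) + 0.1 * rng.standard_normal(n)

# 75% training / 25% testing split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# nu * exp(-sum_b (x_b - x'_b)^2 / (2 sigma_b^2)) + sigma0^2 delta_ij
kernel = ConstantKernel() * RBF(length_scale=np.ones(B)) + WhiteKernel()

log_thetas = []
for _ in range(100):                            # 100 runs against local maxima
    # each run fits on a random two-thirds subsample of the training set
    idx = rng.choice(len(X_train), size=2 * len(X_train) // 3, replace=False)
    gpr = GaussianProcessRegressor(kernel=kernel)
    gpr.fit(X_train[idx], y_train[idx])
    log_thetas.append(gpr.kernel_.theta)        # log-scale hyperparameters

# average the hyperparameters, then fix them for the final model
final_kernel = kernel.clone_with_theta(np.mean(log_thetas, axis=0))
final_gpr = GaussianProcessRegressor(kernel=final_kernel, optimizer=None)
final_gpr.fit(X_train, y_train)
y_pred, y_std = final_gpr.predict(X_test, return_std=True)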