2.2.2 Gaussian Process Regression (GPR) model calibration and validation
GPR establishes the relationship between the input features \(x\in\mathbb{R}^{B}\), where \(B\) is the number of input features, and the output variable (leaf trait) \(y\in\mathbb{R}\) via a kernel function \(k\), which defines the similarity between pairs of data points. The output variable values \(\mathbf{Y}\) and \(\mathbf{Y}_{*}\) at all training (\(\mathbf{x}\)) and testing (\(\mathbf{x}_{*}\)) data points are assumed to follow a joint multivariate normal distribution (Rasmussen and Williams, 2006):
\(\begin{pmatrix}\mathbf{Y}\\ \mathbf{Y}_{*}\end{pmatrix}\sim\mathcal{N}\left(0,\begin{bmatrix}k\left(\mathbf{x},\mathbf{x}\right)+\sigma_{0}^{2}\mathbf{I}&k\left(\mathbf{x},\mathbf{x}_{*}\right)\\ k\left(\mathbf{x}_{*},\mathbf{x}\right)&k\left(\mathbf{x}_{*},\mathbf{x}_{*}\right)\end{bmatrix}\right)\) (1)
where \(k\left(\mathbf{x},\mathbf{x}_{*}\right)\) denotes the matrix of covariances evaluated at all pairs of training and testing data points; the same applies to the other entries \(k\left(\mathbf{x},\mathbf{x}\right)\), \(k\left(\mathbf{x}_{*},\mathbf{x}\right)\) and \(k\left(\mathbf{x}_{*},\mathbf{x}_{*}\right)\). The observed output variables are assumed to be corrupted by i.i.d. Gaussian noise (\(\mathcal{N}\left(0,\sigma_{0}^{2}\right)\)), and \(\mathbf{I}\) represents the identity matrix.
The posterior distribution of \(\mathbf{Y}_{*}\) is estimated following
(Rasmussen and Williams, 2006):
\(\mathbf{Y}_{*}\,|\,\mathbf{Y},\mathbf{x},\mathbf{x}_{*}\sim\mathcal{N}\left(y_{*,\mu},y_{*,var}\right)\) (2)
where the predicted posterior mean \(y_{*,\mu}\) and variance \(y_{*,var}\) are calculated as \(k\left(\mathbf{x}_{*},\mathbf{x}\right)\left[k\left(\mathbf{x},\mathbf{x}\right)+\sigma_{0}^{2}\mathbf{I}\right]^{-1}\mathbf{Y}\) and \(k\left(\mathbf{x}_{*},\mathbf{x}_{*}\right)-k\left(\mathbf{x}_{*},\mathbf{x}\right)\left[k\left(\mathbf{x},\mathbf{x}\right)+\sigma_{0}^{2}\mathbf{I}\right]^{-1}k\left(\mathbf{x},\mathbf{x}_{*}\right)\), respectively.
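For illustration, a minimal Python/NumPy sketch of the posterior prediction in Eq. (2) is given below, assuming the kernel matrices have already been evaluated. The function name gpr_predict is hypothetical, and the Cholesky factorization is a numerical implementation choice rather than part of the method description above.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gpr_predict(K_xx, K_sx, K_ss, Y, sigma0_sq):
    """Posterior mean and variance of Eq. (2).

    K_xx : (n, n) kernel matrix k(x, x) over training points
    K_sx : (m, n) kernel matrix k(x_*, x) between testing and training points
    K_ss : (m, m) kernel matrix k(x_*, x_*) over testing points
    Y    : (n,) training outputs (leaf trait values)
    sigma0_sq : observation noise variance sigma_0^2
    """
    n = K_xx.shape[0]
    # Cholesky factorization of k(x,x) + sigma_0^2 I, used in place of the
    # explicit matrix inverse in Eq. (2) for numerical stability.
    L = cho_factor(K_xx + sigma0_sq * np.eye(n), lower=True)
    alpha = cho_solve(L, Y)        # [k(x,x) + sigma_0^2 I]^{-1} Y
    mean = K_sx @ alpha            # posterior mean y_{*,mu}
    v = cho_solve(L, K_sx.T)       # [k(x,x) + sigma_0^2 I]^{-1} k(x,x_*)
    cov = K_ss - K_sx @ v          # posterior covariance
    return mean, np.diag(cov)      # y_{*,mu} and y_{*,var}
```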
The commonly used anisotropic squared exponential kernel function is
also adopted here (Verrelst et al., 2013a):
\(k\left(x_{i},x_{j}\right)=\nu\exp\left(-\sum_{b=1}^{B}\frac{\left(x_{i}^{(b)}-x_{j}^{(b)}\right)^{2}}{2\sigma_{b}^{2}}\right)+\sigma_{0}^{2}\delta_{ij}\) (3)
where \(\nu\) is a scaling factor, \(\sigma_{b}\) is the length-scale per input feature \(b\), controlling the spread of the relations for each input feature, and \(\delta_{ij}\) is the Kronecker delta. The hyperparameters, denoted as \(\theta_{k}=\left\{\nu,\sigma_{1},\ldots,\sigma_{B},\sigma_{0}\right\}\), are determined by maximizing the log marginal likelihood over the training set (Rasmussen and Williams, 2006):
\(\mathcal{L}\left(\mathbf{Y}|\mathbf{x},\theta_{k}\right)=-\frac{n}{2}\ln\left(2\pi\right)-\frac{1}{2}\ln\left|k\left(\mathbf{x},\mathbf{x}\right)+\sigma_{0}^{2}\mathbf{I}\right|-\frac{1}{2}\mathbf{Y}^{T}\left(k\left(\mathbf{x},\mathbf{x}\right)+\sigma_{0}^{2}\mathbf{I}\right)^{-1}\mathbf{Y}\) (4)
where \(n\) is the size of the training dataset.
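To make Eqs. (3) and (4) concrete, the sketch below evaluates the anisotropic squared exponential kernel and the resulting log marginal likelihood in NumPy. The function names ard_se_kernel and log_marginal_likelihood are illustrative, and the use of slogdet is an implementation detail not specified in the text.

```python
import numpy as np

def ard_se_kernel(X1, X2, nu, length_scales):
    """Noise-free part of the anisotropic squared exponential kernel, Eq. (3).

    X1 : (n1, B) array, X2 : (n2, B) array
    nu : scaling factor, length_scales : (B,) per-feature sigma_b
    """
    # Scale each feature by its length-scale, then sum squared differences.
    D = (X1[:, None, :] - X2[None, :, :]) / length_scales   # (n1, n2, B)
    return nu * np.exp(-0.5 * np.sum(D**2, axis=-1))        # (n1, n2)

def log_marginal_likelihood(X, Y, nu, length_scales, sigma0_sq):
    """Log marginal likelihood of Eq. (4); the noise term sigma_0^2 delta_ij
    appears as sigma_0^2 I on the training kernel matrix."""
    n = X.shape[0]
    Ky = ard_se_kernel(X, X, nu, length_scales) + sigma0_sq * np.eye(n)
    sign, logdet = np.linalg.slogdet(Ky)    # ln |k(x,x) + sigma_0^2 I|
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * logdet
            - 0.5 * Y @ np.linalg.solve(Ky, Y))
```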
For calibrating and validating the GPR model, the complete acquired dataset was split into a training (75%) and a testing (25%) dataset. To avoid convergence to local maxima, the hyperparameter values of the GPR model were averaged over 100 runs; in each run, two-thirds of the training data were randomly sampled from the full training dataset (Verrelst et al., 2013a; Wang et al., 2019).
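A minimal sketch of this calibration protocol is given below, using scikit-learn's GaussianProcessRegressor with an anisotropic RBF kernel as a stand-in for the implementation described above; the synthetic placeholder data, random seed, and averaging of the log-transformed hyperparameters (kernel_.theta) are illustrative assumptions, not details from the original text.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

rng = np.random.default_rng(0)                 # illustrative seed

# Placeholder data standing in for the feature/trait dataset (illustrative).
N, B = 200, 6
X = rng.normal(size=(N, B))
Y = X @ rng.normal(size=B) + 0.1 * rng.normal(size=N)

# 75% / 25% split for calibration and validation.
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)

# nu (ConstantKernel), per-feature sigma_b (anisotropic RBF), sigma_0 (WhiteKernel).
kernel = ConstantKernel() * RBF(length_scale=np.ones(B)) + WhiteKernel()

thetas = []
for _ in range(100):                            # 100 runs
    # Randomly sample two-thirds of the training data in each run.
    idx = rng.choice(len(X_tr), size=int(2 * len(X_tr) / 3), replace=False)
    gp = GaussianProcessRegressor(kernel=kernel).fit(X_tr[idx], Y_tr[idx])
    thetas.append(gp.kernel_.theta)             # fitted log-hyperparameters

# Average the hyperparameters (here in log space) across the 100 runs and
# fit the final model with the averaged values held fixed (optimizer=None).
theta_mean = np.mean(thetas, axis=0)
gp_final = GaussianProcessRegressor(kernel=kernel.clone_with_theta(theta_mean),
                                    optimizer=None).fit(X_tr, Y_tr)
Y_pred, Y_std = gp_final.predict(X_te, return_std=True)
```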