1. Introduction
Accurate forecasting and mapping of spatiotemporal variability in
aboveground biomass (W_above) and grain yield
during the growing season are essential for helping farmers carry out
precision field management under variable climatic conditions (Acevedo
et al., 2020; Gao et al., 2017; Lobell and Azzari, 2017). Remote sensing
science serves these purposes by connecting field measurements with
sensor observations. For instance, crop yield forecasting based on
regression models using field-measured yield and remote sensing features
dates back to the 1970s (Idso et al., 1977). However, insufficient
exploration of model and data uncertainties has increasingly become a
limitation for most remote sensing observations at high spatial
resolution (Martínez-Ferrer et al., 2022). Although information such as
leaf area index (LAI, m² m⁻²) and
some weather variables can be incorporated into those regression methods
to improve predictions (Johnson, 2014), the interactions within the
atmosphere-crop-soil continuum are widely overlooked. In this regard,
it may be useful to explore dynamic crop models that have been developed
since the 1960s to simulate crop growth and yield (e.g., de Wit (1965);
de Wit and Penning de Vries (1985); Jones et al. (2003); Keating et al.
(2003); Yin and van Laar (2005)) given that these models are based on
in-depth understanding of crop physiological principles. However, as
such crop models are generally developed and tested at the scale of a
homogeneous plot, uncertainties are inevitable when applying them to
heterogeneous farmers’ fields. Uncertainties are also caused by
incomplete knowledge of physiological processes, parameter values,
meteorological conditions, soil properties and management practices
(Hansen and Jones, 2000). For predictions using either remote sensing or
crop modelling, all these uncertainties propagate, leading to bias in
simulated in-season crop growth and end-of-season crop yield.
The combined use of crop model simulations and remotely sensed
observations is expected to produce more accurate estimates than either
approach on its own and is attracting ever-increasing interest in
smart farming (Houser et al., 2012; Jin et al., 2018). To this end, data
assimilation (DA) methods have been developed (Jin et al., 2018). The Monte
Carlo-based Ensemble Kalman Filter (EnKF) (Evensen, 1994) is among the
most popular methods for conducting DA (Carrassi et al., 2018), due to
its simplicity, efficiency and adaptability to nonlinear and
high-dimensional simulation models (Evensen, 2003; Kalnay et al., 2007).
EnKF is an iterative procedure that keeps alternating between model
forecasting and state updating. Each forecasting step produces an
ensemble of different predictions that accounts for uncertainty about
model inputs, parameter values and model structure. Each updating step
uses observations, weighted by measurement uncertainty, to correct the
ensemble forecast. Sampling error can be minimized by using a large
ensemble size (Whitaker and Hamill, 2012). However, inappropriately
estimated system errors may lead to filter divergence, in which
subsequent ensemble forecasts drift further from the truth (Anderson and
Anderson, 1999; Jazwinski, 1970) and distributions of forecasted states
become too narrow. Forecasting uncertainty is thus underestimated
relative to observational uncertainty, making the observations
essentially irrelevant. To alleviate filter divergence, additive or
multiplicative inflation factors are commonly used (Huang et al., 2019).
For instance, when remotely sensed soil moisture and LAI were assimilated
into the crop model DSSAT, a variant of EnKF, the Ensemble Square Root
Filter, was applied, in which fixed small inflation factors (1.05 for
soil moisture and 1.50 for LAI) were included to prevent filter
divergence (Ines et al., 2013). Instead of using fixed values, Kivi et
al. (2022) adaptively estimated dynamic inflation factors to assimilate
in-situ observed daily soil moisture for updating soil water and
nitrogen (N) dynamics in the crop model APSIM. However, as inflation
factors are not physically constrained, their application to complicated
dynamic models with many different outputs is not straightforward (Ying
and Zhang, 2015). Quantifying parameterization errors in crop models and
uncertainties of remotely sensed observations is indispensable when
applying EnKF to achieve more accurate forecasts of crop growth status
(Jin et al., 2018).
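The forecast-update alternation and the role of an inflation factor described above can be illustrated with a minimal sketch. It assumes a stochastic (perturbed-observation) EnKF analysis step with a linear observation operator; the function name `enkf_update`, the two-variable state (LAI and biomass) and all numerical values are hypothetical and chosen only for illustration, not taken from any of the cited studies.

```python
import numpy as np

def enkf_update(ensemble, obs, obs_var, H, rng, inflation=1.0):
    """One stochastic EnKF analysis step (illustrative sketch).

    ensemble : (n_members, n_state) forecast states
    obs      : (n_obs,) observed values
    obs_var  : (n_obs,) observation error variances
    H        : (n_obs, n_state) linear observation operator
    inflation: multiplicative factor applied to the forecast anomalies
    """
    n = ensemble.shape[0]
    mean = ensemble.mean(axis=0)
    # Multiplicative inflation widens the ensemble spread around its mean.
    anomalies = (ensemble - mean) * inflation
    ensemble = mean + anomalies
    P = anomalies.T @ anomalies / (n - 1)          # forecast error covariance
    R = np.diag(obs_var)                           # observation error covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
    # Each member assimilates its own randomly perturbed copy of the observation.
    perturbed = obs + rng.normal(0.0, np.sqrt(obs_var), size=(n, len(obs)))
    innovations = perturbed - ensemble @ H.T
    return ensemble + innovations @ K.T

# Hypothetical example: state = [LAI, biomass]; only LAI is observed.
rng = np.random.default_rng(0)
forecast = rng.normal([3.0, 500.0], [0.5, 80.0], size=(200, 2))
H = np.array([[1.0, 0.0]])
analysis = enkf_update(forecast, np.array([4.0]), np.array([0.04]), H, rng,
                       inflation=1.05)
```

In this sketch the analysis ensemble is pulled toward the observation in proportion to the ratio of forecast to observation uncertainty, which is why a forecast spread that is too narrow leaves the observation with almost no influence.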
Parameter accuracy of a crop model significantly affects the performance
of DA and yield forecasting (Kang and Özdoğan, 2019). To improve model
parameter accuracy, various parameter inference methods have been
developed and Bayesian approaches are becoming increasingly popular
(e.g., Beven and Freer (2001); Vrugt et al. (2009b)). The interest in
applying Bayesian approaches lies not only in inferring the most likely
parameter values, but also in estimating their underlying posterior
probability distribution functions (pdf) and even in estimating model
structural error (Huang et al., 2019). Markov Chain Monte Carlo (MCMC)
methods are typically used in these Bayesian approaches to link crop
model simulations with observations. Based on data probability
quantified by a likelihood function and, commonly, the
Metropolis-Hastings search strategy (Hastings, 1970; Metropolis et al.,
1953), the prior probability distribution for the parameters of the crop
model and residual error model is updated to a posterior distribution
conditioned on the information in the data. Normally, residual errors
are assumed independent and identically distributed (i.i.d.), following
a normal distribution with zero mean and constant variance (Box and
Tiao, 1973). However, in-field observations always have variable
residuals throughout the growing season (Dumont et al., 2014). Thus, a
likelihood function revised by observational variance was proposed to
account for heteroscedasticity in the crop model STICS (Dumont et
al., 2014). A more generalized formal likelihood function based on a
general error model was developed by Schoups and Vrugt (2010) for a
hydrological model, which allows for heteroscedastic and
non-Gaussian model residual errors. Their approach allows for diagnostic
checking of residual error assumptions and does not require the i.i.d.
assumption. As EnKF has been shown to be effective in cases with
nonlinear or non-Gaussian errors (Han and Li, 2008; Katzfuss et al.,
2016), it is worth investigating whether the calibrated uncertain
parameters of a crop model with generalized error can be integrated into
the EnKF framework.
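As a minimal sketch of the Bayesian updating described in this paragraph, the following example runs a random-walk Metropolis-Hastings sampler on a toy linear growth model whose residual standard deviation increases with the simulated value. This is a deliberately simplified Gaussian stand-in for heteroscedastic errors, not the general error model of Schoups and Vrugt (2010); the model, the parameter names (`rate`, `s0`, `s1`), the flat positivity-bounded prior and all numbers are assumptions made for illustration.

```python
import numpy as np

def log_posterior(theta, t, y):
    """Log-posterior (up to a constant) under a flat prior with positivity bounds."""
    rate, s0, s1 = theta
    if rate <= 0 or s0 <= 0 or s1 < 0:
        return -np.inf
    pred = rate * t                  # toy "crop model": linear growth
    sd = s0 + s1 * pred              # residual sd grows with the prediction
    resid = y - pred
    # Gaussian log-likelihood; the constant -0.5*n*log(2*pi) is omitted.
    return -np.sum(np.log(sd)) - 0.5 * np.sum((resid / sd) ** 2)

def metropolis_hastings(log_post, start, step, n_iter, rng):
    """Random-walk Metropolis-Hastings with independent Gaussian proposals."""
    chain = np.empty((n_iter, len(start)))
    current = np.asarray(start, dtype=float)
    current_lp = log_post(current)
    for i in range(n_iter):
        proposal = current + rng.normal(0.0, step)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain

# Synthetic observations generated with rate=2.0, s0=0.5, s1=0.1.
rng = np.random.default_rng(1)
t = np.arange(1.0, 41.0)
y = 2.0 * t + rng.normal(0.0, 0.5 + 0.1 * 2.0 * t)

chain = metropolis_hastings(lambda th: log_posterior(th, t, y),
                            start=[1.8, 1.0, 0.2],
                            step=np.array([0.05, 0.2, 0.03]),
                            n_iter=8000, rng=rng)
posterior = chain[2000:]             # discard burn-in samples
```

The retained samples approximate the joint posterior of the model parameter and the two error-model parameters, so the spread of `posterior[:, 0]` gives a parameter uncertainty that could, in principle, be used to generate an ensemble of model runs.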
Errors in remote sensing data hamper the use of these data for
predictions from nonparametric regression modelling, one of the most
frequently used approaches to predict crop status from remote sensing
data (Huang et al., 2019; Verrelst et al., 2019). Among the
nonparametric models, the Gaussian Process Regression (GPR) model,
developed within a Bayesian framework (Rasmussen and Williams, 2006),
has been considered a promising method, not only because of its better
prediction performance (Verrelst et al., 2012), but also because it
quantifies predictive uncertainty (Berger et al., 2020a; Wang et al.,
2019; Verrelst et al., 2019). Temporal and spatial transferability of
GPR has been demonstrated by successfully transferring the GPR model to
other images (Verrelst et al., 2013b). However, there is a need for
comparison of DA from remote sensing (DArs) with DA from
field measurements (DAfm) (Huang et al., 2019). Because sampling is
destructive, the sampling sites in ground observations do not remain the
same over time, but that inconsistency is normally neglected when
conducting DAfm. In contrast, in remote sensing
predictions, although prediction errors always exist, temporal changes
in crop growth can be predicted reasonably well. Thus, given these
merits of GPR, its performance when incorporated
into the EnKF framework should be evaluated and compared with that of
DAfm.
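To make concrete how GPR returns a predictive uncertainty alongside each prediction, the sketch below implements the standard Gaussian process predictive equations with a squared-exponential kernel in plain NumPy. The single input feature (a hypothetical vegetation index), the synthetic LAI data and the fixed hyperparameters are assumptions for illustration; in practice hyperparameters would be estimated from the data.

```python
import numpy as np

def rbf_kernel(a, b, length_scale, signal_var):
    """Squared-exponential covariance between 1-D input vectors a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)

def gpr_predict(x_train, y_train, x_new,
                length_scale=0.2, signal_var=4.0, noise_var=0.04):
    """Predictive mean and variance of a GP with a constant mean (sketch)."""
    y_mean = y_train.mean()
    K = rbf_kernel(x_train, x_train, length_scale, signal_var)
    K += noise_var * np.eye(len(x_train))            # add observation noise
    K_s = rbf_kernel(x_train, x_new, length_scale, signal_var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mean = y_mean + K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # Variance of a new noisy observation: prior - explained + noise.
    var = signal_var - np.sum(v**2, axis=0) + noise_var
    return mean, var

# Synthetic training data: LAI roughly proportional to a vegetation index.
rng = np.random.default_rng(2)
x_train = rng.uniform(0.2, 0.9, size=30)
y_train = 6.0 * x_train + rng.normal(0.0, 0.2, size=30)

# One point inside the training range and one far outside it.
pred_mean, pred_var = gpr_predict(x_train, y_train, np.array([0.5, 1.5]))
```

The predictive variance is small where training data are dense and grows for inputs far from them, which is the property that makes a per-pixel GPR variance a candidate for the observation error in an EnKF update.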
Studies have been conducted to connect process-based simulations, field
observations, and their uncertainties to obtain reliable
forecasts using DA. In a hydrological modelling study, Vrugt et al.
(2005) simultaneously estimated parameter uncertainties and structural
errors as well as observational errors. In this approach, an inner EnKF
loop for recursive state simulation and an outer global optimization
loop for posterior estimation were combined for simultaneous
parameter estimation and data assimilation. However, even though model
predictive ability is supposed to be enhanced by improved
parameterization, assessing model structural and input errors may be
hindered when parameter values are not fixed (Schoups and Vrugt, 2010).
In an observing system simulation experiment that assimilated LAI and
soil moisture data into the crop model SWAP, Hu et al. (2017) found that
simultaneously updating parameters tended to worsen the performance of
grain yield prediction when the uncertain parameters that directly
determine biomass and grain formation were incorporated. A method for
systematically quantifying uncertainties in the crop model simulation
and remotely sensed observations through a separate Bayesian process and
applying them in an EnKF framework is strongly needed for better
forecasting of crop growth status. Such a method can support the
careful approximation and application of uncertainties in other DA
algorithms or frameworks (Huang et al., 2019), and can serve as a
reference for the desired model-data fusion framework for better Earth
system forecasting (Gettelman et al., 2022).
The objective of this study was to develop a Bayesian methodology that
combines disparate quantitative methods into one framework, i.e., by
incorporating the systematically analyzed errors in crop model
simulations and remote sensing observations into the data assimilation
procedure of EnKF. We expect this framework to enhance the
forecasting of the crop growth status. The methodology was validated
using a field experiment on rice. The crop model GECROS was
selected for generating crop growth simulations, due to its generality
and physiological robustness (Yin and Struik, 2017; Yin and van Laar,
2005) (see a brief description of the crop model GECROS in Supplement
A). Our specific objectives were: 1) to calibrate and validate GECROS
under field conditions in China, assuming heteroscedastic and
non-Gaussian residual errors; 2) to evaluate the performance
of the GPR model for remote sensing prediction and its uncertainty
estimation; 3) to assess the applicability of estimated uncertainties of
the crop model simulations and the remote sensing observations in EnKF.