1. Introduction
Accurate forecasting and mapping of the spatiotemporal variability in aboveground biomass (W_above) and grain yield during the growing season are essential for enabling farmers to carry out precision field management under variable climatic conditions (Acevedo et al., 2020; Gao et al., 2017; Lobell and Azzari, 2017). Remote sensing serves these purposes by connecting field measurements with sensor observations. For instance, crop yield forecasting based on regression models that relate field-measured yield to remote sensing features dates back to the 1970s (Idso et al., 1977). However, the poor exploration of model and data uncertainties has increasingly limited the use of most remote sensing observations at high spatial resolution (Martínez-Ferrer et al., 2022). Although information such as leaf area index (LAI, m² m⁻²) and some weather variables can be incorporated into these regression methods to improve predictions (Johnson, 2014), the interactions within the atmosphere-crop-soil continuum are largely overlooked. In this regard, it may be useful to explore dynamic crop models, which have been developed since the 1960s to simulate crop growth and yield (e.g., de Wit (1965); de Wit and Penning de Vries (1985); Jones et al. (2003); Keating et al. (2003); Yin and van Laar (2005)), given that these models are based on an in-depth understanding of crop physiological principles. However, because such crop models are generally developed and tested at the scale of a homogeneous plot, uncertainties are inevitable when they are applied to heterogeneous farmers' fields. Uncertainties also arise from incomplete knowledge of physiological processes, parameter values, meteorological conditions, soil properties and management practices (Hansen and Jones, 2000). For predictions from either remote sensing or crop modelling, all these uncertainties propagate, leading to bias in the simulated in-season crop growth and end-of-season crop yield.
Combining crop model simulations with remotely sensed observations is expected to produce more accurate estimates than either approach on its own, and has attracted ever-increasing interest in smart farming (Houser et al., 2012; Jin et al., 2018). To this end, data assimilation (DA) methods have been developed (Jin et al., 2018). The Monte Carlo-based Ensemble Kalman Filter (EnKF) (Evensen, 1994) is among the most popular DA methods (Carrassi et al., 2018) because of its simplicity, efficiency and adaptability to nonlinear, high-dimensional simulation models (Evensen, 2003; Kalnay et al., 2007). EnKF is an iterative procedure that alternates between model forecasting and state updating. Each forecasting step produces an ensemble of predictions that accounts for uncertainty in model inputs, parameter values and model structure. Each updating step uses observations, weighted by their measurement uncertainty, to correct the ensemble forecast. Sampling error can be reduced by using a large ensemble size (Whitaker and Hamill, 2012). However, inappropriately estimated system errors may lead to filter divergence, in which successive ensemble forecasts drift further from the truth (Anderson and Anderson, 1999; Jazwinski, 1970) while the distributions of forecasted states become too narrow. Forecasting uncertainty is then underestimated relative to observational uncertainty, so that the observations become essentially irrelevant to the update. To alleviate filter divergence, additive or multiplicative inflation factors are commonly used (Huang et al., 2019). For instance, when assimilating remotely sensed soil moisture and LAI into the crop model DSSAT, Ines et al. (2013) applied a variant of EnKF, the Ensemble Square Root Filter, with fixed inflation factors (1.05 for soil moisture and 1.50 for LAI) to prevent filter divergence. Instead of using fixed values, Kivi et al. (2022) adaptively estimated dynamic inflation factors to assimilate in situ observed daily soil moisture for updating soil water and nitrogen (N) dynamics in the crop model APSIM. However, because inflation factors are not physically constrained, applying them to complex dynamic models with many different outputs is not straightforward (Ying and Zhang, 2015). Quantifying parameterization errors in crop models and the uncertainties of remotely sensed observations is therefore indispensable when applying EnKF to achieve more accurate forecasts of crop growth status (Jin et al., 2018).
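A minimal sketch of one EnKF updating step with multiplicative inflation may help make the forecast-update cycle described above concrete. It assumes the stochastic (perturbed-observation) EnKF variant, a linear observation operator, a diagonal observation error covariance and a hypothetical two-variable state (LAI and biomass) with made-up values; the function name `enkf_analysis` is illustrative only. Here the inflation factor scales deviations from the ensemble mean, so the forecast covariance grows by its square; conventions differ between studies, and square root filters such as the one used by Ines et al. (2013) update the ensemble deterministically instead of perturbing the observations.

```python
import numpy as np


def enkf_analysis(ensemble, obs, obs_var, H, inflation=1.0, rng=None):
    """One stochastic (perturbed-observation) EnKF updating step.

    ensemble  : (n_state, n_members) forecast ensemble
    obs       : (n_obs,) observations
    obs_var   : (n_obs,) observation error variances (diagonal R)
    H         : (n_obs, n_state) linear observation operator
    inflation : factor applied to deviations from the ensemble mean,
                so the forecast covariance is scaled by inflation**2
    """
    rng = np.random.default_rng() if rng is None else rng
    n_members = ensemble.shape[1]
    mean = ensemble.mean(axis=1, keepdims=True)
    # Multiplicative inflation widens the forecast spread to counter filter divergence
    anomalies = inflation * (ensemble - mean)
    inflated = mean + anomalies
    # Sample forecast covariance P from the inflated anomalies
    P = anomalies @ anomalies.T / (n_members - 1)
    # Innovation covariance S = H P H^T + R
    S = H @ P @ H.T + np.diag(obs_var)
    # Kalman gain K = P H^T S^-1, via a linear solve (P and S are symmetric)
    K = np.linalg.solve(S, H @ P).T
    # Each member is pulled toward its own perturbed copy of the observations
    noise = rng.normal(size=(obs.size, n_members)) * np.sqrt(obs_var)[:, None]
    return inflated + K @ (obs[:, None] + noise - H @ inflated)


# Toy ensemble: state = [LAI, biomass]; biomass is correlated with LAI
rng = np.random.default_rng(0)
lai = rng.normal(2.0, 0.5, 500)                     # m2 m-2, hypothetical
biomass = 400.0 * lai + rng.normal(0.0, 50.0, 500)  # g m-2, hypothetical
forecast = np.vstack([lai, biomass])
H = np.array([[1.0, 0.0]])                          # only LAI is observed
analysis = enkf_analysis(forecast, np.array([3.0]), np.array([0.04]), H,
                         inflation=1.05, rng=rng)
print(forecast.mean(axis=1), analysis.mean(axis=1))
```

Because biomass is correlated with LAI across the ensemble, the unobserved biomass is corrected as well. If the ensemble spread collapsed (P approaching zero), the gain K would also approach zero and the observation would barely change the forecast, which is the filter divergence symptom described above.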
The accuracy of crop model parameters significantly affects the performance of DA and yield forecasting (Kang and Özdoğan, 2019). To improve parameter accuracy, various parameter inference methods have been developed, and Bayesian approaches are becoming increasingly popular (e.g., Beven and Freer (2001); Vrugt et al. (2009b)). The appeal of Bayesian approaches lies not only in inferring the most likely parameter values, but also in estimating their underlying posterior probability density functions (pdfs) and even model structural error (Huang et al., 2019). Markov Chain Monte Carlo (MCMC) methods are typically used in these Bayesian approaches to link crop model simulations with observations. Using a likelihood function to quantify the probability of the data and, commonly, the Metropolis-Hastings search strategy (Hastings, 1970; Metropolis et al., 1953), the prior distribution of the parameters of the crop model and of the residual error model is updated to a posterior distribution conditioned on the information in the data. Normally, residual errors are assumed to be independent and identically distributed (i.i.d.), following a normal distribution with zero mean and constant variance (Box and Tiao, 1973). However, the residuals of in-field observations vary throughout the growing season (Dumont et al., 2014). To account for this heteroscedasticity, Dumont et al. (2014) proposed a likelihood function that incorporates the observational variance for the crop model STICS. A more general formal likelihood function, based on a generalized error model that allows for heteroscedastic and non-Gaussian residual errors, was developed by Schoups and Vrugt (2010) for a hydrological model. Their approach enables diagnostic checking of residual error assumptions and does not require the i.i.d. assumption.
Because EnKF has been shown to be effective in cases with nonlinear or non-Gaussian errors (Han and Li, 2008; Katzfuss et al., 2016), there is a need to investigate whether crop model parameters calibrated under such a generalized error model, together with their uncertainties, can be integrated into the EnKF framework.
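The Bayesian calibration step described above can be sketched with a random-walk Metropolis-Hastings sampler. The sketch assumes a hypothetical logistic biomass curve in place of a real crop model such as GECROS, flat priors restricted to positive values, and a simplified residual error model whose standard deviation grows linearly with the simulated value (sd = sigma0 + sigma1 × simulation), similar in spirit to the heteroscedastic term of Schoups and Vrugt (2010). Their full generalized likelihood also handles autocorrelated and non-Gaussian (skewed, heavy-tailed) residuals, which are omitted here; all function names and parameter values are illustrative.

```python
import numpy as np


def logistic_biomass(t, r, w_max=1500.0, t_mid=60.0):
    """Hypothetical stand-in for a crop model: logistic biomass (g m-2) vs. day."""
    return w_max / (1.0 + np.exp(-r * (t - t_mid)))


def log_likelihood(theta, t, obs):
    """Gaussian log-likelihood with residual sd growing linearly with the simulation."""
    r, sigma0, sigma1 = theta
    if r <= 0 or sigma0 <= 0 or sigma1 < 0:
        return -np.inf  # outside the support of the flat prior
    sim = logistic_biomass(t, r)
    sd = sigma0 + sigma1 * sim  # heteroscedastic: larger errors for larger biomass
    resid = obs - sim
    return np.sum(-np.log(sd) - 0.5 * np.log(2 * np.pi) - 0.5 * (resid / sd) ** 2)


def metropolis_hastings(t, obs, theta0, step, n_iter=20000, rng=None):
    """Random-walk Metropolis-Hastings sampler (flat prior, Gaussian proposals)."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    logp = log_likelihood(theta, t, obs)
    chain = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        proposal = theta + step * rng.normal(size=theta.size)
        logp_new = log_likelihood(proposal, t, obs)
        # Symmetric proposal and flat prior: accept with the likelihood ratio
        if np.log(rng.uniform()) < logp_new - logp:
            theta, logp = proposal, logp_new
        chain[i] = theta
    return chain


# Synthetic "observations" whose noise scales with biomass
rng = np.random.default_rng(42)
t = np.arange(10, 121, 10, dtype=float)
true_sim = logistic_biomass(t, 0.08)
obs = true_sim + rng.normal(size=t.size) * (5.0 + 0.1 * true_sim)
chain = metropolis_hastings(t, obs, theta0=[0.05, 10.0, 0.05],
                            step=np.array([0.002, 1.0, 0.01]), rng=rng)
posterior = chain[5000:]  # discard burn-in
print(posterior.mean(axis=0), posterior.std(axis=0))
```

The retained chain approximates the joint posterior of the model parameter r and the error-model parameters sigma0 and sigma1, rather than a single best-fit value.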
Errors in remote sensing data hamper predictions from nonparametric regression models, which are among the most frequently used approaches for estimating crop status from remote sensing data (Huang et al., 2019; Verrelst et al., 2019). Among the nonparametric models, Gaussian Process Regression (GPR), developed within a Bayesian framework (Rasmussen and Williams, 2006), is considered a promising method, not only because of its strong prediction performance (Verrelst et al., 2012), but also because it quantifies predictive uncertainty (Berger et al., 2020a; Wang et al., 2019; Verrelst et al., 2019). The temporal and spatial transferability of GPR has been demonstrated by successfully transferring a trained GPR model to other images (Verrelst et al., 2013b). However, DA based on remote sensing predictions (DArs) still needs to be compared with DA based on field measurements (DAfm) (Huang et al., 2019). Because sampling is destructive, ground observations cannot be repeated at the same sampling sites, yet this inconsistency is normally neglected in DAfm. In contrast, although remote sensing predictions always contain errors, they can capture temporal changes in crop growth reasonably well. Given these merits of GPR, its performance when incorporated into the EnKF framework should be evaluated and compared with that of DAfm.
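To illustrate how GPR returns a predictive uncertainty alongside each estimate, the sketch below implements the standard GP posterior (Rasmussen and Williams, 2006, Algorithm 2.1) with a fixed squared-exponential kernel. The NDVI-LAI relationship, the kernel hyperparameters and the noise variance are all hypothetical; in practice the hyperparameters would be fitted, for example by maximizing the marginal likelihood, which is omitted here.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.15, signal_var=1.0):
    """Squared-exponential covariance between two 1-D input arrays."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)


def gpr_predict(x_train, y_train, x_test, noise_var=0.04, **kernel_args):
    """GP predictive mean and latent variance via a Cholesky factorization."""
    y_mean = y_train.mean()  # zero-mean GP fitted to centred targets
    K = rbf_kernel(x_train, x_train, **kernel_args) + noise_var * np.eye(x_train.size)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    K_s = rbf_kernel(x_train, x_test, **kernel_args)
    mean = K_s.T @ alpha + y_mean
    v = np.linalg.solve(L, K_s)
    # Prior variance minus the part explained by the training data
    var = rbf_kernel(x_test, x_test, **kernel_args).diagonal() - np.sum(v**2, axis=0)
    return mean, var


# Hypothetical training set: LAI as a noisy function of a vegetation index
rng = np.random.default_rng(3)
ndvi = rng.uniform(0.2, 0.7, 30)
lai = 6.0 * ndvi**2 + rng.normal(0.0, 0.2, ndvi.size)
mean, var = gpr_predict(ndvi, lai, np.array([0.5, 0.95]))
# Variance of a new LAI observation = latent variance + noise variance
print(mean, np.sqrt(var + 0.04))
```

The predictive variance is small inside the range of the training data (NDVI 0.5) and grows for the out-of-range input (NDVI 0.95). Under these assumptions, such a variance is the kind of quantity that could serve as an observation error variance in an EnKF update.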
Several studies have connected process-based simulations, field observations and their uncertainties to obtain reliable forecasts using DA. In a hydrological modelling study, Vrugt et al. (2005) simultaneously estimated parameter uncertainty, model structural error and observational error. Their approach combined an inner EnKF loop for recursive state estimation with an outer global optimization loop for posterior parameter estimation, thus performing parameter estimation and data assimilation simultaneously. However, even though model predictive ability is expected to improve with better parameterization, assessing model structural and input errors may be hindered when parameter values are not fixed (Schoups and Vrugt, 2010). In an observing system simulation experiment that assimilated LAI and soil moisture data into the crop model SWAP, Hu et al. (2017) found that simultaneously updating parameters tended to worsen grain yield predictions when uncertain parameters directly determining biomass and grain formation were included in the update. A method that systematically quantifies the uncertainties of crop model simulations and remotely sensed observations in a separate Bayesian process, and then applies them within an EnKF framework, is therefore needed for better forecasting of crop growth status. Such a method could also guide the careful approximation and application of uncertainties in other DA algorithms or frameworks (Huang et al., 2019), and could serve as a reference for the model-data fusion frameworks desired for better Earth system forecasting (Gettelman et al., 2022).
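The simultaneous state-parameter updating discussed above is often implemented by state augmentation, and a minimal sketch of one such update may clarify the mechanism. An uncertain growth-rate parameter is appended to the state vector and corrected only through its ensemble correlation with the observed LAI. The exponential growth model, the parameter values and the use of a stochastic EnKF are hypothetical choices for illustration; this is not the method of Vrugt et al. (2005) or Hu et al. (2017).

```python
import numpy as np

rng = np.random.default_rng(7)
n_members, dt, x0 = 200, 10.0, 0.5   # hypothetical ensemble size, days, initial LAI
r_true, obs_sd = 0.10, 0.05          # hypothetical "true" growth rate and obs error

# Prior ensemble for the uncertain growth-rate parameter
r_prior = rng.normal(0.06, 0.02, n_members)
# Forecast: simple exponential LAI growth, one member per parameter draw
x_forecast = x0 * np.exp(r_prior * dt)
# Augmented state: row 0 = LAI, row 1 = growth-rate parameter
Z = np.vstack([x_forecast, r_prior])

# Synthetic LAI observation generated with the "true" parameter
obs = x0 * np.exp(r_true * dt) + rng.normal(0.0, obs_sd)

# Stochastic EnKF update of the augmented state (only LAI is observed)
H = np.array([[1.0, 0.0]])
A = Z - Z.mean(axis=1, keepdims=True)
P = A @ A.T / (n_members - 1)
S = H @ P @ H.T + obs_sd**2          # 1x1 innovation covariance
K = P @ H.T / S                      # S is 1x1, so division replaces the inverse
perturbed = obs + rng.normal(0.0, obs_sd, n_members)
Z_analysis = Z + K @ (perturbed[None, :] - H @ Z)

print("prior r mean:", r_prior.mean(), "updated r mean:", Z_analysis[1].mean())
```

Because the parameter is corrected only through sampled correlations with the observed state, any model structural or observation error can also be absorbed into the updated parameter values.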
The objective of this study was to develop a Bayesian methodology that combines disparate quantitative methods into one framework, i.e., one that incorporates the systematically analyzed errors of crop model simulations and remote sensing observations into the EnKF data assimilation procedure. We expected this framework to enhance the forecasting of crop growth status. The methodology was validated using a rice field experiment. The crop model GECROS was selected for generating crop growth simulations because of its generality and physiological robustness (Yin and Struik, 2017; Yin and van Laar, 2005) (see a brief description of GECROS in Supplement A). Our specific objectives were: 1) to calibrate and validate GECROS under field conditions in China, assuming heteroscedastic and non-Gaussian residual errors; 2) to evaluate the performance of the GPR model for remote sensing prediction and its uncertainty estimation; and 3) to assess the applicability of the estimated uncertainties of crop model simulations and remote sensing observations in EnKF.