Lasse Torben Keetz

and 7 more

Complex Land Surface Models (LSMs) rely on a plethora of parameters. These parameters and the associated process formulations are often poorly constrained, which hampers reliable predictions of ecosystem dynamics and climate feedbacks. Robust and uncertainty-aware parameter estimation with observations is complicated by, for example, the high dimensionality of the model parameter space and the computational cost of LSM simulations. Herein, we adapt a novel Bayesian data assimilation and machine learning framework termed ‘calibrate, emulate, sample‘ (CES) to infer parameters in a widely-used LSM coupled with a demographic vegetation model (CLM-FATES). First, an iterative ensemble Kalman smoother provides an initial estimate of the posterior distribution (‘calibrate‘). Subsequently, a machine-learning-based emulator is trained on the resulting model-observation mismatches to predict outcomes for unseen parameter combinations (‘emulate‘). Finally, this emulator replaces CLM-FATES simulations in an adaptive Markov Chain Monte Carlo approach enabling computationally feasible posterior sampling with enhanced uncertainty quantification (‘sample‘). We test our implementation with synthetic and real observations representing a boreal forest site in southern Finland. We estimate a total of six plant-functional-type-specific photosynthetic parameters by assimilating evapotranspiration (ET) and gross primary production (GPP) flux data. CES provided the best estimates of the synthetic truth parameters when compared to data-blind emulator sampling designs while all approaches reduced model-observation errors compared to a default parameter simulation (GPP: -10% to -30%, ET: -4% to -6%). Although errors were also consistently reduced with real data, comparing the emulator designs was less conclusive, which we mainly attribute to equifinality and insufficient experiment complexity.