The use of machine learning (ML) for the online correction of coarse-resolution atmospheric models has proven effective in reducing biases in near-surface temperature and precipitation rate. However, this often introduces biases in the upper atmosphere, and the improvements are not always reliable across ML-corrective models trained with different random seeds. Furthermore, ML corrections can feed back on the baseline physics of the atmospheric model and produce profiles that lie outside the distribution of samples used in training, leading to low confidence in the predicted corrections. This study introduces the use of a novelty detector to mask the predicted corrections when the atmospheric state is deemed out-of-sample. The novelty detector is trained on profiles of temperature and specific humidity in a semi-supervised fashion using samples from the coarsened reference fine-resolution simulation. Offline, the novelty detector flags more columns as out-of-sample in simulations that are known, from simple metrics such as mean bias, to drift further from the reference simulation. Without novelty detection, corrective ML leads to undesirably large climate biases for some ML random seeds but not others. Novelty detection deems about 21% of columns to be novelties in year-long simulations. The spread in the root mean square error (RMSE) of time-mean spatial patterns of surface temperature and precipitation rate across a random-seed ensemble is sharply reduced when novelty detection is used. In particular, the random seed with the worst RMSE is improved by up to 60% (depending on the variable), while the best seed maintains its low RMSE.
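
As a minimal sketch of novelty-gated corrections, the snippet below uses a one-class SVM trained on column profiles of temperature and specific humidity as the semi-supervised detector and zeroes the ML correction wherever a column is flagged as out-of-sample; the detector choice, data shapes, hyperparameters, and the `gate_corrections` helper are illustrative assumptions rather than the configuration used in this study.

```python
# Minimal sketch of novelty-gated ML corrections (illustrative, not the
# study's exact setup). A one-class SVM stands in for the semi-supervised
# novelty detector and is trained only on columns drawn from the coarsened
# reference fine-resolution simulation.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

N_LEVELS = 79  # hypothetical number of vertical model levels


def stack_columns(temperature, specific_humidity):
    """Concatenate per-column T and q profiles into feature vectors.

    Both inputs have shape (n_columns, N_LEVELS).
    """
    return np.concatenate([temperature, specific_humidity], axis=1)


# --- Offline training on in-sample columns (placeholder random data here) ---
rng = np.random.default_rng(0)
train_T = 250.0 + 40.0 * rng.random((5000, N_LEVELS))   # temperature [K]
train_q = 1e-5 + 2e-2 * rng.random((5000, N_LEVELS))    # specific humidity [kg/kg]
detector = make_pipeline(
    StandardScaler(),
    OneClassSVM(nu=0.01, kernel="rbf", gamma="scale"),
)
detector.fit(stack_columns(train_T, train_q))


# --- Online use: mask predicted corrections for out-of-sample columns ---
def gate_corrections(temperature, specific_humidity, ml_corrections):
    """Zero the corrections of columns the detector deems novelties.

    ml_corrections has shape (n_columns, n_outputs); returns the gated
    corrections and the fraction of columns flagged as out-of-sample.
    """
    features = stack_columns(temperature, specific_humidity)
    is_novelty = detector.predict(features) == -1   # -1 marks a novelty
    gated = np.where(is_novelty[:, None], 0.0, ml_corrections)
    return gated, is_novelty.mean()
```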

Recent advances have made it possible to integrate global storm-resolving models (GSRMs) over timescales of several years. These short simulations are sufficient for studying the characteristics and statistics of short- and small-scale phenomena; however, it is an open question what we can learn from them about the large-scale climate response to perturbations. To address this question, we use the response of X-SHiELD (a GSRM) to uniform SST warming and a CO$_2$ increase in a two-year integration and compare it to similar CMIP6 experiments. Specifically, we assess the statistical meaning of two years from one model falling outside the spread of another model or model ensemble. This is of particular interest because X-SHiELD shows a distinct global-mean precipitation response to uniform warming and a distinct northern-hemisphere jet-shift response to an isolated CO$_2$ increase. We use the CMIP6 models to estimate the probability that two years from one model lie more than one standard deviation from another model's (or ensemble's) mean, given the two models' means. For example, if two years from one model are more than one standard deviation away from the other model's mean, we find that the chance that the two models' means lie within one standard deviation of each other is $\sim 25\%$. We find that for some large-scale metrics there is an important base-state dependence that, when taken into account, can qualitatively change the interpretation of the results. We note that a year-to-year comparison is physically meaningful because the simulations use prescribed sea-surface temperatures.
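
As a toy illustration of the conditional probability described above, the sketch below assumes a simple Gaussian model of interannual variability and a Gaussian prior on the offset between the two models' means, rather than the CMIP6-based estimate; all numbers are placeholders.

```python
# Toy Monte Carlo sketch of the conditioning described in the text: given that
# two years from model A lie more than one standard deviation from model B's
# mean, how often do the two models' long-term means still agree within one
# standard deviation? Gaussian variability and a Gaussian prior are assumed.
import numpy as np

rng = np.random.default_rng(42)
sigma = 1.0              # interannual standard deviation (arbitrary units)
n_trials = 1_000_000

mean_offset = rng.normal(0.0, sigma, n_trials)               # mu_A - mu_B (assumed prior)
years = mean_offset[:, None] + rng.normal(0.0, sigma, (n_trials, 2))  # two years of model A

both_years_far = np.all(np.abs(years) > sigma, axis=1)       # both years beyond 1 sigma of mu_B
means_close = np.abs(mean_offset) < sigma                    # the two means agree within 1 sigma

p = means_close[both_years_far].mean()
print(f"P(means within 1 sigma | both years beyond 1 sigma) ~ {p:.2f}")
```

Because the answer depends on the assumed variability and on the prior placed on the offset between the means, this toy calculation illustrates the conditioning rather than reproducing the $\sim 25\%$ figure derived from the CMIP6 models.
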
Global atmospheric ‘storm-resolving’ models with horizontal grid spacing of less than 5~km resolve deep cumulus convection and flow in complex terrain. They promise to be reference models that could be used to improve computationally affordable coarse-grid global climate models across a range of climates, reducing uncertainties in regional precipitation and temperature trends. Here, machine learning of nudging tendencies as functions of column state is used to correct the physical parameterization tendencies of temperature, humidity, and optionally winds in a real-geography coarse-grid model (FV3GFS with a 200~km grid) to be closer to those of a 40-day reference simulation using X-SHiELD, a modified version of FV3GFS with a 3~km grid. Both simulations specify the same historical sea-surface temperature fields. This methodology builds on a prior study that used a global observational analysis as the reference. Without machine-learning corrections, the coarse-grid model has too little cloud, causing too much daytime heating of land surfaces, which creates excessive surface latent heat flux and rainfall. This bias is avoided by also learning the downwelling radiative flux from the fine-grid model. The best configuration uses learned nudging tendencies for temperature and humidity but not winds. Neural nets slightly outperform random forests. Forecasts of 850~hPa temperature gain 18 hours of skill at 3–7 day leads, and time-mean precipitation patterns are improved by 30\% when the ML correction is applied. Adding machine-learned wind tendencies improves 500~hPa height skill for the first five days of forecasts but degrades time-mean upper-tropospheric temperature and zonal wind patterns thereafter.
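
As a rough sketch of the learning problem described above, the snippet below fits a random forest and a small neural network mapping column state onto nudging tendencies; the feature layout, placeholder data, and hyperparameters are assumptions for illustration, not the configuration evaluated here.

```python
# Illustrative sketch: learn nudging tendencies of temperature and humidity as
# functions of column state, comparing a random forest with a small neural net.
# The data below are random placeholders, so the reported scores are not
# meaningful; a real workflow would read coarsened reference-run columns.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

N_LEVELS = 79        # hypothetical vertical resolution
N_COLUMNS = 5_000    # hypothetical number of training columns

rng = np.random.default_rng(1)
# Inputs: per-column temperature and specific-humidity profiles.
X = np.concatenate(
    [250.0 + 40.0 * rng.random((N_COLUMNS, N_LEVELS)),   # temperature [K]
     1e-5 + 2e-2 * rng.random((N_COLUMNS, N_LEVELS))],   # specific humidity [kg/kg]
    axis=1,
)
# Targets: nudging tendencies of temperature and humidity at every level.
y = rng.normal(0.0, 1e-5, (N_COLUMNS, 2 * N_LEVELS))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "random_forest": RandomForestRegressor(n_estimators=50, max_depth=12, n_jobs=-1),
    "neural_net": MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=200),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "held-out R^2:", model.score(X_test, y_test))
```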