Evan Krell and 5 more

Atmospheric AI modeling is increasingly reliant on complex machine learning (ML) techniques and high-dimensional gridded inputs to develop models that achieve high predictive skill. Complex deep learning architectures such as convolutional neural networks and transformers are trained to model highly non-linear atmospheric phenomena such as coastal fog [1], tornadoes [2], and severe hail [3]. The input data are typically gridded spatial fields composed of multiple channels of satellite imagery, numerical weather prediction output, reanalysis products, etc. In many studies, the use of complex architectures and high-dimensional inputs was shown to substantially outperform simpler alternatives.

A major challenge when using complex ML techniques is that it is very difficult to understand how the trained model works: the complexity of the model obscures the relationship between the input and the prediction. It is often of interest to understand a model's decision-making process. By exposing the model's behavior, users could verify that the model has learned physically realistic predictive patterns, and this information can be used to calibrate trust in the model. The model may also have learned novel patterns within the data that could be used to gain new insights into the atmospheric process; extracting learned patterns could be used to generate hypotheses for scientific discovery. The rapid adoption of complex ML models and the need to understand how they work has led to the development of a broad class of techniques called eXplainable Artificial Intelligence (XAI). These methods probe models in various ways to reveal insights into how they work.

Correlations among input features can make it challenging to produce meaningful explanations. The gridded spatial data common in atmospheric modeling applications typically have extensive correlation. Spatial autocorrelation is present among the cells of each spatial grid, and autocorrelation may also exist across the gridded data volume due to spatial or temporal relationships between adjacent channels. In addition, there may be correlations between distant locations due to teleconnections between them. Correlated input features may cause high variance among the trained models: if grid cells are highly correlated, then the target function that the network is attempting to learn is ill-defined, and an effectively infinite number of models can be generated that achieve approximately equal performance. Even assuming a perfect XAI method exists, the attribution reflects only the patterns learned by a given model, and it is arbitrary which of the correlated features a given model uses. This can lead to a misleading understanding of the actual relationship between the input features and the target.

A potential solution is to group the correlated features before applying XAI. Attribution can then be assigned to each group rather than to individual cells; in this case, all the correlated cells are permuted at the same time to analyze their collective impact on the output. The purpose is to reveal the contribution of each group of related cells toward the model output. Ideally, the explanations are insensitive to the arbitrary choice among correlated features learned by the model. Without grouping, the user can be misled into considering a feature unrelated to the target because of the presence of correlated features; with grouping, the explanations should better reveal the learned patterns.
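To make the ambiguity concrete, the following minimal sketch (illustrative only; it uses scikit-learn and invented variable names, and is not code from this work) duplicates one predictor, retrains a small network with different random seeds, and compares per-feature permutation importance against a grouped version in which both copies are permuted together. The per-feature scores vary from seed to seed, while the grouped score stays stable.

```python
# Minimal, illustrative sketch (not code from this work): two copies of the same
# predictor make per-feature attribution arbitrary, while attribution to the
# group {x1, x2} stays stable across retrained models.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1.copy(), rng.normal(size=n)])  # column 1 duplicates column 0
y = 2.0 * x1 + 0.5 * X[:, 2] + 0.1 * rng.normal(size=n)

def drop_in_r2(model, X, y, cols, rng):
    """Score drop when the listed columns are permuted together (grouped permutation)."""
    Xp = X.copy()
    perm = rng.permutation(len(X))
    Xp[:, cols] = Xp[perm][:, cols]  # one shared permutation for the whole group
    return r2_score(y, model.predict(X)) - r2_score(y, model.predict(Xp))

for seed in (1, 2, 3):
    model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000,
                         random_state=seed).fit(X, y)
    imp = {name: drop_in_r2(model, X, y, cols, np.random.default_rng(0))
           for name, cols in [("x1 alone", [0]), ("x2 alone", [1]),
                              ("x1+x2 group", [0, 1]), ("x3", [2])]}
    print(f"seed {seed}:", {k: round(v, 3) for k, v in imp.items()})
```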
Grouping features based on correlation can itself be challenging. The correlation rarely equals one, and the strength of the correlation influences the variance among trained models. Calculating the correlation can be difficult because of partial correlations and fuzzy, continuous boundaries, and the choice of groups can greatly influence the explanations.

Another challenge is that it is not straightforward to assess the quantitative accuracy of an XAI technique, because there is rarely a ground-truth explanation to compare against: if we knew the attribution, we would not need XAI methods. Synthetic benchmarks for analyzing XAI have been proposed as a solution [4]. It is possible to define a non-linear function such that the contribution of each grid cell's value to the function output can be derived. This attribution map represents the ground truth for comparison with the output of XAI methods applied to a model that closely approximates the hand-crafted function.

In this research, we develop a set of benchmarks to investigate the influence of correlated features on the variation in XAI outputs for a set of trained models. We then explore how features can be grouped to reduce the explanation variance so that users have improved insight into the learned patterns. First, we create a set of very simple mathematical demonstrations that precisely illustrate the influence of correlated features and how grouping features provides a solution. Using insights from these experiments, we develop a tool for detecting when correlated features are likely to cause misleading explanations. We then create a set of more realistic benchmarks based on atmospheric modeling problems such as sea surface temperature and coastal fog prediction. By defining benchmarks with known ground-truth explanations, we can analyze various techniques for grouping the grid cells based on their correlations. Based on our findings, we offer recommendations for strategies to group correlated data so that users can better leverage XAI results toward model development and scientific insights.

[1] Kamangir, H., Collins, W., Tissot, P., King, S. A., Dinh, H. T. H., Durham, N., & Rizzo, J. (2021). FogNet: A multiscale 3D CNN with double-branch dense block and attention mechanism for fog prediction. Machine Learning with Applications, 5, 100038.
[2] Lagerquist, R. (2020). Using Deep Learning to Improve Prediction and Understanding of High-impact Weather.
[3] Gagne II, D. J., Haupt, S. E., Nychka, D. W., & Thompson, G. (2019). Interpretable deep learning for spatial analysis of severe hailstorms. Monthly Weather Review, 147(8), 2827-2845.
[4] Mamalakis, A., Ebert-Uphoff, I., & Barnes, E. A. (2022). Neural network attribution methods for problems in geoscience: A novel synthetic benchmark dataset. Environmental Data Science, 1, e8.
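The synthetic-benchmark idea described above can likewise be sketched with a toy example (again illustrative, and not one of the benchmarks developed in this work): define a function whose group-wise contributions are known by construction, train a model to approximate it, and check whether a grouped permutation-based explanation ranks the groups consistently with that known structure.

```python
# Toy synthetic benchmark (illustrative only): a function with known group-wise
# contributions gives a reference that a candidate explanation can be checked against.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 6))
# Known additive structure: group A = {0, 1}, group B = {2, 3}; features 4 and 5 are noise.
contrib_A = 3.0 * X[:, 0] * X[:, 1]
contrib_B = np.sin(X[:, 2]) + 0.5 * X[:, 3] ** 2
y = contrib_A + contrib_B

# Known group importance, expressed here as the fraction of variance each group explains.
truth = {"A": contrib_A.var() / y.var(), "B": contrib_B.var() / y.var(), "noise": 0.0}

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0).fit(X, y)

def grouped_increase_in_mse(model, X, y, cols, rng):
    """Increase in normalized MSE when the listed columns are permuted together."""
    Xp = X.copy()
    Xp[:, cols] = Xp[rng.permutation(len(X))][:, cols]
    base = np.mean((model.predict(X) - y) ** 2)
    return (np.mean((model.predict(Xp) - y) ** 2) - base) / y.var()

# The estimate is not on the same scale as the variance fractions, but it should
# rank the groups consistently (A > B >> noise) if the model and XAI behave well.
estimate = {g: grouped_increase_in_mse(model, X, y, cols, np.random.default_rng(1))
            for g, cols in [("A", [0, 1]), ("B", [2, 3]), ("noise", [4, 5])]}
print("known structure  :", {k: round(v, 3) for k, v in truth.items()})
print("grouped estimate :", {k: round(v, 3) for k, v in estimate.items()})
```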
Cloud optical property retrievals from passive satellite imagers tend to be most accurate during the daytime due to the availability of visible and near-infrared solar reflectances. Infrared (IR) channels have relatively little spectral sensitivity to optically thick clouds and are heavily influenced by cloud-top temperature, making accurate retrievals of cloud optical depth, cloud effective radius, and cloud water path more difficult at night. In this work, we examine whether the use of spatial context—information about the local structure and organization of cloud features—can help overcome these limitations of IR channels and provide more accurate estimates of nighttime cloud optical properties. We trained several neural networks to emulate the Advanced Baseline Imager (ABI) NOAA Daytime Cloud Optical and Microphysical Properties (DCOMP) algorithm using only IR channels. We then compared the neural networks to the NOAA operational daytime and nighttime products, and to the Nighttime Lunar Cloud Optical and Microphysical Properties (NLCOMP) algorithm, which utilizes the low-light visible band on VIIRS from collocated imagery. These comparisons show that the use of spatial context can significantly improve estimates of nighttime cloud optical properties. The primary model we trained, U-NetCOMP, can reasonably match DCOMP during the day and significantly reduces artifacts associated with the day/night terminator. We also find that U-NetCOMP estimates align more closely with NLCOMP at night compared to the nighttime NOAA operational products for ABI.
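For readers unfamiliar with the architecture, a minimal U-Net-style emulator for this kind of task might look like the sketch below; the channel counts, depth, output variables, and loss are assumptions for illustration and do not reflect the actual U-NetCOMP configuration.

```python
# Minimal U-Net-style sketch (illustrative only; not the U-NetCOMP configuration).
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

def build_unet(n_ir_channels=7, n_outputs=2, size=256):
    """Map IR brightness-temperature channels to gridded cloud-property estimates."""
    inputs = layers.Input((size, size, n_ir_channels))
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D()(c1)
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D()(c2)
    c3 = conv_block(p2, 128)                       # bottleneck: widest spatial context
    u2 = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(c3)
    c4 = conv_block(layers.Concatenate()([u2, c2]), 64)
    u1 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(c4)
    c5 = conv_block(layers.Concatenate()([u1, c1]), 32)
    outputs = layers.Conv2D(n_outputs, 1)(c5)      # per-pixel regression targets
    return tf.keras.Model(inputs, outputs)

model = build_unet()
model.compile(optimizer="adam", loss="mse")        # e.g., trained against daytime DCOMP targets
```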

Katherine Haynes and 4 more

Neural networks (NNs) have become an important tool for prediction tasks -- both regression and classification -- in environmental science.  Since many environmental-science problems involve life-or-death decisions and policy-making, it is crucial to provide not only predictions but also an estimate of the uncertainty in the predictions.  Until recently, very few tools were available to provide uncertainty quantification (UQ) for NN predictions.  However, in recent years the computer-science field has developed numerous UQ approaches, and several research groups are exploring how to apply these approaches in environmental science.  We provide an accessible introduction to six of these UQ approaches, then focus on tools for the next step, namely to answer the question: Once we obtain an uncertainty estimate (using any approach), how do we know whether it is good or bad?  To answer this question, we highlight four evaluation graphics and eight evaluation scores that are well suited for evaluating and comparing uncertainty estimates (NN-based or otherwise) for environmental-science applications.  We demonstrate the UQ approaches and UQ-evaluation methods for two real-world problems: (1) estimating vertical profiles of atmospheric dewpoint (a regression task) and (2) predicting convection over Taiwan based on Himawari-8 satellite imagery (a classification task).  We also provide Jupyter notebooks with Python code for implementing the UQ approaches and UQ-evaluation methods discussed herein.  This article provides the environmental-science community with the knowledge and tools to start incorporating the large number of emerging UQ methods into their research.
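As a small taste of what such evaluation looks like in practice, the sketch below computes two generic checks for Gaussian predictive distributions, the closed-form CRPS and a binned spread-skill comparison; these are standard diagnostics and are not necessarily the specific scores or graphics highlighted in the article.

```python
# Minimal sketch of two generic UQ checks (illustrative only).
import numpy as np
from scipy.stats import norm

def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS for a Gaussian predictive distribution (lower is better)."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

def spread_skill(mu, sigma, y, n_bins=10):
    """Bin cases by predicted spread; a well-calibrated model has RMSE ~ mean spread per bin."""
    bins = np.quantile(sigma, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(sigma, bins) - 1, 0, n_bins - 1)
    spread = np.array([sigma[idx == b].mean() for b in range(n_bins)])
    rmse = np.array([np.sqrt(np.mean((y[idx == b] - mu[idx == b]) ** 2))
                     for b in range(n_bins)])
    return spread, rmse

# Toy usage with synthetic, perfectly calibrated predictions.
rng = np.random.default_rng(0)
mu = rng.normal(size=5000)
sigma = rng.uniform(0.5, 2.0, size=5000)
y = mu + sigma * rng.normal(size=5000)
print("mean CRPS:", gaussian_crps(mu, sigma, y).mean())
spread, rmse = spread_skill(mu, sigma, y)
print("spread vs RMSE per bin:", np.round(spread, 2), np.round(rmse, 2))
```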

Jamin Kurtis Rader and 3 more

Assessing forced climate change requires the extraction of the forced signal from the background of climate noise. Traditionally, tools for extracting forced climate change signals have focused on one atmospheric variable at a time; however, using multiple variables can reduce noise and allow for easier detection of the forced response. Following previous work, we train artificial neural networks to predict the year of single- and multi-variable maps from forced climate model simulations. To perform this task, the neural networks learn patterns that allow them to discriminate between maps from different years—that is, the neural networks learn the patterns of the forced signal amidst the shroud of internal variability and climate model disagreement. When presented with combined input fields (multiple seasons, variables, or both), the neural networks are able to detect the signal of forced change earlier than when given single fields alone, by utilizing complex, nonlinear relationships between multiple variables and seasons. We use layer-wise relevance propagation, a neural network explainability tool, to identify the multivariate patterns learned by the neural networks that serve as reliable indicators of the forced response. These “indicator patterns” vary in time and between climate models, providing a template for investigating inter-model differences in the time evolution of the forced response. This work demonstrates how neural networks and their explainability tools can be harnessed to identify patterns of the forced signal within combined fields.
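A rough sketch of the general "predict the year" framing is shown below on synthetic data; it uses a simple gradient-times-input attribution as a stand-in for layer-wise relevance propagation, and the data, network size, and attribution method are assumptions for illustration rather than the authors' configuration.

```python
# Rough sketch of the "predict the year" framing (illustrative only); gradient x input
# is used here as a simple stand-in for layer-wise relevance propagation.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
n_years, ny, nx = 150, 24, 48
years = np.arange(1950, 1950 + n_years, dtype="float32")

# Synthetic "forced signal": a fixed spatial pattern scaled by the year, buried in
# noise standing in for internal variability.
pattern = np.outer(np.sin(np.linspace(0, np.pi, ny)), np.cos(np.linspace(0, 2 * np.pi, nx)))
maps = (years[:, None, None] - years.mean()) / 50.0 * pattern
maps += rng.normal(scale=1.0, size=maps.shape)
X = maps.reshape(n_years, -1).astype("float32")
y = (years - years.mean()) / years.std()

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(ny * nx,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=200, verbose=0)

# Gradient x input attribution for the most recent map.
x = tf.convert_to_tensor(X[-1:])
with tf.GradientTape() as tape:
    tape.watch(x)
    pred = model(x)
grad = tape.gradient(pred, x).numpy().reshape(ny, nx)
relevance = grad * X[-1].reshape(ny, nx)
print("predicted (standardized) year:", float(pred.numpy()[0, 0]))
print("grid cell with largest |relevance|:",
      np.unravel_index(np.argmax(np.abs(relevance)), relevance.shape))
```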