
D-score: deconstructing mean squared error and model performance
Timothy Hodson, U.S. Geological Survey (corresponding author: [email protected])
Thomas Over, U.S. Geological Survey
Sydney Foks, U.S. Geological Survey

Abstract

As science becomes increasingly cross-disciplinary and scientific models become increasingly cross-coupled, standardized practices of model evaluation are more important than ever. For normally distributed data, mean squared error (MSE) is ideal as an objective and general-purpose measure of model performance, but MSE gives little insight into which aspects of model performance are ‘good’ or ‘bad’. This apparent weakness has led to a myriad of specialized error metrics, which are often aggregated to form a composite score. Such scores are inherently subjective, however, and while their components are interpretable, the composite itself is not. We contend that a better approach to model benchmarking and interpretation is to decompose the MSE into more interpretable components. To demonstrate the versatility of this approach, we outline some fundamental types of decomposition and apply them to predictions at 1,021 streamgages across the conterminous United States from three streamflow models. Through this demonstration, we show that each component in a decomposition represents a distinct concept, and that simple decompositions can be combined to represent more complex concepts, forming an expressive language through which to interrogate models and data.
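To make the idea concrete, one classic example of such a decomposition (not necessarily the exact set of components developed in this paper) splits the MSE into a squared-bias term, a variance-difference term, and a phase term driven by the correlation between simulations and observations. A minimal sketch in NumPy, with illustrative variable names of our own choosing:

```python
import numpy as np

def decompose_mse(obs, sim):
    """Split MSE into bias, variance-difference, and phase components.

    Uses the identity (with population statistics, ddof=0):
    MSE = (mean(sim) - mean(obs))**2 + (std(sim) - std(obs))**2
          + 2 * std(sim) * std(obs) * (1 - r)
    where r is the Pearson correlation between sim and obs.
    """
    bias2 = (sim.mean() - obs.mean()) ** 2          # squared bias
    s_obs, s_sim = obs.std(), sim.std()             # population std devs
    r = np.corrcoef(obs, sim)[0, 1]                 # Pearson correlation
    var_diff = (s_sim - s_obs) ** 2                 # amplitude mismatch
    phase = 2 * s_sim * s_obs * (1 - r)             # timing/correlation error
    return bias2, var_diff, phase

# Illustrative synthetic "streamflow" series (not real data):
rng = np.random.default_rng(0)
obs = rng.lognormal(mean=1.0, sigma=0.5, size=500)
sim = 0.9 * obs + rng.normal(scale=0.3, size=500)

parts = decompose_mse(obs, sim)
mse = np.mean((sim - obs) ** 2)
# The three components sum exactly to the MSE.
assert np.isclose(sum(parts), mse)
```

Each component isolates a distinct failure mode: systematic offset, mis-scaled variability, and imperfect timing, which is the kind of interpretability a single aggregate score cannot provide.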