This paper advocates the use of simulation distributions for hydrologic model evaluation and model diagnostics. Distribution evaluation is supported by information-theoretic arguments and puts into modeling practice the social justice narrative of diversity, equity and inclusion for different simulations. We discuss past developments that led to the current state of the art of forecast verification in hydrology and bring to the fore scoring rules for model evaluation and diagnostics. Strictly proper scoring rules condense a distribution forecast to a single reward value for the materialized outcome(s) and have a strong underpinning in statistical, decision and information theory. We review scoring rules for dichotomous and categorical events, quantiles (intervals) and density forecasts, discuss the importance of scoring rule propriety and address diagnostic aspects such as sharpness, reliability and entropy. The usefulness and power of scoring rules are demonstrated on simple benchmark problems and discharge distributions simulated with conceptual watershed models using GLUE and Bayesian model averaging. We also link scoring rules to model diagnostics and present strictly proper divergence scores for flood frequency analysis and flow duration and recession curves. Scoring rules offer a rigorous information-theoretic underpinning to model evaluation and diagnostics and provide statistically principled means for (Bayesian) model selection and the analysis of hydrograph functionals, flood frequencies and extreme events.
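To make the idea of condensing a distribution forecast to a single value concrete, the following is a minimal sketch of two well-known scoring rules: the Brier score for dichotomous events and the continuous ranked probability score (CRPS) for an ensemble of simulations. It is an illustration only, not the authors' implementation; the function names `brier_score` and `crps_ensemble`, the synthetic ensembles and the observed value are assumptions made for this example. Both scores are written in negative orientation (lower is better), which may differ from the reward orientation used in the paper.

```python
import numpy as np


def brier_score(prob, outcome):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    prob = np.asarray(prob, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    return float(np.mean((prob - outcome) ** 2))


def crps_ensemble(ensemble, y):
    """CRPS of an ensemble forecast for a scalar observation y.

    Uses the energy form E|X - y| - 0.5 * E|X - X'|, with both
    expectations replaced by averages over the ensemble members.
    """
    x = np.asarray(ensemble, dtype=float)
    accuracy = np.mean(np.abs(x - y))
    spread = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return float(accuracy - spread)


# Hypothetical example: two synthetic discharge ensembles for one observation.
rng = np.random.default_rng(1)
observed = 10.0
sharp = rng.normal(10.0, 1.0, size=200)   # centered on the observation, narrow
biased = rng.normal(14.0, 1.0, size=200)  # equally narrow, but biased

print("CRPS sharp ensemble :", crps_ensemble(sharp, observed))
print("CRPS biased ensemble:", crps_ensemble(biased, observed))
print("Brier score         :", brier_score([0.9, 0.2, 0.6], [1, 0, 1]))
```

In this sketch the biased ensemble receives the larger (worse) CRPS, since the score penalizes the distance between the simulated distribution and the materialized outcome while crediting sharpness only when it is warranted.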

Jasper A. Vrugt

and 3 more

In this paper we review basic elements of Frequentist inference, specifically maximum likelihood (ML) and M-estimation, to point out a critical flaw of Bayesian methods for hydrologic model training and uncertainty quantification. Under model misspecification, the sensitivity $\widehat{\mathbf{A}}_{n}$ and variability $\widehat{\mathbf{B}}_{n}$ matrices of the ML model parameter values $\widehat{\bm{\uptheta}}_{n}$ provide conflicting information about the observed Fisher information $\widehat{\boldsymbol{\mathcal{I}}}_{n}$ of the data $\omega_{1},\ldots,\omega_{n}$ for $\bm{\uptheta} = (\theta_{1},\ldots,\theta_{d})^{\top}$. As a result, the ML parameter covariance matrix, $\Var(\widehat{\bm{\uptheta}}_{n})$, does not simplify to the matrix inverse of the observed Fisher information, $\widehat{\boldsymbol{\mathcal{I}}}_{n}$, as suggested by naive ML estimators and Bayesian MCMC methods, but amounts instead to the so-called sandwich matrix $\Var(\widehat{\bm{\uptheta}}_{n}) = \widehat{\boldsymbol{\mathcal{G}}}_{n}^{-1} = \frac{1}{n} \widehat{\mathbf{A}}_{n}^{-1}\widehat{\mathbf{B}}_{n}\widehat{\mathbf{A}}_{n}^{-1}$, where the observed Godambe information $\widehat{\boldsymbol{\mathcal{G}}}_{n}$ is the fundamental currency of data informativeness under model misspecification. The \textit{sandwich} matrix is a metaphor for a \textit{meat} matrix $\widehat{\mathbf{B}}_{n}$ between two \textit{bread} matrices $\widehat{\mathbf{A}}_{n}$ and yields asymptotically valid “robust standard errors” even when the likelihood function $L_{n}(\bm{\uptheta})$ (model) is incorrectly specified.
The implications of the sandwich variance estimator are demonstrated in three case studies involving the modeling of soil water infiltration, watershed hydrologic fluxes and the rainfall-discharge transformation. First and foremost, our analytic and numerical results demonstrate that the sandwich variance estimator substantially increases hydrologic model parameter and predictive uncertainty. The sandwich estimator is invariant to the likelihood stretching practiced by the GLUE method as a remedy for over-conditioning. Obtaining asymptotically valid sandwich parameter estimates and inference via MCMC simulation instead requires magnitude and/or curvature adjustments to the likelihood function.
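The sandwich construction described in this abstract can be sketched numerically. The example below is a minimal illustration under stated assumptions, not the authors' code: it assumes a linear model with a unit-variance Gaussian working likelihood, fits it to synthetic heteroscedastic data (so the likelihood is deliberately misspecified), and compares the naive covariance $\frac{1}{n}\widehat{\mathbf{A}}_{n}^{-1}$ with the sandwich covariance $\frac{1}{n}\widehat{\mathbf{A}}_{n}^{-1}\widehat{\mathbf{B}}_{n}\widehat{\mathbf{A}}_{n}^{-1}$. The function name `sandwich_covariance` and all data are hypothetical.

```python
import numpy as np


def sandwich_covariance(scores, hessians):
    """Naive and sandwich covariance of an ML estimate.

    scores   : (n, d) per-observation gradients of the log-likelihood
    hessians : (n, d, d) per-observation Hessians of the log-likelihood
    Both are evaluated at the ML parameter estimate.
    """
    n = scores.shape[0]
    A = -hessians.mean(axis=0)        # sensitivity matrix A_n (bread)
    B = scores.T @ scores / n         # variability matrix B_n (meat)
    A_inv = np.linalg.inv(A)
    naive = A_inv / n                 # valid only if the model is correct
    sandwich = A_inv @ B @ A_inv / n  # robust to misspecification
    return naive, sandwich


# Synthetic data (assumption): noise grows with x, but the working
# likelihood assumes Gaussian errors with unit variance.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 2.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5 + 1.5 * x)

# Under the unit-variance Gaussian likelihood the ML estimate is least squares.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ theta_hat

# Per-observation score x_i * r_i and Hessian -x_i x_i^T of the log-likelihood.
scores = X * resid[:, None]
hessians = -np.einsum("ni,nj->nij", X, X)

naive, sandwich = sandwich_covariance(scores, hessians)
print("naive standard errors   :", np.sqrt(np.diag(naive)))
print("sandwich standard errors:", np.sqrt(np.diag(sandwich)))
```

For this particular working likelihood the sandwich matrix coincides with the classical heteroscedasticity-robust (HC0) covariance of least squares. In this synthetic setting the sandwich standard errors exceed the naive ones, which mirrors the abstract's point that the naive inverse-Fisher covariance can understate parameter uncertainty when the likelihood is misspecified.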