Stochastic hydrology produces ensembles of time series representing plausible future streamflow, which are used to simulate and test the operation of water resource systems. A premise of stochastic hydrology is that ensembles should be statistically representative of what may occur in the future. In the past, applying this premise has meant producing ensembles that are statistically equivalent to the observed or historical streamflow sequence, which requires a number of metrics or statistics that can be used to test statistical similarity. However, with climate change, the past may no longer be representative of the future. Ensembles used to test future system operations should recognize non-stationarity and include time series representing expected changes. This poses challenges for their testing and validation. In this paper, we suggest an evidence-based analysis in which streamflow ensembles, whether they are intended to be statistically similar to the past or to represent a changing future, are characterized and assessed using an extensive set of statistical metrics. We have assembled a broad set of metrics and applied them to annual streamflow in the Colorado River at Lees Ferry to illustrate the approach. We have also developed a tree-based classification approach to categorize both ensembles and metrics. This approach provides a way to visualize and interpret differences between streamflow ensembles. Together, the metrics and their classification provide an analytical framework for characterizing and assessing the suitability of future streamflow ensembles, recognizing the presence of non-stationarity. This contributes to better planning in large river basins, such as the Colorado, facing water supply shortages.