1. Introduction

Hydrometeorological data are essential for decision making in water resources engineering and management, and for developing predictive models of how watersheds and ecosystems respond to perturbation. Extreme-event analysis, flood mapping, and hydrological model building, calibration, and validation all rely on hydrometeorological data (Borga et al., 2011; Clark et al., 2008; Khan et al., 2011; Marchi et al., 2010; Razavi & Coulibaly, 2013). Although different models require different data inputs, most would benefit from intensively measured hydrometeorological data spanning diverse catchments. Notably, the continued development of both data-driven models and physically based distributed models requires comprehensive data for their execution and validation (Andersen et al., 2001; Asong et al., 2020; Kumar et al., 2008; Nord et al., 2017). Cross-site synthesis can also provide the core knowledge needed to scale processes up from hillslopes to the globe, and thus to improve Earth system models (Fan et al., 2019).
Besides benefiting the development of hydrological and ecosystem models, comprehensive catchment data sets could also improve site-specific and comparative cross-site studies. Place-based studies, such as flood prediction (Rozalis et al., 2010), dominant hydrological process analysis (Schmocker-Fackel et al., 2007; Western et al., 2004), and climate change impact investigations (Jha et al., 2004), are critical in local decision making and hypothesis testing. For example, Tennant et al. (2020) used multiple hydrometeorological variables to improve understanding of the dominant controls on catchment discharge. In contrast, comparative hydrology aims to understand hydrological variability and the role of catchment characteristics, and to develop generally applicable models (Kuentz et al., 2017; Sawicz et al., 2011; Wymore et al., 2017). For example, Wymore et al. (2017) studied concentration-discharge relationships across 10 tropical watersheds with different landscape characteristics. With growing interest in comparative hydrology, demand for large-sample hydrological datasets has increased (Gupta et al., 2014). Such large-sample datasets support continental-scale hydrological studies, facilitate comparative hydrological analysis, and help to identify hydrological patterns (Addor et al., 2017; Duan et al., 2006). The comprehensive dataset presented in this study, a synthesis of streamflow and hydrometeorological data across intensively monitored catchments, will serve the hydrological research community by providing quality-controlled, ready-to-use data in a coordinated and standardized structure.
CHOSEN (Comprehensive Hydrologic Observatory SEnsor Network) is a compilation of data from the Long-Term Ecological Research (LTER) and Critical Zone Observatory (CZO) networks, and several other ecological and hydrological observatories. Initiatives like the LTER and CZO networks seek to create opportunities for analyses that span multiple watersheds and ecosystem types. However, cross-network and cross-site comparative efforts are often hampered by site-to-site differences in which variables are measured, how they are processed and formatted, and how they are reported. The work of finding diverse catchment data sets, extracting them from whatever formats they are stored in, and cleaning and harmonizing them requires a significant investment of time and effort. CHOSEN aims to address these challenges by providing a ready-to-use comprehensive hydrometeorological dataset, with an accompanying open-access data processing pipeline allowing for the incorporation of new data and the continued evolution of the data set.
Several previous data synthesis efforts, including the MOPEX (Duan et al., 2006) and CAMELS (Addor et al., 2017) datasets, have also sought to facilitate large-sample hydrological studies. Compared with those datasets, CHOSEN focuses strictly on intensively monitored sites with field measurements that extend beyond discharge, precipitation, and weather to include snow depth and snow water equivalent (SWE), soil moisture, soil temperature, and isotope data. Time series of these variables are critical to process-based hydrological and ecological studies, for example, process-oriented benchmarking (Nearing et al., 2018) and coupling physical process models with machine learning (Reichstein et al., 2019). Such datasets can also help elucidate the physical mechanisms underlying watershed behavior (Werkhoven et al., 2008) and ecosystem resilience (Qi et al., 2016). In some catchments, soil moisture patterns have been used to reveal the dynamics of water storage and transport in the landscape (Bracken et al., 2013; James & Roulet, 2007; Tetzlaff et al., 2011). Snow data are essential for investigating hydrological processes and simulating runoff in snow-dominated areas (Foy et al., 2015; Rasmussen et al., 2011). Isotope time series facilitate the tracing of water fluxes through watersheds (Hrachowitz et al., 2013). Isotope data also make it possible to move beyond treating basins merely as black boxes that convert precipitation inputs into streamflow outputs: the water age distributions derived from them provide information about storage timescales within catchments (Kirchner et al., 2000; McDonnell et al., 2010; Soulsby et al., 2006; Tetzlaff et al., 2014).
By focusing on intensively monitored catchments with more comprehensive data than just discharge, precipitation, and weather, the CHOSEN dataset seeks to facilitate the understanding of hydrological processes, development of simulation models, and effective management of catchments and ecosystems spanning diverse environmental conditions.