Steven McNulty - 21DOCS Test Area

The frontier of wildfire-related risk assessment is moving into data science territory, and with good reason. Computational statistics, built on a foundation of high-resolution remote sensing data, ground data, and theory, forms the basis of powerful risk assessment tools. The need for data-based risk assessment has grown in recent years as wildfire seasons in the U.S. have lengthened, a trend associated with more frequent droughts, more human ignitions, and accumulating fuel loads. We present an application of machine learning (ML), which makes it possible to analyze complex data without defining interactions a priori. This is a major advantage because these interactions are not known beforehand. Specifically, we build a stochastic gradient boosting machine (GBM) toolkit to assess the change in river flow after wildfire in the contiguous United States (CONUS) over a 5-year period. The GBM accounts for nonlinear relationships and interactions among wildland fire characteristics, watershed geometry, climate variability, topography, and land cover. Building the GBM is a sequential process: at each iteration, a new tree is fit to the pseudo-residuals (the negative gradient of the loss function) on a random subsample of the data, so that the loss is reduced step by step. This process allows the model to progressively learn how the variables in the large dataset interact to produce the response (i.e., river flow). Our results show that wildfires increase annual river flow in the CONUS when more than 20% of a gaged basin is burned. Data science tools like the GBM presented here are essential for generating practical knowledge of how wildfire impacts on ecohydrology can ultimately affect hydrological services, socio-hydrosystems, and water security in fire-affected regions.
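The sequential fitting described above can be illustrated with a minimal sketch of stochastic gradient boosting under squared-error loss, for which the pseudo-residuals are simply the observed values minus the current predictions. This is not the toolkit used in the study: the data are purely synthetic, there is a single hypothetical predictor (`burned_fraction`), the base learners are one-split stumps rather than full trees, and all function names and parameter values (`fit_gbm`, `learning_rate`, `subsample`, and so on) are assumptions chosen for illustration.

```python
import random

def fit_stump(xs, residuals):
    """Fit a one-split regression stump (threshold, left mean, right mean)."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:  # all x identical: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return best[1:]

def stump_predict(stump, x):
    thr, left_val, right_val = stump
    return left_val if x <= thr else right_val

def fit_gbm(xs, ys, n_rounds=200, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting for squared-error loss."""
    rng = random.Random(seed)
    init = sum(ys) / len(ys)            # initial model: the mean response
    preds = [init] * len(ys)
    stumps = []
    n_sub = max(2, int(subsample * len(xs)))
    for _ in range(n_rounds):
        # "Stochastic": each round sees a random subsample drawn without replacement.
        idx = rng.sample(range(len(xs)), n_sub)
        # Pseudo-residuals = negative gradient of 0.5 * (y - F)^2 with respect to F.
        pseudo_res = [ys[i] - preds[i] for i in idx]
        stump = fit_stump([xs[i] for i in idx], pseudo_res)
        stumps.append(stump)
        # Take a small step along the fitted gradient direction for all rows.
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return init, learning_rate, stumps

def predict(model, x):
    init, lr, stumps = model
    return init + lr * sum(stump_predict(s, x) for s in stumps)

# Synthetic toy data (NOT from the study): a flat response that rises
# once the hypothetical predictor exceeds an arbitrary threshold, plus noise.
data_rng = random.Random(42)
burned_fraction = [i / 100 for i in range(100)]
flow_change = [(10.0 * (x - 0.2) if x > 0.2 else 0.0) + data_rng.gauss(0, 0.3)
               for x in burned_fraction]

model = fit_gbm(burned_fraction, flow_change)

mean_y = sum(flow_change) / len(flow_change)
mse_baseline = sum((y - mean_y) ** 2 for y in flow_change) / len(flow_change)
mse_model = sum((y - predict(model, x)) ** 2
                for x, y in zip(burned_fraction, flow_change)) / len(flow_change)
print(f"baseline MSE: {mse_baseline:.3f}, boosted MSE: {mse_model:.3f}")
```

Each round fits only a weak learner to what the current ensemble still gets wrong, and the learning rate keeps every step small. That is why the threshold-like shape in the toy data is recovered without being specified in advance. A real analysis would use many predictors, deeper trees, and held-out data to choose the number of rounds.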