Joseph Hamman - 21DOCS Test Area

Joseph Hamman

Public Documents 4

Moving land models towards actionable science: A novel application of the Community T...

Yifan Cheng

and 7 more

February 19, 2022

The Arctic hydrological system is an interconnected system that is experiencing rapid change. It comprises permafrost, snow, glaciers, frozen soils, and inland river systems. Permafrost degradation, trends towards earlier snow melt, a lengthening snow-free season, soil ice melt, and warming frozen soils all challenge hydrologic simulation under climate change in the Arctic. In this study, we provide an improved representation of the hydrologic cycle across a regional Arctic domain using a generalizable optimization methodology and workflow for the community. We applied the Community Terrestrial Systems Model (CTSM) across the US state of Alaska and the Yukon River Basin at 4-km spatial resolution. We highlight several potentially useful high-resolution CTSM configuration changes. Additionally, we performed a multi-objective optimization using snow and river flow metrics within an adaptive surrogate-based model optimization scheme. Four representative river basins across our study domain were selected for optimization based on observed streamflow and snow water equivalent observations at ten SNOTEL sites. Fourteen sensitive parameters were identified for optimization, half of which are not directly related to hydrology or snow processes. Across fifteen out-of-sample river basins, thirteen had improved flow simulations after optimization, and the median Kling-Gupta Efficiency of daily flow increased from 0.40 to 0.63. In addition, we adapted the Shapley Decomposition to disentangle each parameter’s contribution to streamflow performance changes; the seven non-hydrological parameters made a non-negligible contribution to performance gains. The snow simulation showed limited improvement, likely because snow simulation is influenced more by meteorological forcing than by model parameter choices.
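For readers unfamiliar with the flow metric quoted above, the following is a minimal sketch of the Kling-Gupta Efficiency. The abstract does not say which KGE variant the study used, so this sketch assumes the original 2009 formulation, KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), where r is the linear correlation, alpha the ratio of standard deviations, and beta the ratio of means between simulated and observed flow. The function name `kling_gupta_efficiency` is hypothetical and is not taken from the study's code.

```python
import math
from statistics import mean, pstdev


def kling_gupta_efficiency(sim, obs):
    """Kling-Gupta Efficiency (2009 form, assumed here) of sim against obs.

    A value of 1 indicates a perfect match; lower values indicate worse
    agreement in correlation, variability, or mean (bias).
    """
    if len(sim) != len(obs) or len(obs) < 2:
        raise ValueError("sim and obs must have equal length of at least 2")

    mu_sim, mu_obs = mean(sim), mean(obs)
    sd_sim, sd_obs = pstdev(sim), pstdev(obs)
    if sd_sim == 0 or sd_obs == 0 or mu_obs == 0:
        raise ValueError("KGE is undefined for constant series or zero observed mean")

    # Population covariance, consistent with the population standard deviations.
    cov = sum((s - mu_sim) * (o - mu_obs) for s, o in zip(sim, obs)) / len(obs)

    r = cov / (sd_sim * sd_obs)   # linear correlation
    alpha = sd_sim / sd_obs       # variability ratio
    beta = mu_sim / mu_obs        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

Under this definition, a simulation that is perfectly correlated with observations but has twice the mean and twice the variability scores 1 - sqrt(2), which illustrates how bias and variability errors are penalized even when timing is correct.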

The Pangeo Platform: a community-driven open-source big data environment

Joseph Hamman

and 10 more

January 14, 2020

In this presentation, we will describe the [Pangeo Project](http://pangeo.io), a coordinated community effort with support from NASA, NSF, AWS, Microsoft Azure, and Google Cloud, to develop interactive and reproducible open source workflows for discovery, visualization, and quantitative analysis of large datasets used for research in the Earth Sciences. The Pangeo computational platform is based on JupyterHub and deployed wherever the data is stored. Python libraries such as Xarray, Rasterio, and Dask enable distributed parallel computations on HPC and Kubernetes clusters. We will discuss the design concepts central to the Pangeo platform and highlight specific applications using NASA satellite data archives on AWS. We will discuss recent progress in the integration of data discovery tools (e.g., STAC, CMR, Intake) with cloud-native storage formats for multidimensional data types (Cloud-Optimized GeoTIFF, Zarr, etc.) and highlight how they can be used to construct elegant, robust, and reproducible scientific workflows. Finally, we will discuss performance, security, transferability across public cloud platforms, cost to operate, and approaches to encourage a cultural shift in scientific computation through educational events.

Intake / Pangeo Catalog: Making It Easier To Consume Earth’s Climate and Weather Data

Anderson Banihirwe

and 3 more

February 05, 2022

Computer simulations of the Earth’s climate and weather generate huge amounts of data. These data are often persisted on HPC systems or in the cloud across multiple data assets in a variety of formats (netCDF, Zarr, etc.). Finding, investigating, and loading these data assets into compute-ready data containers cost time and effort. The data user needs to know which data sets are available, and the attributes describing each one, before loading a specific data set and analyzing it. In this notebook, we demonstrate the integration of data discovery tools such as intake and intake-esm (an intake plugin) with data stored in cloud-optimized formats (Zarr). We highlight (1) how these tools provide transparent access to local and remote catalogs and data, and (2) the API for exploring arbitrary metadata associated with data and for loading data sets into data array containers. We also showcase the Pangeo catalog, an open source project to enumerate and organize cloud-optimized climate data stored across a variety of providers, and a place where several intake-esm collections are now publicly available. We use one of these public collections as an example to show how an end user would explore and interact with the data, and conclude with a short overview of the catalog’s online presence.

Scikit-downscale: an open source Python package for scalable climate downscaling

Joseph Hamman

and 1 more

July 22, 2021

Climate data from Earth System Models are increasingly being used to study the impacts of climate change on a broad range of biogeophysical (forest fires, fisheries, etc.) and human systems (reservoir operations, urban heat waves, etc.). Before these data can be used to study many of these systems, post-processing steps commonly referred to as bias correction and statistical downscaling must be performed. “Bias correction” is used to correct persistent biases in climate model output and “statistical downscaling” is used to increase the spatiotemporal resolution of the model output (e.g., from 1° to 1/16° grid boxes). For our purposes, we’ll refer to both parts as “downscaling”. In the past few decades, the applications community has developed a plethora of downscaling methods. Many of these methods are ad hoc collections of post-processing routines, while others target very specific applications. The proliferation of downscaling methods has left the climate applications community with an overwhelming body of research to sort through, with little synthesis to guide method selection or applicability. Motivated by the pressing socio-environmental challenges of climate change, and with the lessons from previous downscaling efforts in mind, we have begun working on a community-centered open framework for climate downscaling: scikit-downscale. We believe that the community will benefit from a well-designed open source downscaling toolbox with standard interfaces, alongside a repository of benchmark data to test and evaluate new and existing downscaling methods. In this notebook, we provide an overview of the scikit-downscale project, detailing how it can be used to downscale a range of surface climate variables such as air temperature and precipitation. We also highlight how the scikit-downscale framework is being used to compare existing methods and how it can be extended to support the development of new downscaling methods.
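To make the idea of bias correction concrete, here is a minimal sketch of empirical quantile mapping, one widely used bias-correction technique. The abstract does not name a specific method, so this choice is an assumption made for illustration, and the code does not use or reproduce the scikit-downscale API. All function names (`quantile_of`, `value_at`, `quantile_map`) are hypothetical.

```python
from bisect import bisect_right


def quantile_of(sorted_sample, value):
    """Interpolated quantile (0 to 1) of value within a sorted sample.

    Values outside the sample range are clamped to 0 or 1.
    """
    if value <= sorted_sample[0]:
        return 0.0
    if value >= sorted_sample[-1]:
        return 1.0
    i = bisect_right(sorted_sample, value)  # sorted_sample[i-1] <= value < sorted_sample[i]
    lower, upper = sorted_sample[i - 1], sorted_sample[i]
    position = (i - 1) + (value - lower) / (upper - lower)
    return position / (len(sorted_sample) - 1)


def value_at(sorted_sample, q):
    """Linearly interpolated sample value at quantile q (0 to 1)."""
    position = q * (len(sorted_sample) - 1)
    lo = int(position)
    hi = min(lo + 1, len(sorted_sample) - 1)
    frac = position - lo
    return sorted_sample[lo] * (1 - frac) + sorted_sample[hi] * frac


def quantile_map(model_hist, obs_hist, model_values):
    """Bias-correct model_values by matching historical quantiles.

    Each model value is located in the historical model distribution and
    replaced by the observed value at the same quantile.
    """
    if len(model_hist) < 2 or len(obs_hist) < 2:
        raise ValueError("historical samples need at least 2 values each")
    model_sorted = sorted(model_hist)
    obs_sorted = sorted(obs_hist)
    return [value_at(obs_sorted, quantile_of(model_sorted, v)) for v in model_values]


# Toy example: the model runs a constant 2 units warmer than observations.
corrected = quantile_map([3.0, 4.0, 5.0, 6.0, 7.0],
                         [1.0, 2.0, 3.0, 4.0, 5.0],
                         [3.0, 4.5, 7.0])
```

In the toy example the constant warm bias is removed. Note that this simple form clamps values outside the historical range, which is one reason more elaborate methods exist for future projections.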