Aashish Panta et al.

Managing vast volumes of climate data, often reaching into terabytes and petabytes, presents significant challenges in storage, accessibility, efficient analysis, and on-the-fly interactive visualization. Traditional data-handling techniques are increasingly inadequate for the massive atmospheric and oceanic data generated by modern climate research. We tackled these challenges by reorganizing the native data layout to optimize access and processing, deploying advanced visualization frameworks such as OpenVisus for real-time interactive exploration, and extracting comprehensive metadata for all available fields to improve data discoverability and usability. Our work utilized extensive datasets, including downscaled projections of various climate variables and high-resolution ocean simulations from the NASA NEX-GDDP-CMIP6 and DYAMOND datasets. By transforming the data into progressive, streaming-capable formats and incorporating ARCO (Analysis-Ready, Cloud-Optimized) features before moving them to the cloud, we ensured that the data is highly accessible and efficient to analyze, while allowing direct access to data subsets in the cloud. Direct integration with the Xarray Python library provides easy, efficient access to the data, leveraging the familiarity most climate scientists already have with it. This approach, combined with the progressive streaming format, not only enhances the findability, shareability, and reusability of the data but also enables sophisticated analyses and visualizations from commodity hardware such as personal phones and laptops, without the need for large computational resources. By collaborating with climate scientists and domain experts from the NASA Jet Propulsion Laboratory and NASA Ames Research Center, we published more than 2 petabytes of climate data via our interactive dashboards for climate scientists and the general public. Ultimately, our solution fosters quicker decision-making, greater collaboration, and innovation across the global climate science community by breaking down barriers imposed by hardware limitations and geographic constraints and by providing access to sophisticated visualization tools through publicly available dashboards.
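To make the access pattern concrete, here is a minimal sketch of pulling a small subset of a cloud-hosted, ARCO-style dataset through Xarray. The URL, engine name, variable, and coordinate ranges are illustrative assumptions, not the project's actual endpoints; the pattern simply shows lazy metadata loading followed by chunk-level subsetting.

import xarray as xr

# Open lazily: only metadata is read at this point, no array payload.
ds = xr.open_dataset(
    "https://example-bucket.s3.amazonaws.com/nex-gddp-cmip6/tasmax.idx",  # hypothetical URL
    engine="openvisus",  # assumes an OpenVisus-backed Xarray backend is installed
)

# Select one year over a regional bounding box; only the chunks covering
# this subset are fetched from the cloud.
subset = ds["tasmax"].sel(
    time=slice("2050-01-01", "2050-12-31"),
    lat=slice(30, 50),
    lon=slice(235, 250),
)

# Area-averaged time series, computed from the streamed subset alone.
print(subset.mean(dim=("lat", "lon")).load())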

Joseph Jacob et al.

The need to better understand climate change has driven model simulations to greater fidelity with improved spatiotemporal resolution (e.g., <10 km at sub-hourly cadence). For example, the 7 km GEOS-5 Nature Run (G5NR) with 30-minute outputs from 2005 to 2007 at the NASA Center for Climate Simulation (NCCS) is ~4 PB and is not easily portable. The rise of these high-fidelity climate models coincides with the emergence of cloud computing as a viable platform for scientific analytics. NASA has adopted a cloud computing strategy using public providers like Amazon Web Services (AWS). However, it is not cost- or time-effective to move the High-Performance Computing (HPC)-based model computations and data to the cloud. Thus, there is a need for scalable model evaluation compatible with both the cloud and HPC platforms like NCCS. To fill this need, we have extended the analytics component of the Apache Science Data Analytics Platform (SDAP) with a streamlined version that specifically targets high-resolution science data products and climate model outputs on a regular coordinate grid. Gridded inputs (as opposed to other data structures supported by SDAP, such as point clouds or swath-based measurements) enable offsets to particular grid cells to be computed directly, allow processing on the original NetCDF or HDF granules, avoid a second tiled copy of the data, and permit a simpler technology stack, since no geospatial database is required for lookups or tile storage. Our core module, Parmap, abstracts the map-reduce model so that users can select from a variety of map computational modes, including Spark, Dask, serverless AWS Lambda, PySparkling, and Python multiprocessing. Example analytics include area-averaged time series as well as time-averaged, correlation, and climatological maps. Benchmarks compare favorably with the full SDAP implementation.
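To illustrate why a regular coordinate grid removes the need for a tiled copy or a geospatial index, here is a minimal sketch: the cell indices for any point are computed arithmetically, so a reader can slice straight into the original granule. The file name, variable name, and grid spacing are hypothetical.

import netCDF4

def grid_index(lat, lon, lat0=-90.0, lon0=-180.0, dlat=0.0625, dlon=0.0625):
    """Map a (lat, lon) pair to (row, col) on a regular grid by arithmetic alone."""
    row = int(round((lat - lat0) / dlat))
    col = int(round((lon - lon0) / dlon))
    return row, col

with netCDF4.Dataset("g5nr_granule.nc4") as nc:  # hypothetical granule
    r0, c0 = grid_index(32.0, -120.0)  # southwest corner of the region
    r1, c1 = grid_index(36.0, -114.0)  # northeast corner
    # Read only the requested window; the library seeks to the computed
    # offsets in the original file rather than scanning a tiled copy.
    window = nc.variables["T2M"][0, r0:r1, c0:c1]  # hypothetical variable
    print(window.mean())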
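And here is a minimal, illustrative sketch of the Parmap idea (not SDAP's actual API): the analytic is written once as a map function over granules, and a mode flag selects the execution backend. The file names and variable are hypothetical, and the unweighted mean stands in for a proper cosine-latitude-weighted area average.

import multiprocessing

def parmap(func, items, mode="multiprocessing"):
    """Apply func to items using the selected execution backend."""
    if mode == "serial":
        return [func(x) for x in items]
    if mode == "multiprocessing":
        with multiprocessing.Pool() as pool:
            return pool.map(func, items)
    if mode == "dask":
        import dask.bag as db  # assumes dask is installed
        return db.from_sequence(items).map(func).compute()
    raise ValueError(f"unknown mode: {mode}")

def area_mean(path):
    """Map step: spatial mean of the first time step in one granule."""
    import netCDF4
    with netCDF4.Dataset(path) as nc:
        return float(nc.variables["T2M"][0].mean())

if __name__ == "__main__":
    granules = [f"g5nr_{h:02d}z.nc4" for h in range(24)]  # hypothetical files
    # One value per granule: an area-averaged time series.
    print(parmap(area_mean, granules, mode="multiprocessing"))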