Daniel Heydebreck et al.

Within the AtMoDat project (Atmospheric Model Data, www.atmodat.de), a standard has been developed that aims to improve the FAIRness of atmospheric model data published in repositories. Atmospheric model data form the basis for understanding and predicting natural events, including atmospheric circulation, local air quality patterns, and the planetary energy budget. Such data should be made available for evaluation and reuse by scientists, the public sector, and other relevant stakeholders. In many respects, atmospheric modeling is ahead of other fields on the way towards FAIR (Findable, Accessible, Interoperable, Reusable; see e.g. Wilkinson et al. (2016, doi:10.1038/sdata.2016.18)) data: many models write their output directly into netCDF, or into file formats that can be converted to netCDF. NetCDF is a non-proprietary, binary, and self-describing format, which ensures interoperability and facilitates reusability. Nevertheless, consistent human- and machine-readable standards for discipline-specific metadata are also necessary. While the standardisation of file structure and metadata (e.g. the Climate and Forecast (CF) Conventions) is well established in some subdomains of the Earth system modeling community (e.g. the Coupled Model Intercomparison Project, Juckes et al. (2020, https://doi.org/10.5194/gmd-13-201-2020)), other subdomains still lack such standardisation. For example, standardisation is not well advanced for obstacle-resolving atmospheric models (e.g. for urban-scale modeling). The ATMODAT standard, which will be presented here, includes concrete recommendations related to the maturity, publication, and enhanced FAIRness of atmospheric model data. These include requirements for rich metadata with controlled vocabularies, structured landing pages, file formats (netCDF), and the structure within files. Human- and machine-readable landing pages are a core element of this standard and should hold and present discipline-specific metadata at the simulation and variable levels.
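
To illustrate what such self-describing netCDF output looks like in practice, the following minimal Python sketch writes a file with CF-style global attributes and a variable carrying a controlled standard_name. The file name, attribute values, and chosen variable are illustrative assumptions; the normative attribute requirements are given in the ATMODAT standard itself.

```python
# Minimal sketch: writing self-describing model output to netCDF with
# CF-style global attributes and a controlled variable name.
# All concrete values below are illustrative assumptions.
import numpy as np
from netCDF4 import Dataset

ds = Dataset("model_output.nc", "w", format="NETCDF4")

# CF global attributes (the ATMODAT standard requires richer metadata
# than shown here; this is only the general idea).
ds.Conventions = "CF-1.8"
ds.title = "Example atmospheric model run"          # assumed title
ds.institution = "Example institute"                # assumed institution
ds.source = "Example atmospheric model, v1.0"       # assumed model name

# One time axis and one variable with a CF standard_name.
ds.createDimension("time", None)
time = ds.createVariable("time", "f8", ("time",))
time.standard_name = "time"
time.units = "hours since 2020-01-01 00:00:00"

ta = ds.createVariable("ta", "f4", ("time",))
ta.standard_name = "air_temperature"   # controlled vocabulary (CF)
ta.units = "K"

time[:] = np.arange(3)
ta[:] = [280.1, 280.4, 279.9]
ds.close()
```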

Amandine Kaiser et al.

Data maturity describes the degree of formalisation/standardisation of a data object with respect to the FAIRness and quality of the (meta)data; high (meta)data maturity therefore increases the reusability of data. It is also an important topic in data management, as reflected by a growing number of tools and frameworks that try to measure it, e.g. the FAIR testing tools assessed by the RDA (1) or the NOAA maturity matrix (2). If the results of stewardship tasks cannot be shown directly in the metadata, reusers cannot easily recognise which data are easy to reuse. For example, the DataCite Metadata Schema does not provide an explicit property to link or store information on data maturity (e.g. the FAIRness or quality of data/metadata). The AtMoDat project (3, Atmospheric Model Data) aims to improve the reusability of published atmospheric model data for scientists, the public sector, companies, and other stakeholders. These data are valuable because they form the basis for understanding and predicting natural events, including the atmospheric circulation and ultimately the atmospheric and planetary energy budget. As most atmospheric data have been published with DataCite DOIs, it is highly important that the maturity of a dataset can easily be found in the DOI's metadata. Published data from other fields of research would also benefit from easily findable maturity information. We therefore developed a Maturity Indicator concept and propose to introduce it as a new property in the DataCite Metadata Schema. This indicator is generic and independent of any scientific discipline and data stewardship tool; hence, it can be used in a variety of research fields.

(1) https://doi.org/10.15497/RDA00034
(2) Peng et al., 2015: https://doi.org/10.2481/dsj.14-049
(3) www.atmodat.de
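
As an illustration of the proposal, the sketch below shows how such a Maturity Indicator might be embedded in a DataCite metadata record (JSON as exchanged with the DataCite REST API). The property name maturityIndicator and its value layout are hypothetical placeholders for the proposed schema extension, not an existing DataCite element.

```python
# Hypothetical sketch: a Maturity Indicator as a new property in a
# DataCite metadata record. The property name "maturityIndicator" and
# its fields are assumptions made for illustration.
import json

record = {
    "data": {
        "type": "dois",
        "attributes": {
            "doi": "10.1234/example-dataset",            # assumed DOI
            "titles": [{"title": "Example atmospheric model dataset"}],
            # Proposed (hypothetical) property: a generic, discipline-
            # independent pointer to a maturity assessment and its result.
            "maturityIndicator": {
                "scheme": "FAIR data maturity evaluation",     # assumed scheme name
                "schemeURI": "https://doi.org/10.15497/RDA00034",
                "result": "passed",                            # assumed outcome value
            },
        },
    }
}
print(json.dumps(record, indent=2))
```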

Anette Ganske et al.

Even though the importance of good research data management has been widely discussed in the Earth System Sciences (ESS), the easy discoverability of quality-checked data has not yet been addressed in detail. This is the aim of the Earth System Data Branding (EASYDAB), a branding that highlights FAIR and open data from the Earth System Sciences published with DataCite DOIs. The EASYDAB guideline defines principles for achieving high metadata quality for ESS datasets by requiring specific metadata information. The EASYDAB logo is protected and may only be used by repositories that agree to follow the EASYDAB terms. The logo indicates that published data have an open licence, open file formats, and rich metadata. Quality controls by the responsible repository ensure that these conditions are met. For this control, the repository can choose between different approved quality guidelines, such as the ATMODAT standard, ISO 19115, or the OGC GeoPackage Encoding Standard. Ideally, a quality guideline provides detailed mandatory and recommended specifications for rich metadata in the data files, the DataCite DOI, and the landing page. One example of such a quality guideline is the ATMODAT standard, which has been developed specifically for atmospheric model data (AtMoDat project). In addition to the metadata specifications, it also requires controlled vocabularies, structured landing pages, and a specific file format (netCDF). The ATMODAT standard includes checklists for data producers and data curators so that compliance with the requirements can easily be verified by both sides. To facilitate an automated compliance check of the netCDF files' metadata, a Python tool has also been developed and published. This automated checking of the quality principles simplifies the control of the data by the repository; nevertheless, repositories can also use the checklists for data curation. The overall aim of curating EASYDAB datasets shall always be to enhance the reuse of reviewed, high-quality data. EASYDAB thus shows scientists the way to open and FAIR data while enabling repositories to demonstrate their efforts in publishing data with high maturity.
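
The sketch below illustrates the idea behind such an automated compliance check in Python: it verifies that a netCDF file carries a set of mandatory global attributes. The attribute list and file path are assumed examples; they do not reproduce the published checker tool or the normative ATMODAT requirements.

```python
# Minimal sketch of an automated metadata compliance check for netCDF
# files, of the kind a repository could run before awarding the EASYDAB
# label. The mandatory attribute list below is an assumed example.
from netCDF4 import Dataset

MANDATORY_GLOBAL_ATTRS = ["Conventions", "title", "institution", "source", "license"]

def check_global_attributes(path):
    """Return the mandatory global attributes missing from the file."""
    with Dataset(path, "r") as ds:
        present = set(ds.ncattrs())
    return [attr for attr in MANDATORY_GLOBAL_ATTRS if attr not in present]

# Assumed example file; in practice the repository would check each
# submitted dataset.
missing = check_global_attributes("model_output.nc")
if missing:
    print("Not compliant; missing global attributes:", ", ".join(missing))
else:
    print("All mandatory global attributes present.")
```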

Carlo Lacagnina et al.

Knowledge of data quality, and of the quality of the associated information, including metadata, is critical for data use and reuse. Assessing data and metadata quality is key to ensuring that the available information is credible, establishing a foundation of trust between the data provider and various downstream users, and demonstrating compliance with the requirements established by funders and federal policies. Data quality information should be consistently curated, traceable, and adequately documented to provide sufficient evidence to guide users in addressing their specific needs. Quality information is especially important for data used to support decisions and policies, and for enabling data to be truly findable, accessible, interoperable, and reusable (FAIR). Clear documentation of the quality assessment protocols used can promote the reuse of quality assurance practices and thus support the generation of more easily comparable datasets and quality metrics. To enable interoperability across systems and tools, data quality information should be machine-actionable. Guidance on the curation of dataset quality information can help improve the practices of the various stakeholders who contribute to the collection, curation, and dissemination of data. This presentation introduces international community guidelines for curating data quality information that is consistent with the FAIR principles throughout the entire data life cycle and inheritable by any derivative product. Supporting case studies demonstrate the applicability of the proposed guidelines.
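
As a sketch of what machine-actionable quality information could look like, the following Python snippet records a quality assessment as structured JSON that downstream tools can parse. All field names and values are illustrative assumptions, not a layout prescribed by the guidelines.

```python
# Illustrative sketch: dataset quality information serialised as
# structured JSON so that it is machine-actionable and traceable.
# Every field name and value here is an assumption for illustration.
import json

quality_info = {
    "dataset": "10.1234/example-dataset",             # assumed dataset DOI
    "assessment": {
        "protocol": "Example QA/QC protocol v1.0",    # assumed protocol name
        "date": "2021-05-01",
        "assessor": "Example repository curation team",
    },
    "results": [
        {"check": "completeness", "status": "passed"},
        {"check": "metadata richness", "status": "passed with remarks",
         "remarks": "variable-level provenance incomplete"},
    ],
}
print(json.dumps(quality_info, indent=2))
```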

Vivien Voss et al.

Micro-scale models are important for assessing processes in complex domains, for example cities. The most common data standard for atmospheric model output is the CF conventions, a metadata standard for netCDF files, but it is not adapted to the output of micro-scale models. As part of the AtMoDat project (Atmospheric Model Data), we want to develop a model data standard for obstacle-resolving models (ORMs) that covers the additional variables (e.g. building structures, wall temperatures) used by these models. In order to involve the micro-scale modelling community in this process, a web-based survey was developed and distributed within the community via conferences and email. With this survey we want to find out which micro-scale ORMs are currently in use, their model specifics (e.g. grid and coordinate system), and how the model result data are handled. Furthermore, the survey provides the opportunity to contribute suggestions and ideas on what we should consider in the development of the standard. Between September 2019 and July 2020, the survey was accessed 29 times, but only 12 surveys were completed. The completed surveys cover eight different models and their corresponding model information. The results show that these models use different output formats and processing tools, which leads to different routines for handling model results. The participants suggested using the netCDF data format and providing information on model initialisation, model settings, and model input along with the output data. This would enable easier intercomparison between different models and the repetition of model simulations. Standardised model output and variable names would also support the development of shared routines for the analysis of micro-scale model data and improve the findability of the data with search engines. The survey will remain open, with regular assessments of its contents (November 2020, May 2021; https://uhh.de/orm-survey).
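
As an illustration of the kind of standardisation under discussion, the following Python sketch writes obstacle-resolving model output with an additional building-related variable to a netCDF file. The variable name building_mask and its attributes are assumptions for illustration only; the actual conventions will be defined by the ORM standard informed by this survey.

```python
# Illustrative sketch: obstacle-resolving model output with a
# building-related variable in netCDF. Variable and attribute names
# are assumptions, not the conventions of the (future) ORM standard.
import numpy as np
from netCDF4 import Dataset

with Dataset("orm_output.nc", "w", format="NETCDF4") as ds:
    ds.Conventions = "CF-1.8"
    ds.source = "example obstacle-resolving model run"  # assumed description

    # Small illustrative horizontal grid.
    ds.createDimension("x", 4)
    ds.createDimension("y", 4)

    # Hypothetical extra variable describing building structures.
    mask = ds.createVariable("building_mask", "i1", ("y", "x"))
    mask.long_name = "building grid cells (1 = building, 0 = air)"
    mask[:] = np.zeros((4, 4), dtype="i1")
    mask[1:3, 1:3] = 1  # a 2x2 building block inside the domain
```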