Using an ensemble of FAIR assessment approaches to inform the design of
future FAIRness testing: a case study evaluating World Data Center for
Climate (WDCC)-preserved (meta)data
Abstract
From a research data repository's perspective, offering data management
services in line with the FAIR principles is becoming an increasingly
important selling point in a competitive market. To substantiate such
claims, the services offered must be evaluated and credited following
transparent and credible procedures. Several FAIRness evaluation methods
are openly available for application to archived (meta)data; to date,
however, no standardized and globally accepted FAIRness testing procedure
exists. Here, we apply an ensemble of five FAIRness evaluation approaches
to selected datasets archived in the WDCC. The selection represents the
majority of WDCC-archived datasets (by volume) and reflects the entire
spectrum of data curation levels. Two of the tests are purely automatic,
two are purely manual, and one applies a hybrid method that combines
manual and automatic evaluation. Our evaluation yields a mean FAIR score
of 0.67 out of 1. Manual approaches score higher than automated ones, and
the hybrid approach scores highest. Computed
statistics show agreement between the tests at the data collection
level. None of the five evaluation approaches is fully fit for purpose
for evaluating (discipline-specific) FAIRness, but each has its merits.
Manual testing captures domain- and repository-specific aspects of FAIR,
but the machine-actionability of the archived (meta)data is left to the
judgement of the human evaluator.
Automatic approaches evaluate only those features of archived (meta)data
that are machine-actionable, i.e. accessible to an automated agent and
compliant with globally established standards; an evaluation of
contextual metadata, which is essential for reusability, is not possible.
Correspondingly, the hybrid method combines the advantages and
eliminates the deficiencies of manual and automatic evaluation. We
therefore recommend that future operational FAIRness evaluation be based
on a mature hybrid approach: the automatic part would retrieve and
evaluate as much machine-actionable, discipline-specific (meta)data
content as possible and would then be complemented by a manual evaluation
focusing on the contextual aspects of FAIR. Designing and adopting the
discipline-specific aspects will require concerted community efforts. We
illustrate a possible structure for this process with an example from
climate research.
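
To make the recommended hybrid workflow concrete, the sketch below is a
purely illustrative example (not the tooling or tests used in this study):
it combines an automatic check of a few machine-actionable metadata fields
retrieved from the DataCite REST API with a manually assigned score for
contextual (meta)data quality. The selected fields, the equal weighting and
the example DOI are assumptions made only for this illustration.

```python
# Illustrative sketch only: a minimal hybrid FAIRness check, assuming the
# dataset has a DataCite DOI. The selected fields, the 50/50 weighting and
# the example DOI are hypothetical choices, not the tests used in this study.
import requests


def automatic_score(doi: str) -> float:
    """Score the presence of a few machine-actionable DataCite metadata fields."""
    response = requests.get(f"https://api.datacite.org/dois/{doi}", timeout=30)
    response.raise_for_status()
    attributes = response.json()["data"]["attributes"]
    checks = [
        bool(attributes.get("titles")),           # findability: descriptive title
        bool(attributes.get("creators")),         # reusability: provenance of authorship
        bool(attributes.get("publicationYear")),  # reusability: temporal context
        bool(attributes.get("rightsList")),       # reusability: licence information
        bool(attributes.get("url")),              # accessibility: resolvable landing page
    ]
    return sum(checks) / len(checks)


def hybrid_score(doi: str, manual_contextual_score: float, weight: float = 0.5) -> float:
    """Combine the automatic score with a manually assigned contextual score in [0, 1]."""
    return weight * automatic_score(doi) + (1.0 - weight) * manual_contextual_score


# Hypothetical usage: an example DOI and a manual contextual score of 0.8.
# print(hybrid_score("10.26050/WDCC/EXAMPLE", 0.8))
```

In an operational setting, the automatic part would draw on
community-endorsed, discipline-specific metadata standards rather than the
generic fields checked in this sketch.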