AGU data citation community of practice - Credit for creators of data
within collections using the concept of a reliquary
Abstract
A gap in community practice on data citation emerged during the AGU
Fall Meeting 2020 Data FAIR Town Hall, “Why Is Citing Data Still
Hard?”, namely the use case of citing a large number of datasets such
that credit for individual datasets is assigned properly. The
discussion included the concept of a “Data Collection”
and the infrastructure and guidance still needed to fully implement the
capability so it is easier for researchers to use and receive credit
when their data are cited in this manner. Such collections of data may
contain thousands to millions of elements, and a citation may need to
include subsets of elements, potentially from multiple collections. Such
citations will be crucial to enable reproducible research and credit to
data and digital object creators. To address this gap, a data citation
community of practice was formed, including members from data centres,
research journals, informatics research communities, and data citation
infrastructure. The community has the goal of recommending an approach
that is realistic for researchers to use and for each stakeholder to
implement, and that leverages existing infrastructure. To achieve data
citation of these subsets of large data collections, the concept of a
“reliquary” is introduced. In this context, the reliquary is a
container of persistent identifiers (PIDs) or references defining the
objects used in a research study. This can include any number of
elements. The reliquary can then be cited as a single entity in academic
publications. The reliquary concept will enable data citation use cases
such as the citation of elements within a data collection that are
formed from numerous underlying datasets that have their own PIDs,
unambiguous citation of data used in IPCC Assessment Reports, and citing
the subsets of collections of research data that contain millions of
elements. The discussions over the course of 2021 have developed a
theoretical concept; at the time of writing, formal use cases and initial
applications are being defined. The recommendation developed by this
effort will be available for review and comment by communities such as
ESIP and RDA. All are welcome.
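
As an illustration only, the reliquary described above (a container of PIDs that is itself cited as a single entity) could be sketched as a small data structure. The class name, field names, citation format, and all identifiers below are hypothetical placeholders, not part of any recommendation from the community of practice.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reliquary:
    """Hypothetical container of PIDs for the objects used in one study."""

    pid: str    # identifier of the reliquary itself (placeholder value below)
    title: str  # human-readable label for the cited set of objects
    members: List[str] = field(default_factory=list)  # PIDs or references of elements

    def add(self, member_pid: str) -> None:
        # Record each element once, preserving insertion order.
        if member_pid not in self.members:
            self.members.append(member_pid)

    def cite(self) -> str:
        # One citable string that stands in for every element it contains.
        return f"{self.title} ({len(self.members)} elements). {self.pid}"


# Placeholder identifiers; elements may come from more than one collection.
reliquary = Reliquary(pid="doi:10.0000/example-reliquary", title="Data used in Study X")
for element in [
    "doi:10.0000/collection-a/0001",
    "doi:10.0000/collection-b/0042",
    "doi:10.0000/collection-a/0001",  # duplicate, ignored by add()
]:
    reliquary.add(element)

print(reliquary.cite())
```

In this sketch a publication would cite only the reliquary's own identifier, while the member list keeps the link back to each individual dataset so that credit can still be traced to its creators.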