Abstract
The EarthCube Data Discovery Studio (DDStudio) integrates several
technical components into an end-to-end data discovery and exploration
system. Beyond supporting dataset search across multiple data sources,
it lets geoscientists explore the data using Jupyter notebooks; organize
the discovered datasets into thematic collections which can be shared
with other users; edit metadata records and contribute metadata
describing additional datasets; and examine provenance and validate
automated metadata enhancements. DDStudio provides access to 1.67
million metadata records from 40+ geoscience repositories. The records
are automatically enhanced and exposed via standard interfaces in both
ISO 19115 and schema.org markup; the latter can be used by commercial
search engines (Google, Bing) to index DDStudio content. For geoscience
end users, DDStudio provides a custom Geoportal-based user interface
which enables spatio-temporal, faceted, and full-text search, and
provides access to the additional functions listed above. Key project
accomplishments over the last year include:
- User interface improvements, based on design advice from a Science
  Gateways Community Institute (SGCI) usability team, which conducted
  user interviews, performed usability testing, and analyzed a dozen
  other search portals to identify the most useful features. This work
  resulted in a streamlined user interface, particularly in the
  presentation of search results and the management of thematic
  collections.
- The earlier effort to publish DDStudio content using schema.org markup
  resulted in a significant increase in usage. With over 900K records
  indexed by Google, nearly half of the roughly 1000 unique users per
  month now reach DDStudio via referrals from Google.
- The added ability to harvest and process JSON-LD metadata makes it
  possible to integrate EarthCube GeoCodes content into DDStudio and to
  work with this content using DDStudio’s user interface.
- New application domains include joint work with the library community
  and interoperation with DataMed, a similar system that indexes 2.3
  million biomedical datasets.