Grammar-based Fuzzing of Data Integration Parsers in Computational
Materials Science
Abstract
Context: Computational materials science (CMS) focuses on
in silico experiments that compute the properties of known and
novel materials using a wide range of community software packages.
The NOMAD Laboratory provides a FAIR data repository for storing the
input and output files of these packages. Since their file formats
are not standardized, parsers are used to convert the results into a
normalized format. Objective: The main goal of this article is
to report our experience and findings from applying grammar-based
fuzzing to these parsers. Method: We constructed input grammars for
four common software packages in the CMS domain and experimentally
evaluated the ability of grammar-based fuzzing to detect failures in the
NOMAD parsers. Results: With our approach, we identified three unique
critical bugs affecting service availability, as well as several
additional syntactic, semantic, logical, and downstream bugs in the
investigated NOMAD parsers. We reported all issues to the developer team
prior to publication. Conclusion: Based on our experience, we
recommend grammar-based fuzzing for other research software packages as
well, to increase trust in the correctness of the results they produce.
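For readers unfamiliar with the technique, the following Python sketch illustrates grammar-based fuzzing in general. It is not the authors' implementation or their grammars. The grammar describes a hypothetical, heavily simplified `KEY = value` input format, and `naive_parse` is a hypothetical parser under test that wrongly assumes every value is an integer. The generator expands nonterminals at random and, once a depth limit is reached, restricts itself to the expansions with the fewest nonterminals so that every derivation terminates.

```python
import random

# Hypothetical grammar for a simplified "KEY = value" input file.
# Symbols that are keys of GRAMMAR are nonterminals; everything else is a terminal.
GRAMMAR = {
    "<start>": [["<line>"], ["<line>", "\n", "<start>"]],
    "<line>": [["<key>", " = ", "<value>"]],
    "<key>": [["ENCUT"], ["NSW"], ["ISMEAR"]],
    "<value>": [["<int>"], ["<float>"]],
    "<int>": [["<digit>"], ["<digit>", "<int>"]],
    "<float>": [["<int>", ".", "<int>"],
                ["<int>", ".", "<int>", "E", "<sign>", "<int>"]],
    "<sign>": [["+"], ["-"]],
    "<digit>": [[d] for d in "0123456789"],
}


def count_nonterminals(expansion):
    """Number of symbols in an expansion that still need to be expanded."""
    return sum(1 for sym in expansion if sym in GRAMMAR)


def expand(symbol, rng, depth=0, max_depth=8):
    """Randomly derive a string from `symbol`; beyond max_depth, pick among the cheapest expansions."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol: emit as-is
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        fewest = min(count_nonterminals(e) for e in options)
        options = [e for e in options if count_nonterminals(e) == fewest]
    expansion = rng.choice(options)
    return "".join(expand(s, rng, depth + 1, max_depth) for s in expansion)


def naive_parse(text):
    """Hypothetical parser under test: assumes every value is an integer (a seeded bug)."""
    result = {}
    for line in text.splitlines():
        key, value = line.split(" = ")
        result[key] = int(value)
    return result


def robust_parse(text):
    """Hypothetical corrected parser: accepts integers and floats."""
    result = {}
    for line in text.splitlines():
        key, value = line.split(" = ")
        result[key] = float(value)
    return result


def fuzz(parser, trials=200, seed=0):
    """Feed grammar-generated inputs to `parser` and collect (input, exception) failures."""
    rng = random.Random(seed)  # fixed seed for reproducible campaigns
    failures = []
    for _ in range(trials):
        sample = expand("<start>", rng)
        try:
            parser(sample)
        except Exception as exc:  # any uncaught exception counts as a failure
            failures.append((sample, exc))
    return failures


if __name__ == "__main__":
    found = fuzz(naive_parse)
    print(f"{len(found)} failing inputs; first one:")
    print(found[0][0])
```

Because every generated input is syntactically valid by construction, each reported failure points to a defect in the parser rather than to malformed input, which is the main advantage of grammar-based over purely random fuzzing.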