BAGS: an automated Barcode, Audit & Grade System for DNA barcode
reference libraries.
Abstract
Biodiversity studies benefit greatly from molecular tools such as DNA
metabarcoding, an effective means of species identification in
biomonitoring and conservation programmes. The accuracy of species-level
assignment, and the resulting taxonomic coverage, relies on comprehensive
DNA barcode reference libraries. These libraries exist to support species
identification, but accidental errors introduced while the barcodes are
generated may compromise their accuracy. Here we present an
R-based application, BAGS (Barcode, Audit & Grade System), that
performs automated auditing and annotation of the cytochrome c oxidase
subunit I (COI) sequence library available in the Barcode of Life Data
System (BOLD) for a given animal taxonomic group. BAGS then applies a
qualitative ranking system that assigns one
of five grades (A to E) to each species in the reference library,
according to the attributes of the data and the congruency of species
names with sequences clustered in Barcode Index Numbers (BINs). Our
ultimate goal is to allow researchers to obtain the most useful and
reliable data by highlighting and segregating records according to their
congruency. Several tests were performed to assess its usefulness and
limitations. BAGS fills a significant gap in the current landscape
of DNA barcoding research tools by quickly screening reference libraries
to gauge the congruence status of the data and by facilitating the triage
of ambiguous records for subsequent review. BAGS therefore has the
potential to become a valuable addition to forthcoming DNA metabarcoding
studies and, in the long term, to contribute to improving the quality and
reliability of public reference libraries worldwide.
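The species-level grading step described above can be illustrated with a short sketch that groups records by species and by BIN and then checks BIN congruence and sample size. It is written in Python for illustration only (BAGS itself is an R application). The record format, the example BIN identifiers, the thresholds (3 and 11 records) and the order in which the rules are checked are all assumptions made for this example. They are not BAGS's exact criteria or code.

```python
from collections import defaultdict

# Hypothetical record format: (species_name, bin_uri). Real BOLD records
# carry many more fields; only these two are used in this sketch.
Record = tuple[str, str]


def grade_species(records: list[Record],
                  min_records: int = 3,
                  large_sample: int = 11) -> dict[str, str]:
    """Assign an illustrative A-E grade to each species.

    Assumed rules, checked in this order (thresholds are hypothetical):
      E: a BIN of the species also contains another species (discordant)
      D: fewer than `min_records` records with a BIN (insufficient data)
      C: species split across several BINs, all exclusive to it
      A: one exclusive BIN and at least `large_sample` records
      B: one exclusive BIN and fewer than `large_sample` records
    """
    species_bins: dict[str, set[str]] = defaultdict(set)
    species_count: dict[str, int] = defaultdict(int)
    bin_species: dict[str, set[str]] = defaultdict(set)

    # Build species -> BINs, species -> record count and BIN -> species maps.
    for species, bin_uri in records:
        if not bin_uri:  # records without a BIN are ignored in this sketch
            continue
        species_bins[species].add(bin_uri)
        species_count[species] += 1
        bin_species[bin_uri].add(species)

    grades: dict[str, str] = {}
    for species, bins in species_bins.items():
        shared = any(len(bin_species[b]) > 1 for b in bins)
        if shared:
            grades[species] = "E"
        elif species_count[species] < min_records:
            grades[species] = "D"
        elif len(bins) > 1:
            grades[species] = "C"
        elif species_count[species] >= large_sample:
            grades[species] = "A"
        else:
            grades[species] = "B"
    return grades


# Usage with made-up species names and hypothetical BIN identifiers.
records = (
    [("Sp a", "BOLD:AAA0001")] * 12
    + [("Sp b", "BOLD:AAA0002")] * 4
    + [("Sp c", "BOLD:AAA0003"), ("Sp c", "BOLD:AAA0004"), ("Sp c", "BOLD:AAA0004")]
    + [("Sp d", "BOLD:AAA0005")] * 2
    + [("Sp e", "BOLD:AAA0006"), ("Sp f", "BOLD:AAA0006")] * 2
)
print(grade_species(records))
# {'Sp a': 'A', 'Sp b': 'B', 'Sp c': 'C', 'Sp d': 'D', 'Sp e': 'E', 'Sp f': 'E'}
```

Checking discordance (E) before sample size means that a species sharing a BIN with another species is flagged regardless of how many records it has. Reversing those two checks would instead grade small discordant species as D, so the rule order is itself a design decision that any real implementation has to fix explicitly.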