Improving Metabarcoding Taxonomic Assignment: A Case Study of Fishes in
a Large Marine Ecosystem
Abstract
DNA metabarcoding is an important tool for molecular ecology. However,
its effectiveness hinges on the quality of reference sequence databases
and on the classification parameters employed. Here we evaluate the performance
of MiFish 12S taxonomic assignments using a case study of California
Current Large Marine Ecosystem fishes to determine best practices for
metabarcoding. Specifically, we use a taxonomy cross-validation by
identity framework to compare classification performance between a
global database composed of all available sequences and a curated
database that only includes sequences of fishes from the California
Current Large Marine Ecosystem. We demonstrate that the curated,
regional database provides higher assignment accuracy than the
comprehensive global database. We also document a tradeoff between
accuracy and misclassification across a range of taxonomic cutoff
scores, highlighting the importance of parameter selection for taxonomic
classification. Furthermore, we compare assignment accuracy with and
without the inclusion of additionally generated reference sequences. To
this end, we sequenced tissue from 605 species using the MiFish 12S
primers, adding 253 species to GenBank's existing 550 California Current
Large Marine Ecosystem fish sequences. We then compared the species and
reads identified in seawater environmental DNA (eDNA) samples using the
global database with and without our generated references and using the
regional database. Adding the new references enabled the identification
of 16 native taxa and the assignment of 17.0% of total reads from the eDNA
samples, including species of substantial ecological and economic value. Together, these results
demonstrate the importance of comprehensive and curated reference
databases for effective metabarcoding and the need for locus-specific
validation efforts.
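The accuracy/misclassification tradeoff across cutoff scores can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes each query sequence has a known true taxon, a predicted taxon, and a classifier confidence score on a 0 to 100 scale. It also treats any query scoring below the cutoff as simply unassigned, a simplification of cross-validation by identity, which distinguishes more error types. The names `Assignment`, `rates_at_cutoff`, and `sweep` and the toy records are hypothetical and do not represent data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """One query sequence (hypothetical record layout)."""
    true_taxon: str
    predicted_taxon: str
    score: float  # assumed classifier confidence on a 0-100 scale


def rates_at_cutoff(assignments, cutoff):
    """Return (accuracy, misclassification, unassigned) fractions at one cutoff."""
    n = len(assignments)
    if n == 0:
        raise ValueError("need at least one assignment")
    correct = wrong = unassigned = 0
    for a in assignments:
        if a.score < cutoff:
            unassigned += 1  # below cutoff: the query is left unclassified
        elif a.predicted_taxon == a.true_taxon:
            correct += 1     # confident and correct
        else:
            wrong += 1       # confident but wrong: a misclassification
    return correct / n, wrong / n, unassigned / n


def sweep(assignments, cutoffs):
    """Map each cutoff to its (accuracy, misclassification, unassigned) rates."""
    return {c: rates_at_cutoff(assignments, c) for c in cutoffs}


if __name__ == "__main__":
    # Invented toy records, for illustration only.
    toy = [
        Assignment("species_A", "species_A", 95.0),
        Assignment("species_B", "species_A", 60.0),
        Assignment("species_C", "species_C", 80.0),
        Assignment("species_D", "species_D", 40.0),
    ]
    for cutoff, (acc, mc, un) in sweep(toy, [0, 50, 70, 90]).items():
        print(f"cutoff={cutoff:>3}: accuracy={acc:.2f} "
              f"misclassified={mc:.2f} unassigned={un:.2f}")
```

On the toy records, raising the cutoff first removes the confident misassignment (species_B called as species_A) but also discards correct low-scoring calls. Both accuracy and misclassification fall together, which is why the choice of cutoff matters.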