Post-bioinformatic methods to identify and reduce the prevalence of
artefacts in metabarcoding data
Abstract
Metabarcoding provides a powerful tool for investigating biodiversity
and trophic interactions, but the high sensitivity of this methodology
makes it vulnerable to errors, resulting in artefacts in the final data.
Metabarcoding studies therefore often apply minimum sequence copy
thresholds (MSCTs) to remove artefacts that remain in datasets after
bioinformatic processing; however, there is no consensus on best practice
for their use. To reduce erroneous reporting of results and
inconsistencies between studies, this study discusses best-practice
filtering of metabarcoding data and provides guidance on obtaining
conservative, accurate datasets. The most common
MSCTs identified in the literature were applied to example datasets of
Eurasian otter (Lutra lutra) and cereal crop spider (Araneae:
Linyphiidae and Lycosidae) diets. Changes in both the method and
threshold value considerably affected the resultant data. Of the MSCTs
tested, the optimal method for the examples given combined a
sample-based threshold with removal of maximum taxon contamination,
providing stringent filtering of artefacts whilst retaining target data.
The most suitable threshold value differed between datasets owing to
variation in artefact abundance and sequencing depth, so studies should
employ controls (mock communities, negative controls with no DNA and
unused MID-tag combinations) to select threshold values appropriate to
each individual study.
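The two filters combined in the recommended approach can be sketched in Python. This is a minimal illustration only, resting on several assumptions that are not taken from the study itself. Read counts are assumed to be held as nested dictionaries. "Removal of maximum taxon contamination" is interpreted as subtracting each taxon's highest read count in any control (no-DNA negatives or unused MID-tag combinations) from that taxon's count in every sample. The sample-based threshold is interpreted as a minimum proportion of each sample's total reads, applied after the subtraction. The function names, the 1% threshold and the toy counts are hypothetical.

```python
from typing import Dict

# Hypothetical layout: sample (or control) name -> {taxon name -> read count}
Counts = Dict[str, Dict[str, int]]


def remove_max_taxon_contamination(samples: Counts, controls: Counts) -> Counts:
    """Subtract each taxon's maximum read count across all controls from its
    count in every sample (assumed interpretation); taxa left with zero or
    fewer reads are dropped from that sample."""
    max_contam: Dict[str, int] = {}
    for control in controls.values():
        for taxon, reads in control.items():
            max_contam[taxon] = max(max_contam.get(taxon, 0), reads)

    cleaned: Counts = {}
    for sample, taxa in samples.items():
        cleaned[sample] = {}
        for taxon, reads in taxa.items():
            remaining = reads - max_contam.get(taxon, 0)
            if remaining > 0:
                cleaned[sample][taxon] = remaining
    return cleaned


def apply_sample_threshold(samples: Counts, proportion: float) -> Counts:
    """Sample-based MSCT (assumed form): keep a taxon only if its reads make
    up at least `proportion` of that sample's total reads."""
    filtered: Counts = {}
    for sample, taxa in samples.items():
        total = sum(taxa.values())
        filtered[sample] = {
            taxon: reads
            for taxon, reads in taxa.items()
            if total > 0 and reads / total >= proportion
        }
    return filtered


# Toy, made-up counts for illustration only (not the otter or spider data).
samples = {"sample_1": {"prey_A": 950, "prey_B": 40, "prey_C": 3, "human": 10}}
controls = {
    "negative_1": {"human": 12},    # no-DNA negative control
    "unused_tag_1": {"prey_A": 5},  # unused MID-tag combination
}

# Assumed order: contamination removal first, then a hypothetical 1% threshold.
result = apply_sample_threshold(remove_max_taxon_contamination(samples, controls), 0.01)
print(result)  # {'sample_1': {'prey_A': 945, 'prey_B': 40}}
```

Here "human" is removed because its sample count (10) does not exceed its maximum control count (12), and "prey_C" is removed because its 3 reads fall below 1% of the 988 reads remaining in the sample. Reversing the order of the two filters, or computing proportions from pre-subtraction totals, can change which taxa survive. The threshold value itself would be chosen per study from the controls rather than fixed at 1%.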