dnabarcoder: an open-source software package for analyzing and
predicting DNA sequence similarity cut-offs for fungal sequence
identification
- Duong Vu
- Henrik Nilsson
- Gerard Verkley
Abstract
Accurate and precise molecular identification and classification of
fungi remain challenging, particularly in environmental metabarcoding
approaches, which often trade accuracy for efficiency given the large
data volumes at hand. In most ecological studies, only a
single similarity cut-off value is used for sequence identification.
This is insufficient because the most commonly used DNA markers are
known to differ widely in their inter- and intra-specific variability.
We address this problem by presenting a new tool, dnabarcoder, to
analyze and predict different local similarity cut-offs for sequence
identification for different clades of fungi. For each similarity
cut-off in a clade, a confidence measure is computed to evaluate the
resolving power of the genetic marker in that clade. In an analysis of
a recently released filamentous fungal ITS DNA barcode dataset of CBS
strains from the Westerdijk Fungal Biodiversity Institute, the
predicted local similarity cut-offs varied widely between the clades
of the dataset. In addition, most of them
had a higher confidence measure than the global similarity cut-off
predicted for the whole dataset. When a large public fungal ITS
dataset (the UNITE database) was classified against the barcode
dataset, the local similarity cut-offs assigned fewer sequences than
the traditional cut-offs used in metabarcoding studies. However,
accuracy and precision improved significantly.
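
The idea of predicting a separate similarity cut-off per clade can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: sequences are grouped by single-linkage clustering at each candidate cut-off, and the confidence measure is taken to be an F-measure comparing the resulting clusters with the species labels. The function names (`cluster`, `f_measure`, `predict_cutoff`) and the toy similarity values are hypothetical and are not dnabarcoder's API or data.

```python
from itertools import combinations


def cluster(ids, sim, cutoff):
    """Single-linkage clustering: link any pair with similarity >= cutoff."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if sim[frozenset((a, b))] >= cutoff:
            parent[find(a)] = find(b)
    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def f_measure(clusters, labels):
    """Assumed confidence: size-weighted best F1 of each species over clusters."""
    taxa = {}
    for seq_id, taxon in labels.items():
        taxa.setdefault(taxon, set()).add(seq_id)
    total = 0.0
    for members in taxa.values():
        best = 0.0
        for c in clusters:
            overlap = len(members & c)
            if overlap:
                p, r = overlap / len(c), overlap / len(members)
                best = max(best, 2 * p * r / (p + r))
        total += len(members) / len(labels) * best
    return total


def predict_cutoff(ids, sim, labels, candidates):
    """Return (cutoff, score) with the highest score; ties go to the higher cutoff."""
    sub = {i: labels[i] for i in ids}
    scores = {c: f_measure(cluster(ids, sim, c), sub) for c in candidates}
    best = max(candidates, key=lambda c: (scores[c], c))
    return best, scores[best]


# Toy data (invented): two clades, two species each, two sequences per species.
species = {"s1": "A1", "s2": "A1", "s3": "A2", "s4": "A2",
           "t1": "B1", "t2": "B1", "t3": "B2", "t4": "B2"}
clade = {i: sp[0] for i, sp in species.items()}
within = {"A": 0.99, "B": 0.94}   # similarity within a species
between = {"A": 0.95, "B": 0.90}  # similarity between species of one clade
sim = {}
for a, b in combinations(species, 2):
    if clade[a] != clade[b]:
        s = 0.80  # similarity across clades
    elif species[a] == species[b]:
        s = within[clade[a]]
    else:
        s = between[clade[a]]
    sim[frozenset((a, b))] = s

candidates = [0.90, 0.92, 0.94, 0.96, 0.98, 1.00]
local = {c: predict_cutoff([i for i in species if clade[i] == c],
                           sim, species, candidates)
         for c in ("A", "B")}
global_best = predict_cutoff(list(species), sim, species, candidates)
print("local:", local)
print("global:", global_best)
```

On this toy data the sketch selects 0.98 for clade A and 0.94 for clade B, each with a score of 1.0, whereas no single global cut-off separates the species of both clades, so the global score stays below 1.0.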