A pile of pipelines: an overview of the bioinformatics software for
metabarcoding data analyses
Abstract
Environmental DNA (eDNA) metabarcoding has gained growing attention as a
strategy for monitoring biodiversity in ecology. However, taxonomic
identifications produced through metabarcoding require sophisticated
processing of high-throughput sequencing data from taxonomically
informative DNA barcodes. Various sets of universal and taxon-specific
primers have been developed, extending the usability of metabarcoding
across archaea, bacteria, and eukaryotes. Accordingly, a multitude of
metabarcoding data analysis tools and pipelines have also been
developed. Often, several workflows are designed to process the same
amplicon sequencing data, which makes it difficult to choose one among
the many existing pipelines. Yet each pipeline
has its own specific philosophy, strengths, and limitations, which
should be considered depending on the aims of any specific study, as
well as the bioinformatics expertise of the user. In this review, we
outline the input data requirements, supported operating systems, and
particular attributes of thirty-one amplicon processing pipelines with
the goal of helping users to select a pipeline for their metabarcoding
projects.