Comparison of short- and long-read metabarcoding sequencing: challenges and solutions for plastid read removal and microbial community exploration of seaweed samples
Erwan Legeay
Genomer Platform, FR2424, Station Biologique de Roscoff, Sorbonne Université, CNRS, 29680 Roscoff, France & Adaptation and Diversity in the Marine Environment (UMR 7144), Station Biologique de Roscoff, Sorbonne Université, CNRS, 29680 Roscoff, France
Abstract
Short-read metabarcoding is the gold standard for accessing partial 16S and ITS gene sequences with high read quality. With the advent of long-read sequencing, full-length target genes can be amplified and sequenced, but with lower read accuracy. Moreover, amplifying 16S rDNA genes from seaweed or plant samples yields a large proportion of plastid reads, because plastids are directly or indirectly derived from cyanobacteria. Primers designed to avoid amplifying plastid sequences are available for short-read sequencing, while Oxford Nanopore Technologies (ONT) offers adaptive sampling, a unique way to reject unwanted reads in real time. In this study, we compare three options to address the plastid read issue: depleting plastid reads with adaptive sampling, using optimized primers with Illumina MiSeq technology, and sequencing large numbers of reads with Illumina NovaSeq technology and universal primers. We showed that adaptive sampling with the default settings of the MinKNOW software was ineffective for plastid depletion. Using a mock community, we also demonstrated that the SAMBA workflow provided the most accurate taxonomic assignment at the bacterial genus level compared with the IDTAXA and KRAKEN2 pipelines, although it generated many false positives at the species level. Overall, NovaSeq sequencing with universal primers stood out for studying the algal bacterial community owing to its deep coverage, the inclusion of eukaryotes and bacteria in the same sequencing run, and its low error rate. The combination of Illumina and ONT sequencing helped us explore fungal diversity and allowed the retrieval of taxonomic information for genera poorly represented in sequence databases.
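The mock-community comparison in the abstract relies on checking which detected taxa are actually present in a community of known composition. The following is a minimal Python sketch of that kind of check. It assumes that each pipeline's output has already been reduced to one taxon label per read at the rank of interest. The function names (`detected_taxa`, `score_rank`), the `min_reads` threshold and the toy taxon labels are hypothetical. They are not taken from the SAMBA, IDTAXA or KRAKEN2 outputs or from this study's data.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set


def detected_taxa(assignments: Iterable[Optional[str]], min_reads: int = 1) -> Set[str]:
    """Collapse per-read labels into the set of detected taxa.

    None (or an empty string) stands for an unassigned read. Taxa supported
    by fewer than `min_reads` reads are not counted as detected.
    """
    counts = Counter(label for label in assignments if label)
    return {taxon for taxon, n in counts.items() if n >= min_reads}


def score_rank(expected: Set[str], detected: Set[str]) -> Dict[str, float]:
    """Compare detected taxa with the known mock-community composition."""
    tp = expected & detected  # present in the mock and detected
    fp = detected - expected  # detected but absent from the mock (false positives)
    fn = expected - detected  # present in the mock but missed
    return {
        "tp": len(tp),
        "fp": len(fp),
        "fn": len(fn),
        "precision": len(tp) / len(detected) if detected else 0.0,
        "recall": len(tp) / len(expected) if expected else 0.0,
    }


# Toy example with hypothetical labels (not data from this study).
mock_species = {"GenusA sp1", "GenusB sp1", "GenusC sp1"}
reads = (
    ["GenusA sp1"] * 50 + ["GenusA sp2"] * 3
    + ["GenusB sp1"] * 40 + ["GenusB sp3"] * 2
    + ["GenusC sp1"] * 30 + [None] * 5
)

species = detected_taxa(reads)
# Derive genus-level sets by keeping the first word of each label.
genera = {label.split()[0] for label in species}
mock_genera = {label.split()[0] for label in mock_species}

print(score_rank(mock_genera, genera))    # genus level
print(score_rank(mock_species, species))  # species level
```

In this toy case, every genus is correct, but two spurious species appear at low read counts. This gives a precision of 1.0 at the genus level and 0.6 at the species level, mirroring the genus-versus-species contrast described in the abstract. Raising `min_reads` to 5 removes both spurious species. Such a cut-off is only an illustrative assumption and trades false positives for the risk of missing genuinely rare taxa.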