Comparison between long and short amplicon reads
Amplicon length influenced bacterial diversity and ecological inference.
When only the V4 region was considered, 174 unique ASVs were detected (636
fewer than in the full-length 16S dataset at this step), 43 of which were
classified as chloroplast or mitochondrial and removed. Finally, 17
samples containing fewer than 500 sequences were removed, resulting in a
final V4 dataset containing 93 samples with a median of 4,076 sequences
per sample. Reduced sequence length in the V4 region resulted in the
clustering of sequence variants from the full-length dataset: ASVs that
in the full-length dataset distinguished host species or geographic
location were collapsed into a single ASV in the V4 dataset (Supplementary
Figures S7-S9). However, despite fewer sequence variants, beta
diversity inference was similar whether the full-length or the V4
region was considered, although interactions between sex and sampling
location were less pronounced in the V4 dataset (Supplementary Tables S1-S4).
In addition, V4-region phylogenies contained fewer ASVs, most of which were
found at high abundance in multiple host bee species, masking the
species-level differentiation among strains revealed by the full-length 16S
region (Supplementary Figures S7 and S8).
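
The collapsing and filtering steps described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names, the toy sequences, the region coordinates (positions 4 to 8) and the taxonomy labels are all hypothetical, and real data would locate the V4 region by primer trimming or alignment rather than by fixed coordinates. Only the two filtering rules (removal of chloroplast and mitochondrial ASVs, and removal of samples with fewer than 500 sequences) are taken from the text.

```python
from collections import defaultdict
from statistics import median


def collapse_to_region(asv_counts, start, end):
    """Merge ASVs that are identical within seq[start:end], summing counts per sample."""
    merged = defaultdict(lambda: defaultdict(int))
    for seq, per_sample in asv_counts.items():
        region = seq[start:end]
        for sample, n in per_sample.items():
            merged[region][sample] += n
    return {region: dict(counts) for region, counts in merged.items()}


def drop_organelle_asvs(asv_counts, taxonomy, excluded=("chloroplast", "mitochondria")):
    """Remove ASVs whose (hypothetical) taxonomy label is an organelle."""
    return {seq: c for seq, c in asv_counts.items() if taxonomy.get(seq) not in excluded}


def sample_depths(asv_counts):
    """Total number of sequences per sample across all ASVs."""
    depths = defaultdict(int)
    for per_sample in asv_counts.values():
        for sample, n in per_sample.items():
            depths[sample] += n
    return dict(depths)


def drop_shallow_samples(asv_counts, min_reads=500):
    """Remove samples with fewer than min_reads sequences, then any ASV left empty."""
    depths = sample_depths(asv_counts)
    keep = {s for s, d in depths.items() if d >= min_reads}
    filtered = {}
    for seq, per_sample in asv_counts.items():
        kept = {s: n for s, n in per_sample.items() if s in keep}
        if kept:
            filtered[seq] = kept
    return filtered


# Toy "full-length" ASV table: sequence -> {sample: count}. All values are made up.
full_length = {
    "AAAAGGCCTTTT": {"s1": 400, "s2": 10},
    "AAATGGCCTTTT": {"s1": 200, "s3": 30},  # differs from the first ASV only outside the region
    "AAAAGGCATTTT": {"s2": 600},            # differs from the first ASV inside the region
    "CCCCTTAACCCC": {"s1": 50, "s2": 5, "s3": 5},
}

# Hypothetical taxonomy labels for the collapsed (short-region) ASVs.
taxonomy = {"GGCC": "bacterium", "GGCA": "bacterium", "TTAA": "chloroplast"}

collapsed = collapse_to_region(full_length, 4, 8)  # 4 full-length ASVs -> 3 short ASVs
no_organelles = drop_organelle_asvs(collapsed, taxonomy)
final = drop_shallow_samples(no_organelles, min_reads=500)

print(len(full_length), "->", len(collapsed), "ASVs after collapsing")
print("median depth:", median(sample_depths(final).values()))
```

In this toy table, the first two full-length ASVs differ only outside the extracted region and therefore merge into one short ASV, which is the behaviour reported for variants that distinguished host species or geographic location in the full-length dataset.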