Comparison between long and short amplicon reads
Amplicon length influenced both bacterial diversity estimates and ecological inference. When only the V4 region was considered, 174 unique ASVs were detected (636 fewer than in the full-length 16S dataset at this step), 43 of which were classified as chloroplast or mitochondrial sequences and removed. Finally, 17 samples containing fewer than 500 sequences were removed, yielding a final V4 dataset of 93 samples with a median of 4,076 sequences per sample. The shorter V4 sequences collapsed sequence variants that were distinct in the full-length dataset: ASVs that distinguished host species or geographic location in the full-length dataset were merged into a single ASV in the V4 dataset (Supplementary Figures S7-S9). However, despite the lower number of sequence variants, beta diversity inferences were broadly similar between the full-length and V4 datasets, although interactions between sex and sampling location were less pronounced in the V4 dataset (Supplementary Tables S1-S4).
In addition, V4-region phylogenies contained fewer ASVs, most of which were found at high abundance in multiple host bee species, masking the species-level strain differentiation revealed by the full-length 16S region (Supplementary Figures S7, S8).
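The loss of resolution follows from sequence identity alone: variants that differ only outside V4 become identical once trimmed to V4 and are counted as one ASV. The minimal Python sketch below illustrates this, along with the 500-sequence sample threshold. It is not the pipeline used in this study. The sequences are hypothetical toy strings, and fixed trim coordinates stand in for the primer-based region extraction a real workflow would use. The names `collapse_to_subregion` and `filter_samples` are illustrative only.

```python
from collections import defaultdict

# Hypothetical toy "full-length" ASVs (not data from this study).
# ASV1 and ASV2 differ only outside the stand-in V4 window (positions 4-11);
# ASV1 and ASV3 differ inside it.
full_length_asvs = {
    "ASV1": "AAAA" + "GTGCCAGC" + "TTTT",
    "ASV2": "CCCC" + "GTGCCAGC" + "TTTT",
    "ASV3": "AAAA" + "GTGCCAGA" + "TTTT",
}


def collapse_to_subregion(asvs, start, end):
    """Trim each sequence to [start, end) and group ASVs whose trimmed
    sequences are identical; each group becomes one subregion ASV."""
    groups = defaultdict(list)
    for name, seq in asvs.items():
        groups[seq[start:end]].append(name)
    return dict(groups)


def filter_samples(counts, min_reads=500):
    """Keep only samples whose total read count is at least min_reads,
    i.e. drop samples with fewer than min_reads sequences."""
    return {sample: c for sample, c in counts.items() if sum(c.values()) >= min_reads}


v4_groups = collapse_to_subregion(full_length_asvs, 4, 12)
for v4_seq, members in v4_groups.items():
    print(v4_seq, members)
# Three full-length ASVs collapse into two subregion ASVs.
print(len(full_length_asvs), "->", len(v4_groups))

# Hypothetical per-sample ASV counts: S1 (550 reads) is kept, S2 (120) is dropped.
sample_counts = {"S1": {"ASV1": 300, "ASV3": 250}, "S2": {"ASV1": 120}}
print(sorted(filter_samples(sample_counts)))
```

In this toy case ASV1 and ASV2 merge because their only difference lies outside the trimmed window. This mirrors how ASVs that separated host species or locations in the full-length data could become a single V4 ASV.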