(C) Bioinformatics
Raw full-length 16S rRNA reads were processed and filtered into Amplicon Sequence Variants (ASVs) in R v4.1.0 (R Core Team 2021) using DADA2 v1.20.0 (Callahan et al. 2019). Primers (F27 = AGRGTTYGATYMTGGCTCAG; R1492 = AAGTCGTAACAAGGTARCY) were removed, and reads were filtered by length and quality to retain sequences of 1000 to 1600 bp with no ambiguous bases, a maximum of 2 expected errors, and a minimum quality score of 3. Filtered reads were then dereplicated, sequencing error rates were estimated using the PacBioErrfun function, and reads were denoised accordingly. Chimeras were identified with a minFoldParentOverAbundance value of 3.5 and removed using the consensus method. Finally, taxonomy was assigned using both the BEExact database (Daisley and Reid 2021) and SILVA v138.1 (Quast et al. 2012); the resulting assignments were nearly identical (Supplementary File 1), and we present those from BEExact below.
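For readers unfamiliar with the filtering criteria, the following is a minimal Python sketch of the per-read logic, not the DADA2 implementation itself (the analysis was run in R). It assumes Phred-scaled quality scores, for which the expected number of errors is the sum of the per-base error probabilities 10^(-Q/10). The function and parameter names (`expected_errors`, `passes_filter`, `min_len`, and so on) are hypothetical and chosen for illustration only.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities for Phred scores: 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_filter(seq, quals, min_len=1000, max_len=1600,
                  max_ee=2.0, min_q=3, max_n=0):
    """Return True if a read meets the length and quality thresholds.

    Defaults mirror the thresholds stated in the text; the order of the
    checks is an assumption made for this illustration.
    """
    # Length window (inclusive on both ends).
    if not (min_len <= len(seq) <= max_len):
        return False
    # No ambiguous bases allowed.
    if seq.upper().count("N") > max_n:
        return False
    # Discard reads containing any base below the minimum quality score.
    if min(quals) < min_q:
        return False
    # Discard reads whose expected error count exceeds the maximum.
    return expected_errors(quals) <= max_ee


# Example: a 1200 bp read at uniform Q30 has about 1.2 expected errors.
read = "A" * 1200
print(passes_filter(read, [30] * 1200))
```

Under these assumptions a 1200 bp read at uniform Q30 passes, whereas the same read at uniform Q20 (about 12 expected errors) would be discarded.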
The ASV and taxonomy tables generated by the DADA2 pipeline described above were merged with sample metadata using phyloseq (McMurdie and Holmes 2013). ASVs classified as chloroplast or mitochondria were removed, and only samples with more than 500 total sequences, a depth at which sampling curves saturated, were retained (Supplementary Figure S2). The extraction control contained only a single sequence and was removed at this step.
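The table-level filtering can be illustrated with a short Python sketch. It is not the phyloseq code used in the study; the data layout (nested dictionaries), the function name `filter_table`, and the example labels are hypothetical. It also assumes that the 500-sequence threshold is applied after organelle ASVs are removed, which the text implies but does not state explicitly.

```python
def filter_table(counts, taxonomy, min_total=500,
                 organelles=("chloroplast", "mitochondria")):
    """Drop organelle ASVs, then keep samples with more than min_total reads.

    counts:   {sample: {asv_id: read_count}}
    taxonomy: {asv_id: {rank: name}}
    """
    def is_organelle(asv):
        # Flag an ASV if any assigned rank matches an organelle label.
        names = taxonomy.get(asv, {}).values()
        return any(str(n).lower() in organelles for n in names if n)

    kept = {}
    for sample, asv_counts in counts.items():
        clean = {a: n for a, n in asv_counts.items() if not is_organelle(a)}
        # Strictly greater than the threshold, as stated in the text.
        if sum(clean.values()) > min_total:
            kept[sample] = clean
    return kept


# Illustrative data only; not taken from the study.
taxonomy = {
    "asv1": {"Order": "Lactobacillales"},
    "asv2": {"Order": "Chloroplast"},
}
counts = {
    "sampleA": {"asv1": 600, "asv2": 300},
    "sampleB": {"asv1": 400, "asv2": 200},
    "extraction_control": {"asv1": 1},
}
filtered = filter_table(counts, taxonomy)
print(sorted(filtered))
```

In this toy input, only `sampleA` is retained: `sampleB` falls to 400 sequences once the chloroplast ASV is removed, and the control holds a single sequence.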