(C) Bioinformatics
Raw full-length 16S rRNA gene reads were processed and filtered into
amplicon sequence variants (ASVs) in R v4.1.0 (R Core Team 2021) using
DADA2 v1.20.0 (Callahan et al. 2019). Primers (F27 =
AGRGTTYGATYMTGGCTCAG; R1492 = AAGTCGTAACAAGGTARCY) were removed, and
reads were filtered by length and quality to retain sequences of
1000 to 1600 bp with no ambiguous bases, a maximum of 2 expected errors,
and a minimum quality score of 3. Filtered reads were then dereplicated,
and sequencing errors were modelled using the PacBioErrfun
error-estimation function and removed by denoising. Chimeras were
identified with a minFoldParentOverAbundance value of 3.5 and removed
using the consensus method. Finally,
taxonomy was assigned using both the BEExact database (Daisley and Reid
2021) and SILVA v138.1 (Quast et al. 2012). The two sets of assignments
were nearly identical (Supplementary File 1), and we present the BEExact
assignments below.
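
The length and quality filter described above can be illustrated with a minimal Python sketch. It is not the DADA2 implementation; it only restates the stated thresholds (1000 to 1600 bp, no ambiguous bases, at most 2 expected errors, minimum quality score of 3), assuming Phred-scaled quality scores and the usual definition of expected errors as the sum of per-base error probabilities. The function names `expected_errors` and `passes_filter` are hypothetical.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities, assuming Phred-scaled scores."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_filter(seq, quals, min_len=1000, max_len=1600, max_ee=2.0, min_q=3):
    """Return True if a read meets the thresholds stated in the text.

    seq   : read sequence as a string
    quals : per-base Phred quality scores (one integer per base)
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    # Length window: 1000 to 1600 bp.
    if not (min_len <= len(seq) <= max_len):
        return False
    # No ambiguous bases (N) allowed.
    if "N" in seq.upper():
        return False
    # Discard reads containing any base below the minimum quality score.
    if min(quals) < min_q:
        return False
    # Discard reads with more than max_ee expected errors.
    return expected_errors(quals) <= max_ee
```

Under these assumptions, a 1200 bp read with a uniform quality of 30 has about 1.2 expected errors and is retained, whereas the same read at a uniform quality of 20 has about 12 and is discarded.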
The ASV and taxonomy tables generated from the DADA2 pipeline outlined
above were merged with metadata using phyloseq (McMurdie and Holmes
2013). ASVs classified as chloroplast or mitochondria were removed, and
only samples with more than 500 total sequences, a depth at which
sampling curves saturated, were retained (Supplementary Figure S2). The
extraction control contained only a single sequence and was removed at
this step.
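
The filtering logic of this paragraph can be sketched in plain Python as follows. This is an illustration only, not the phyloseq code used in the study. It assumes that organelle ASVs are recognised by a "Chloroplast" or "Mitochondria" label at any taxonomic rank, and that per-sample totals are computed after those ASVs are removed; both are assumptions, as are the names `is_organelle` and `filter_table` and the dictionary layout.

```python
ORGANELLE_LABELS = {"chloroplast", "mitochondria"}  # assumed label spellings


def is_organelle(ranks):
    """True if any taxonomic rank of an ASV carries an organelle label."""
    return any(str(rank).lower() in ORGANELLE_LABELS for rank in ranks if rank)


def filter_table(counts, taxonomy, min_reads=500):
    """Drop organelle ASVs, then keep samples with more than min_reads reads.

    counts   : {sample_id: {asv_id: read_count}}
    taxonomy : {asv_id: tuple of rank names}
    """
    kept = {}
    for sample, asv_counts in counts.items():
        # Remove ASVs classified as chloroplast or mitochondria.
        cleaned = {
            asv: n
            for asv, n in asv_counts.items()
            if not is_organelle(taxonomy.get(asv, ()))
        }
        # Retain the sample only if strictly more than min_reads remain.
        if sum(cleaned.values()) > min_reads:
            kept[sample] = cleaned
    return kept
```

With this threshold, a control containing a single sequence falls below the cutoff and is dropped along with any other low-depth sample.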