Bioinformatics and statistical analyses
Bioinformatics analyses followed the pipeline used in
He et al .
(2019). Briefly, AdapterRemoval v2 was used to remove adapters in raw
sequences
(Schubert,
Lindgreen, & Orlando, 2016). Sequences were demultiplexed using thengsfilter command of obitools, and sequences for each sample were
split using the obisplit command
(Boyer et al.,
2016). Unpaired reads for each sample were removed using modified
fastqCombinePairedEnd.py
(https://github.com/enormandeau/Scripts/blob/master/fastqCombinePairedEnd.py).
Paired sequences of each MiSeq run were imported into QIIME 2 separately
(Bolyen et al.,
2018). Sequence quality control was conducted and the amplicon sequence
variant (ASV) tables were constructed using the DADA2 plugin of QIIME 2
(Callahan et al.,
2016). Sequences from each MiSeq run were processed separately by DADA2,
and the three resulting ASV tables were merged.
De novo OTU clustering based on the ASV count table and sequences
was performed at 97% identity using the vsearch plugin of QIIME
2 (Rognes, Flouri,
Nichols, Quince, & Mahé, 2016). The 97% identity was used as it has
been shown to be a good cut-off for metazoans from a mock community
analysis (Fonseca
et al., 2010). Although using ASVs can retain more biological
information
(Callahan,
McMurdie, & Holmes, 2017), we did OTU clustering to: 1) allow a valid
comparison between the number of taxa detected using eDNA and
morpho-taxonomy; 2) reduce the number of features to make it easier to
manually verify if they were derived from benthic metazoan taxa or not;
and 3) identify indicator taxa at a higher level than the ASV level. In
the raw OTU table, the average number of reads in sediment samples was
168,454 while the average number of reads in controls (DNA extraction
and PCR negative controls) was 23. Controls were then filtered out from
the OTU table using the feature-table plugin. The OTUs with less
than 10 reads or present in only one sample were discarded to generate
the all-community OTU table (OTUA). Taxonomy assignment
was performed using the q2-feature-classifier plugin of QIIME2
and the Silva database v132
(Bokulich,
Kaehler, et al., 2018; Quast et al., 2013). Based on taxonomy assignment
results (Table S2), non-metazoan OTUs (i.e. not assigned toD_3__Metazoa (Animalia) ) and two OTUs (one Atlantic salmon OTU
and one chicken OTU) that were considered direct farm products were
removed from the all-community OTU table to generate the metazoan OTU
table (OTUM). Further, OTUs that were considered
non-benthic metazoans (e.g. pelagic) (Table S3) based on their role in
oceanographic ecosystems
(Sutherland, 1991;
Sutherland, Leonard, & Taylor, 1992), including those identified in
Lanzén et al.
(2016), were removed from the metazoan OTU table to generate the benthic
metazoan OTU table (OTUBM). Downstream analyses used the
OTUBM table unless stated otherwise.
The feature sequences were used to generate a rooted phylogenetic tree
using the align-to-tree-mafft-fasttree plugin of QIIME2. For alpha
diversity analysis, rarefaction plotting was performed on the
OTUBM table (Figure S2). The OTUBM table
was then rarefied to 500 sequences per sample for estimation of two
alpha diversity richness parameters: Faith’s Phylogenetic Diversity
(Faith’s PD)
(Faith, 1992) and
Observed OTUs. Correlations between various biotic parameters (i.e.
alpha diversity, Nematoda and Polychaeta OTU richness, relative
abundance of Capitella capitata ) and pore-water sulphide
concentrations were tested using Spearman’s correlation analyses as data
transformation did not lead to normal distributions.
Linear mixed effect models were used to independently describe the
relationships between distance from cage edge and two alpha diversity
parameters (Faith’s PD and Observed OTUs) using the R packagelme4(Bates, Mächler,
Bolker, & Walker, 2015). Distance was set as a (continuous) fixed
effect and farm was stated as a random effect for each model using thelmer function call response_variable ~
distance + (1|farm) . Generalized linear mixed effect models
based on Poisson distribution were used to independently describe the
relationships between distance from cage edge and Nematoda OTU richness
and Polychaeta OTU richness using the package lme4 . Distance was
set as a (continuous) fixed effect and farm was stated as a random
effect for each model using the glmer function callresponse_variable ~ distance +
(1|farm) . Model fit assessed from QQ-plots of residuals showed
the data generally conformed to the assumed distributions for all
models, and only the Nematode OTU richness deviated very slightly.
Three separate analyses were performed to identify OTUs associated with
benthic impacts of salmon aquaculture. First, a one-way ANOVA was
performed to analyze the difference in relative abundance of each OTU in
the OTUBM table among organic enrichment statuses using
the STAMP software, with P values corrected using the
Benjamini-Hochberg (BH) method
(Benjamini &
Hochberg, 1995; Parks, Tyson, Hugenholtz, & Beiko, 2014). Second, the
OTUBM table was transformed to a presence/absence table,
and the Pearson’s phi coefficient of association was calculated to
elucidate associations of particular OTUs with sediments of a particular
organic enrichment status using the multipatt function of the R
package indicspecies with the func set ’r.g’, dulegset ’TRUE’ and nperm set ’9999’
(Cáceres &
Legendre, 2009). Again, these P values were corrected using the
BH method. Third, supervised machine learning was performed using theq2-sample-classifier plugin of QIIME2
(Bokulich, Dillon,
et al., 2018), with the random forest method and–p-n-estimators set to 1000. For machine learning, the
OTUBM table and organic enrichment status based on
pore-water sulphide concentration for each sample were used to
investigate how accurately eDNA data could predict organic enrichment
status and the relative importance score of each OTU for this
prediction.