Bioinformatics and statistical analyses
Bioinformatics analyses followed the pipeline used in He et al . (2019). Briefly, AdapterRemoval v2 was used to remove adapters in raw sequences (Schubert, Lindgreen, & Orlando, 2016). Sequences were demultiplexed using thengsfilter command of obitools, and sequences for each sample were split using the obisplit command (Boyer et al., 2016). Unpaired reads for each sample were removed using modified fastqCombinePairedEnd.py (https://github.com/enormandeau/Scripts/blob/master/fastqCombinePairedEnd.py). Paired sequences of each MiSeq run were imported into QIIME 2 separately (Bolyen et al., 2018). Sequence quality control was conducted and the amplicon sequence variant (ASV) tables were constructed using the DADA2 plugin of QIIME 2 (Callahan et al., 2016). Sequences from each MiSeq run were processed separately by DADA2, and the three resulting ASV tables were merged.
De novo OTU clustering based on the ASV count table and sequences was performed at 97% identity using the vsearch plugin of QIIME 2 (Rognes, Flouri, Nichols, Quince, & Mahé, 2016). The 97% identity was used as it has been shown to be a good cut-off for metazoans from a mock community analysis (Fonseca et al., 2010). Although using ASVs can retain more biological information (Callahan, McMurdie, & Holmes, 2017), we did OTU clustering to: 1) allow a valid comparison between the number of taxa detected using eDNA and morpho-taxonomy; 2) reduce the number of features to make it easier to manually verify if they were derived from benthic metazoan taxa or not; and 3) identify indicator taxa at a higher level than the ASV level. In the raw OTU table, the average number of reads in sediment samples was 168,454 while the average number of reads in controls (DNA extraction and PCR negative controls) was 23. Controls were then filtered out from the OTU table using the feature-table plugin. The OTUs with less than 10 reads or present in only one sample were discarded to generate the all-community OTU table (OTUA). Taxonomy assignment was performed using the q2-feature-classifier plugin of QIIME2 and the Silva database v132 (Bokulich, Kaehler, et al., 2018; Quast et al., 2013). Based on taxonomy assignment results (Table S2), non-metazoan OTUs (i.e. not assigned toD_3__Metazoa (Animalia) ) and two OTUs (one Atlantic salmon OTU and one chicken OTU) that were considered direct farm products were removed from the all-community OTU table to generate the metazoan OTU table (OTUM). Further, OTUs that were considered non-benthic metazoans (e.g. pelagic) (Table S3) based on their role in oceanographic ecosystems (Sutherland, 1991; Sutherland, Leonard, & Taylor, 1992), including those identified in Lanzén et al. (2016), were removed from the metazoan OTU table to generate the benthic metazoan OTU table (OTUBM). Downstream analyses used the OTUBM table unless stated otherwise.
The feature sequences were used to generate a rooted phylogenetic tree using the align-to-tree-mafft-fasttree plugin of QIIME2. For alpha diversity analysis, rarefaction plotting was performed on the OTUBM table (Figure S2). The OTUBM table was then rarefied to 500 sequences per sample for estimation of two alpha diversity richness parameters: Faith’s Phylogenetic Diversity (Faith’s PD) (Faith, 1992) and Observed OTUs. Correlations between various biotic parameters (i.e. alpha diversity, Nematoda and Polychaeta OTU richness, relative abundance of Capitella capitata ) and pore-water sulphide concentrations were tested using Spearman’s correlation analyses as data transformation did not lead to normal distributions.
Linear mixed effect models were used to independently describe the relationships between distance from cage edge and two alpha diversity parameters (Faith’s PD and Observed OTUs) using the R packagelme4(Bates, Mächler, Bolker, & Walker, 2015). Distance was set as a (continuous) fixed effect and farm was stated as a random effect for each model using thelmer function call response_variable ~ distance + (1|farm) . Generalized linear mixed effect models based on Poisson distribution were used to independently describe the relationships between distance from cage edge and Nematoda OTU richness and Polychaeta OTU richness using the package lme4 . Distance was set as a (continuous) fixed effect and farm was stated as a random effect for each model using the glmer function callresponse_variable ~ distance + (1|farm) . Model fit assessed from QQ-plots of residuals showed the data generally conformed to the assumed distributions for all models, and only the Nematode OTU richness deviated very slightly.
Three separate analyses were performed to identify OTUs associated with benthic impacts of salmon aquaculture. First, a one-way ANOVA was performed to analyze the difference in relative abundance of each OTU in the OTUBM table among organic enrichment statuses using the STAMP software, with P values corrected using the Benjamini-Hochberg (BH) method (Benjamini & Hochberg, 1995; Parks, Tyson, Hugenholtz, & Beiko, 2014). Second, the OTUBM table was transformed to a presence/absence table, and the Pearson’s phi coefficient of association was calculated to elucidate associations of particular OTUs with sediments of a particular organic enrichment status using the multipatt function of the R package indicspecies with the func set ’r.g’, dulegset ’TRUE’ and nperm set ’9999’ (Cáceres & Legendre, 2009). Again, these P values were corrected using the BH method. Third, supervised machine learning was performed using theq2-sample-classifier plugin of QIIME2 (Bokulich, Dillon, et al., 2018), with the random forest method and–p-n-estimators set to 1000. For machine learning, the OTUBM table and organic enrichment status based on pore-water sulphide concentration for each sample were used to investigate how accurately eDNA data could predict organic enrichment status and the relative importance score of each OTU for this prediction.