Data analysis
Many reference databases use their own taxonomic nomenclature, which can
lead to conflicting taxonomy assignments when comparing multiple
datasets (Canino et al., 2021). To allow for comparisons between
the microscopy and sedDNA records, taxonomy was homogenised using
Phytool v2 (Canino et al., 2021), which is based on the taxonomic
classifications used in AlgaeBase (Guiry and Guiry, 2022). This ensured
that taxa in both records were classified under the same nomenclature
and that names were updated to the currently accepted taxonomic name.
To account for potential inaccuracies in species identification, taxa in
both records were grouped at the genus level. As the counting method
sometimes varied by size or form (e.g., single cell, colony, or
filament), the microscopy-based counts were converted to a binary
presence-absence value for each genus on each sampling occasion. The
total number of sampling occasions on which each genus was observed was
calculated for each year as a measure of occurrence, and then normalised
to the number of sampling occasions per year to account for variable
sampling effort.
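
The genus-level occurrence measure described above can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study; the record layout, the function name `normalised_occurrence`, and the taxon names are assumptions introduced for the example, and the genus is taken to be the first word of the taxon name.

```python
from collections import defaultdict


def normalised_occurrence(records):
    """Return {(year, genus): share of that year's sampling occasions with the genus present}.

    `records` is an iterable of (year, occasion_id, taxon_name, count) tuples;
    this layout is an assumption made for illustration.
    """
    occasions = defaultdict(set)  # year -> all sampling occasions in that year
    present = defaultdict(set)    # (year, genus) -> occasions where the genus was seen
    for year, occasion, taxon, count in records:
        occasions[year].add(occasion)
        if count > 0:  # binary presence-absence: any positive count is "present"
            genus = taxon.split()[0]  # group at genus level (first word of the name)
            present[(year, genus)].add(occasion)
    return {
        (year, genus): len(seen) / len(occasions[year])
        for (year, genus), seen in present.items()
    }


# Placeholder records, not data from the study.
records = [
    (1950, "visit1", "Genusa alpha", 12),
    (1950, "visit1", "Genusa beta", 3),
    (1950, "visit2", "Genusb gamma", 40),
    (1951, "visit1", "Genusa alpha", 0),
    (1951, "visit1", "Genusb gamma", 7),
]
occurrence = normalised_occurrence(records)
```

Dividing by the number of sampling occasions in each year keeps years with many visits comparable to years with few.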
Non-metric multidimensional scaling (NMDS) was performed on Bray-Curtis
dissimilarity matrices of genus relative abundance as measured by sedDNA
and of genus occurrence as measured by microscopy, covering the period
from 1945 to 2010. Correlations between each dissimilarity matrix and
lake physicochemical conditions were assessed with a permutation test
and fitted to the ordination space using the vegan R package v2.6-2
(Oksanen et al., 2019). The vegan package was also used to
calculate Shannon’s alpha diversity at the genus level in both records.
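
For reference, the two diversity measures named above can be written out directly. The sketch below is a plain-Python illustration of the standard formulas, not the vegan implementation used in the study; the function names are assumptions, and the Shannon index is computed with the natural logarithm.

```python
import math


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two samples (equal-length abundance vectors).

    Computed as sum(|u_i - v_i|) / sum(u_i + v_i); undefined if both samples are empty.
    """
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


def shannon(abundances):
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with non-zero abundance."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)
```

Bray-Curtis ranges from 0 (identical composition) to 1 (no shared taxa), and the Shannon index of a sample with k equally abundant genera equals ln(k).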
Generalised additive models (GAMs) with Gamma error distributions and a
log link were fitted to the temporal trend in alpha diversity using the
mgcv R package v1.8-40 (Wood, 2020). As there was not a sediment sample
corresponding to each year of the microscopy-based monitoring record,
annual values of alpha diversity from 1945 to 2010 as measured by sedDNA
were estimated using the GAM fitted to the temporal trend. These
GAM-estimated annual values were then correlated with GAM-estimated
annual values of alpha diversity as measured by microscopy using a model
II regression with the lmodel2 R package v1.7-2 (Legendre et al.,
2018).
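
Model II regression treats both variables as subject to error, which suits a comparison of two estimated series. The text does not state which model II method was used, so the sketch below assumes major axis regression, one of several model II methods; the function name `major_axis` is hypothetical and the code is an illustration, not the lmodel2 implementation.

```python
import math


def major_axis(x, y):
    """Return (slope, intercept) of the major axis regression of y on x.

    The slope follows the first principal axis of the sample covariance matrix:
    (s_yy - s_xx + sqrt((s_yy - s_xx)^2 + 4 * s_xy^2)) / (2 * s_xy).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    s_yy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    if s_xy == 0:
        raise ValueError("covariance is zero; the major axis slope is undefined")
    slope = (s_yy - s_xx + math.sqrt((s_yy - s_xx) ** 2 + 4 * s_xy ** 2)) / (2 * s_xy)
    intercept = mean_y - slope * mean_x  # the line passes through the centroid
    return slope, intercept
```

Unlike ordinary least squares, this fit minimises perpendicular distances to the line, so swapping the two variables yields the same line expressed from the other axis.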
GAMs were fitted to the temporal trends in phylum relative abundance as
measured by sedDNA using Beta error distributions with a logit link,
which is suitable for proportion data. For the trends in phylum
occurrence as measured by microscopy, GAMs were fitted using Gamma error
distributions with a log link, which is suitable for positively skewed,
non-negative data (Anderson et al., 2010; Simpson, 2018).
Restricted maximum likelihood (REML) was used as the smoothness
selection method for all GAMs (Simpson, 2018). Annual values of relative
abundance from 1945 to 2010 were estimated using the GAM fitted to the
temporal trend and correlated with the GAM-estimated annual values of
occurrence using a model II regression.
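
The choice of link function determines the range of the fitted values. As a small illustration (not part of the mgcv workflow, with hypothetical function names), the inverse logit link maps any value of the linear predictor into the interval (0, 1), as required for proportions, while the inverse log link maps it to a strictly positive value.

```python
import math


def logit(p):
    """Logit link: maps a proportion in (0, 1) to the real line."""
    return math.log(p / (1 - p))


def inv_logit(eta):
    """Inverse logit: maps any real linear predictor back into (0, 1)."""
    if eta >= 0:
        return 1 / (1 + math.exp(-eta))
    e = math.exp(eta)  # avoids overflow for large negative eta
    return e / (1 + e)


def inv_log(eta):
    """Inverse log link: maps any real linear predictor to a positive value."""
    return math.exp(eta)
```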
For each phylum, Venn diagrams were used to illustrate which genera were
uniquely detected using sedDNA, which were uniquely detected by
microscopy, and which were detected in both records. Venn diagrams were
produced with the eulerr R package v7.0.0 (Larsson, 2022), and all data
analysis was performed in R v4.2.1 (R Core Team, 2022).
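
The three regions of each Venn diagram correspond to simple set operations on the genus lists. The sketch below illustrates this in Python; the function name `venn_partition` and the genus labels are placeholders, not results from the study.

```python
def venn_partition(seddna_genera, microscopy_genera):
    """Split genera into those unique to each record and those shared by both."""
    seddna, microscopy = set(seddna_genera), set(microscopy_genera)
    return {
        "sedDNA only": seddna - microscopy,
        "microscopy only": microscopy - seddna,
        "both": seddna & microscopy,
    }


# Placeholder genus labels for one phylum.
regions = venn_partition(["GenusA", "GenusB", "GenusC"], ["GenusB", "GenusC", "GenusD"])
```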