Data analysis
Many reference databases use their own taxonomic nomenclature, which can lead to conflicting taxonomy assignments when multiple datasets are compared (Canino et al., 2021). To allow comparison between the microscopy and sedDNA records, taxonomy was homogenised using Phytool v2 (Canino et al., 2021), which is based on the taxonomic classifications used in AlgaeBase (Guiry and Guiry, 2022). This ensured that taxa in both records were classified according to the same nomenclature and that names were updated to the currently accepted taxonomic name.
To account for potential inaccuracies in species identification, taxa in both records were grouped at the genus level. As the counting method sometimes varied by size or form (e.g., single cell, colony, or filament), the microscopy-based counts were converted to a binary presence-absence value for each genus on each sampling occasion. The total number of sampling occasions on which each genus was observed was calculated for each year as a measure of occurrence, and then normalised to the number of sampling occasions per year to account for variable sampling effort.
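The conversion from counts to normalised occurrence described above can be sketched as follows. This is a minimal Python illustration, not the code used in the study (which was carried out in R); the function name `annual_occurrence`, the record layout, and the example values are all hypothetical.

```python
from collections import defaultdict


def annual_occurrence(records, occasions_per_year):
    """Return {(year, genus): occurrence}, normalised to sampling effort.

    records: iterable of (year, occasion_id, genus, count) tuples.
    occasions_per_year: dict mapping year -> number of sampling occasions.
    Both layouts are assumptions made for this sketch.
    """
    # Binary presence-absence: remember only on which occasions a genus
    # was seen, so cells, colonies and filaments are treated alike.
    seen = defaultdict(set)
    for year, occasion, genus, count in records:
        if count > 0:
            seen[(year, genus)].add(occasion)
    # Divide by the number of occasions in that year to correct for effort.
    return {
        (year, genus): len(occasions) / occasions_per_year[year]
        for (year, genus), occasions in seen.items()
    }


# Invented example values, for illustration only.
records = [
    (1950, "a", "Asterionella", 120),
    (1950, "b", "Asterionella", 3),
    (1950, "b", "Microcystis", 40),
    (1951, "a", "Microcystis", 0),
    (1951, "a", "Asterionella", 7),
]
occurrence = annual_occurrence(records, {1950: 4, 1951: 2})
```

In this sketch a genus that was never observed in a given year is simply absent from the result, which corresponds to an occurrence of zero.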
Non-metric multidimensional scaling (NMDS) was performed on Bray-Curtis dissimilarity matrices (beta diversity) of genus relative abundance as measured using sedDNA and of genus occurrence as measured by microscopy from 1945 to 2010. Correlations between each dissimilarity matrix and lake physicochemical conditions were assessed with a permutation test, and the physicochemical variables were fitted to the ordination space, using the vegan R package v2.6-2 (Oksanen et al., 2019). The vegan package was also used to calculate Shannon’s alpha diversity at the genus level in both records. Generalised additive models (GAMs) with Gamma error distributions and a log link were fitted to the temporal trend in alpha diversity using the mgcv R package v1.8-40 (Wood, 2020). Because sediment samples were not available for every year of the microscopy-based monitoring record, annual values of alpha diversity from 1945 to 2010 as measured by sedDNA were estimated using the GAM fitted to the temporal trend. These GAM-estimated annual values were then correlated with the GAM-estimated annual values of alpha diversity as measured by microscopy using a model II regression with the lmodel2 R package v1.7-2 (Legendre et al., 2018).
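For reference, the two diversity measures named above can be written out in a few lines. The sketch below uses the standard textbook definitions of Bray-Curtis dissimilarity and the Shannon index with natural logarithms; it is an illustration in Python rather than the vegan implementation used in the study, and the function names are hypothetical.

```python
import math


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must list the same genera in the same order")
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("dissimilarity is undefined for two empty samples")
    # Sum of absolute differences divided by the summed abundances.
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def shannon(abundances):
    """Shannon index H' = -sum(p * ln p) over genera with non-zero abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("at least one genus must have non-zero abundance")
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)
```

Two samples with identical composition give a Bray-Curtis dissimilarity of 0 and two samples with no shared genera give 1. A sample with all abundance in one genus has a Shannon index of 0.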
GAMs were fitted to the temporal trends in phylum relative abundance as measured by sedDNA using Beta error distributions with a logit link, a combination suitable for proportion data. For the trends in phylum occurrence as measured by microscopy, GAMs were fitted using Gamma error distributions with a log link, a combination suitable for positively skewed, non-negative data (Anderson et al., 2010; Simpson, 2018). Restricted maximum likelihood (REML) was used as the smoothness selection method for all GAMs (Simpson, 2018). Annual values of relative abundance from 1945 to 2010 were estimated using the GAM fitted to the temporal trend and correlated with the GAM-estimated annual values of occurrence using a model II regression.
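Model II regression differs from ordinary least squares in that both variables are treated as subject to error. As an illustration, the sketch below computes a major axis fit, which is one of several model II methods; the text does not state which method was used, so the choice of major axis here is an assumption, and the function name `major_axis` is hypothetical.

```python
import math


def major_axis(x, y):
    """Return (slope, intercept) of the major axis through paired values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("major axis slope is undefined when the covariance is zero")
    # Slope of the first principal axis of the bivariate scatter.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Unlike an ordinary least squares fit, swapping the two variables yields the reciprocal slope, so neither record is privileged as the predictor.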
For each phylum, Venn diagrams were used to illustrate which genera were uniquely detected using sedDNA, which were uniquely detected by microscopy, and which were detected in both records. Venn diagrams were produced with the eulerr R package v7.0.0 (Larsson, 2022), and all data analysis was performed in R v4.2.1 (R Core Team, 2022).
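The three regions of each Venn diagram correspond to simple set operations on the genus lists of the two records. A minimal Python sketch, with a hypothetical function name and placeholder genus labels:

```python
def venn_partition(seddna_genera, microscopy_genera):
    """Split genera into those unique to each record and those shared."""
    seddna = set(seddna_genera)
    microscopy = set(microscopy_genera)
    return {
        "sedDNA only": seddna - microscopy,
        "microscopy only": microscopy - seddna,
        "both": seddna & microscopy,
    }


# Placeholder labels, for illustration only.
regions = venn_partition(
    ["genus A", "genus B", "genus C"],
    ["genus B", "genus C", "genus D"],
)
```

The sizes of the three sets are the values a two-set Venn or Euler diagram displays.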