2.6 Statistical analysis
All statistical analyses were performed in R v4.0.5 Venn diagrams were constructed usinghttp://bioinformatics.psb.ugent.be/webtools/Venn/. The effect of sampling day, sampling position, and temperature and discharge on overall species/OTU detection was tested using generalized additive models (GAM) with a poisson distribution and a “log” link function implemented by the gam() function in the R package mgcv (Wood, 2015). Where overdispersion was detected, we corrected the standard errors using a quasi-GAM model. Discharge and temperature were respectively quantified as the average discharge and mean hourly temperature of each sampling day (Table S1). To test for the effect of our predictor variable sampling position on our response variable OTU richness/read abundance a Kruskal-Wallis rank sum test was performed, as our data was non-normal and non-linear and we had three group variables. Further we divided our dataset into the two groups merolimnic and hololimnic taxa and subsetted these groups into the most abundant taxa, Diptera and EPT for merolimnic and Annelida and Coleoptera for hololimnic. Then we used GAMs to determine the effect of sampling day, temperature and discharge on species/OTU richness separately for each of the eight groups. To assess the effect of sampling day on the relative proportion for each FFG we used a GAM with a binomial distribution and a ”logit” link function. Only the term for sampling day was smoothed in all GAMS via cubic regression splines with a basis dimension k = 9. Model diagnostics were checked via plots of residual versus fitted values. For all models the chosen k was checked using gam.check() and no model had a significant p-value. The marginal effects, the slopes of the prediction equation, of the significant explanatory variables were plotted at the mean of the covariates to incorporate the possible minimal effects that the insignificant explanatory variables could potentially have on the slope of the significant explanatory variable.
An ANOSIM test was used to test for the effect of sampling position on OTU and species temporal beta-diversity (Jaccard distance) defined as the shift in the identities of named taxa in a specified assemblage over two or more time points (Magurran et al., 2019). Additionally, we partitioned temporal beta-diversity into its components turnover - reflecting replacement of species between samples - and nestedness - occuring when species loss or gain causes species-poor sites to resemble a strict subset of species-rich sites (Baselga, 2010) using the betapart package (Baselga et al., 2018) and visualized their change over the time series as pairwise comparisons in relation to the first timepoint. For comparisons between seasons, we grouped our dataset into spring (March, April, May), summer (June, July, August), autumn (September, October, November) and winter (December, January, February) samples. To visualize differences between sampling positions and seasons for species and OTU community composition, non-metric multidimensional scaling (NMDS) based on Jaccard’s distance was conducted using the package vegan (Oksanen et al., 2022).
To identify species that are most strongly associated with a season or only found at a specific season, an indicator species analysis based on presence - absence was performed using the R package indicspecies with the function multipatt() (De Cáceres, Legendre, Moretti, 2010). The association of species to seasons was calculated using the indicator value index from Dufrêne and Legendre 1997. We used the indicator value index instead of a correlation based index to get results which best match the observed presence of a species rather than preferences of one season over another. Using preferences could lead to the problem of species being reported as indicator for a specific season, although the species was also almost frequently present in other seasons. For biomonitoring aspects it is therefore favourable to use the observed presence of species. The indicator value index ranges between 0 and 1 (with 0 indicating no association) and is calculated as the product of two quantities A, the probability that the surveyed site belongs to the target group (i.e., season) when the species has been found, and B, how frequently the species is found in the different samples from each season. The function determines which season corresponds to the highest association value for each species and then tests the statistical significance of this relationship using a permutation test. We corrected for size differences between groups, as we had different sample sizes between seasons, using the implemented value within the multipatt() function.