Statistical analyses
Except when specified, we performed the following statistical analyses
at sampling site level, after integrating the community tables of the
two soil layer samples (leaf litter and deep soil) for all three
taxonomic groups (Acari, Collembola and Coleoptera) into a single
matrix. Each of the analyses was performed at both ASV and OTU levels.
α diversity and community uniqueness:
We used ANOVAs to test for significant differences in α diversity (RICH)
and community uniqueness (LCBD, local contribution to β diversity)
between forest habitats and soil layers. We used generalized linear
mixed models (GLMMs) to analyse the relationship between RICH or LCBD
per site and the topoclimatic variables (ENVPC1 and
ENVPC2) as predictors, with latitude and longitude as
covariates. We built GLMMs fitting forest habitat type as a random
effect in order to account for non-independence among samples from the
same forest habitat (see Supplemental Information).
β diversity:
The clustering of sampling sites according to their community
composition was visualised using non-metric multidimensional scaling
(NMDS) assuming two dimensions (k =2). Differences among forest
habitat types were tested for significance using permutational
multivariate analyses of variance (PERMANOVA). We used symmetric
Procrustes analyses to statistically assess non-randomness among NMDS
ordinations calculated from different community dissimilarity matrices
(βSOR vs. βSIM and ASVs vs. OTUs; see Peres-Neto & Jackson, 2001). These analyses were
performed using the vegan (Oksanen et al., 2020) and pairwiseAdonis (Martinez Arbizu, 2020) R packages.
The effect of forest habitat type and of the spatial and topoclimatic
predictors on β diversity patterns was tested using distance-based
redundancy analyses (dbRDA; Legendre & Anderson, 1999), at both
across-habitats and within-habitat scales. Specifically, the response
variables were the community dissimilarity matrices based on the Simpson
dissimilarity index (βSIM), and the explanatory
variables were forward selected from full models containing the
following sets of predictors: (i) forest habitat type (HAB); (ii)
spatial variables (SPAPCNMi) derived from the
transformation of the topographic weighted distance matrix using
Principal Coordinates of Neighbour Matrices (PCNM, see Supplemental
Information), and (iii) topoclimatic variables (ENVPC1and ENVPC2). The best-fit model was selected after
ensuring there were no issues of multicollinearity (Variance Inflation
Factors, VIF <10; e.g., Tonkin, Stoll, Jähnig, &
Haase, 2016). Finally, the best-fit models for the across-habitat
analyses were used to partition the variance explained exclusively by
each variable group (forest habitat type, space and topoclimate) and
their intersections using the adjusted coefficient of determination
(R 2ADJ) (e.g., Zinger et
al., 2019). These analyses were performed using the BiodiversityR (Kindt & Coe, 2005) and vegan R packages.
In order to validate the dbRDA inferences (Jupke & Schäfer, 2020), we
also applied multivariate generalized linear models (mvGLMs) as
implemented in the mvabund R package (Wang, Naumann,
Wright, & Warton, 2012). Unlike dbRDA which is a distance-based
approach, this model-based method fits a separate GLM per species and
performs resampling-based hypothesis testing for community-level effects
of predictors. Here, incidence (presence/absence) datasets were used as
input and tested against the same sets of predictors as in dbRDA. Models
were fitted using a binomial error distribution and a log link function.
We first assessed the predictor significance using single-term models
and only those predictors showing a significant effect were used to
assemble a full model. The best-fit model was built following a backward
stepwise selection approach (e.g., How, Cowan, Teale, & Schmitt,
2019). Term significance was assessed using likelihood ratio tests,
PIT-trap resampling and 999 bootstrapping iterations (Wang et al.,
2012).
Finally, we assessed the effect of habitat connectivity on the community
composition of the Qa sampling sites by applying matrix
regressions with randomization (MRR; Wang, 2013). Specifically, the
community dissimilarity matrices based on the Simpson dissimilarity
index (βSIM) were used as response variables, while the
explanatory variables were selected among: (i) resistance due to habitat
fragmentation (FRAIBR), (ii) resistance due to
topographic complexity (TRIIBR), (iii) resistance due to
a “flat landscape” (NULLIBR), (iv) weighted
topographic (SPATWD) distances and (v) topoclimatic
(ENVPC1-2) distances. We assembled a full model
including all explanatory matrices and built the best-fit model
following a backward stepwise selection approach, using 999 permutations
for the significance tests (e.g., Ortego, Gugger, & Sork, 2015).
The unique contribution of each predictor to the total variance
explained by the best-fit model was quantified using hierarchical
variance partitioning analysis in the hier.part R package (Walsh & Mac Nally, 2003). Once the best-fit model was
selected, we visualised the relationship between community similarity
(1-βSIM) and the explanatory distance/resistance
matrices and fitted a distance-decay of community similarity curve
(Gómez-Rodríguez & Baselga, 2018; Nekola & White, 1999). Specifically,
we fitted a negative exponential function to univariate GLMs assuming a
Gaussian error distribution and a log link function as implemented in
the betapart R package.