Interspecific adaptive variation
For the pcadapt analysis, we considered the first two PCs that
explained a total of ~16% of the genetic variance
(Figure S2A, Supporting information). The first PC explained
~10% of variance and clearly separated NIDGS from
SIDGS, and some variation within SIDGS, while the remaining axes mostly
reflected intraspecific variation within NIDGS (Figures S2B and S2C,
Supporting information). Using a value of α = 0.01, we detected
48 outlier loci using an adjustment of the p -values following the
Benjamini-Hochberg procedure.
For the GEAs, most environmental variables were not normally
distributed, and p -values for the Shapiro-Wilk normality were all
lower than 0.001 for all variables. Based on the results of the Kendall
correlation analysis, we kept 18 variables for the GEA analyses, given
that these showed low correlation with one another (Figure S3A,
Supporting information). The PCA performed with these variables shows
that only PC1 explained more variance than the random broken stick
criterion, and thus was kept as a summary of the environmental data to
compare in the LFMM univariate analysis (Figure S4A, Supporting
information). The PCA performed for the genomic data showed that only
PC1 explained more variance than the random broken stick criterion,
which suggests that the number of populations was two (Figure S3B,
Supporting information). We ran the LFMM using K = 2, which
resulted in a distribution of unadjusted p -values with a GIF of
2.42, which was then adjusted to 1.4 (Figure S5, Supporting
information). Using a false discovery rate (FDR) of 10%, the LFMM
analysis identified 52 candidate loci.
We then performed the RDA and pRDA for the IDGS dataset. For both
analyses, from the 18 environmental variables used, we excluded three
(BIO1, BIO2 and LF_VA) due to high VIF. We obtained an adjustedr2 of 0.08 and 0.06 for the RDA and pRDA,
respectively, across 15 axes, which correspond to the proportion of
genetic variance explained by the environmental predictors kept in each
analysis. The ANOVA showed that both analyses were significant atp < 0.001, and that the first four axes of both
analyses were significant at p < 0.001 (Figure S6,
Supporting information). Imputation of the missing data appeared to not
have created biases in population structure given the similarities
between the pcadapt and the RDA PCA loadings (Figures 2A and 2B,
and S2B and S2C, Supporting information). The first RDA axis explained
4.8% of the variance, while RDA2, RDA3 and RDA4 explained 2.8%, 2.5%
and 2.2%, respectively. The remaining axes combined (RDA5-15) explained
the remaining 13.5% of the total variance (Figure 2A and 2B). In this
analysis, there was a clear separation of NIDGS and SIDGS individuals in
RDA1, while RDA2-4 mostly separated populations within species. The
environmental variables that loaded the most strongly with RDA1, and
thus the separation between NIDGS and SIDGS, were ‘Steep Slopes’
(LF_SS, associated with 20 SNPs), ‘Grassland/Herbaceous’ (LC_GH,
associated with 32 SNPs) and ‘Soil Temperature’ (soilPartSize,
associated with 20 SNPs) (Table 2 and Figure 2A). When accounting for
population structure with the pRDA analysis, we no longer find a clear
distinction between NIDGS and SIDGS, and instead variation appears to
mimic that of the RDA without RDA1 (Figure 2). The first pRDA axis
explained 2.9% of the variance (similarly to RDA2), while pRDA2, pRDA3
and pRDA4 explained 2.6%, 2.3% and 2.1%, respectively. The remaining
axes combined (RDA5-15) explained the remaining 12.7% of the total
variance (Figure 2C and 2D). We also found a total of 316 SNPs
associated with 17 environmental variables, with 38.6% (122 SNPs) also
associated with the land formation variable ‘Ridges and Peaks’ (LF_RP)
(Table 2).
Considering all analyses combined, we identified a total of 490
candidate loci, none of which were found by all four analyses (Figure
4). Comparing the population structure outlier approach (pcadapt )
to the GEAs (LFMM, RDA and pRDA), there are 29 loci found by all GEAs
vs. 18 found only by the population structure outlier approach (Figure
4). Methods that do not account (i.e. correct) for population structure
(pcadapt and RDA) identified 21 loci in common, while methods
that account for population structure (LFMM and pRDA) identified one
locus in common. The largest overlap between analyses was 168 loci found
by both multivariate GEAs (RDA and pRDA).