MATERIALS AND METHODS
Study species: Balsam poplar (Populus balsamifera L.,
Salicaceae) is a common deciduous tree in northern temperate and boreal
forest ecosystems across North America. Its expansive geographic range
encompasses broad climatic gradients in growing season length, and
populations exhibit abundant clinal variation and adaptive population
divergence in allele frequencies and phenotypic traits, particularly
those related to phenology and ecophysiology
(Fitzpatrick
& Keller, 2015; Keller, Levsen, Ingvarsson, Olson, & Tiffin, 2011;
Keller, Levsen, Olson, & Tiffin, 2012; Keller, Soolanayakanahally, et
al., 2011; Olson et al., 2013; Soolanayakanahally, Guy, Silim, Drewes,
& Schroeder, 2009; Soolanayakanahally, Guy, Silim, & Song, 2013).
Based on previous work, balsam poplar is known to exhibit relatively
strong regional population structure, with multiple studies recognizing
a clearly divergent subpopulation in the eastern part of its range in
Atlantic Canada and the New England States
(Chhatre et al.,
2019; Keller, Olson, Silim, Schroeder, & Tiffin, 2010; Meirmans,
Godbout, Lamothe, Thompson, & Isabel, 2017). The rest of the range
consists of a large, relatively homogenous genetic subpopulation in the
middle core of its range that is weakly differentiated
(F ST = 0.008) but with signatures of
isolation-by-distance along an axis of longitude
(Chhatre et al., 2019;
Meirmans et al., 2017). Lastly, some studies suggest a third major
subpopulation in northwestern Canada and Alaska
(Keller et al., 2010).
In the current study, we focused our sampling and analysis on the
widespread but weakly structured “Core” region in the middle of the
range (Chhatre et al.,
2019). We did this to minimize the known confounding introduced by
regional population structure to genome scans of selection
(Lotterhos & Whitlock,
2014), and because the Core region still captures large variation in
the climatic environment, making it ideal for studying adaptive
contributions of genomic variation while reducing the likelihood of
false positives due to demographic history
(Chhatre et al., 2019). Our
selection of individuals and populations was based on retaining
individuals with ADMIXTURE ancestry scores >0.9 in the Core
region based on Chhatre et
al. (2019). This resulted in 336 individuals from 42 populations for
analysis of genomic variation at 107,309 biallelic SNPs, with
populations distributed across north central US and Canada, from
Vermont, New York, and Quebec in the east to Manitoba and Saskatchewan
in the west, and south as far as WY (Table S1).
Common gardens: We measured performance of poplar clones in two
common garden locations with contrasting climates, “IH” near Indian
Head, Saskatchewan, Canada (50.52 °N, −103.68 °W), and “VT” near
Burlington, Vermont, USA (44.44 °N, −73.19 °W). Establishment of both
common gardens was from clonally propagated stem cuttings (6-9 cm with
at least 2 vegetative buds) taken from natural populations while dormant
during winter, rooted in potting media in the greenhouse and grown for 1
year, and then outplanted directly into the ground at the respective
garden site. Details of the IH site are provided in
Soolanayakanahally et al.
(2013), which describes the original garden established in 2005
consisting of 15 genotypes from each population planted in aggregate
with 2×2 meter spacing. Each population group was then clonally
replicated 5 times in random locations throughout the garden. In 2011, a
second planting at IH was established adjacent to the original garden,
consisting of additional populations planted with the same
specifications as the original planting. At the VT site, planting
consisted of a completely randomized design with a target replication of
3 ramets per genotype planted with 2×2 meter spacing. At both sites,
plants were not fertilized, but were watered as needed during the year
they were established and then received no supplemental water
thereafter. Additional details of the VT site can be found in
Fetter, Nelson, & Keller
(2019).
After losses due to mortality during establishment, plants available for
phenotyping totaled 357 unique genotypes (N=117 in IH; 337 in VT)
representing 41 of the 42 target populations (N=9 in IH; 41 in VT;
population ‘NIC’ was absent from both; Table S1). Most of these
genotypes with phenotype data were the same individuals sequenced with
GBS for the 107,309 SNPs (297 of 336 individuals with GBS; representing
41 of the 42 populations with GBS). As an overall metric of growth
performance, we measured the yearly height increment (cm) gain between
the apical bud and the previous year’s bud scar on the most dominant
stem. Height increment was measured at the end of the 2015 growing
season for both sites, after all plants had finished setting bud. This
gives an overall measure of growth achieved during the 2015 growing
season that integrates effects of genotypic variability in phenology and
relative growth rate
(Soolanayakanahally et al.,
2013). Genotypes with height increment data were represented by a mean
(SD) of 2.6 (1.5) ramets.
Environmental data for field locations : To characterize
environmental conditions at the sampling locations of the source
populations, we used a set of seven variables at 30-arcsecond resolution
(~1 km x 1 km). We selected this set of seven variables
from a larger (n =21) initial set composed of 19 bioclimatic
variables and elevation from WorldClim v1.4 (representing averages for
the period 1960-1990; www.worldclim.org;
Hijmans, Cameron, Parra,
Jones, & Jarvis, 2005), plus latitude. We used an iterative process of
GF model fitting to assess variable importance combined with variance
inflation factor (VIF) analysis to assess collinearity, with the goal of
retaining a final set of the most important and interpretable and least
collinear variables to the greatest extent possible. VIF estimates the
potential impact of multicollinearity in regression-type models by
quantifying the extent to which standard errors are inflated due to
collinearity compared to when uncorrelated variables are used. To
calculate VIFs, we used the ‘vifcor’ function with a correlation
threshold of 0.75 in the ‘usdm ’ package
(Naimi, Hamm, Groen,
Skidmore, & Toxopeus, 2014) in R
(R Core Team, 2018). The
final set of seven variables included latitude (y), elevation (elev) and
five bioclimatic variables: mean diurnal range (bio2), mean summer
(bio10) and winter (bio11) temperature, and summer (bio18) and winter
(bio19) precipitation. Latitude was included to characterize the
gradient in daylength not otherwise captured by the bioclimatic
variables. Of our seven retained variables, mean winter temperature
(bio11) and latitude were found to have a VIF greater than 10 (10.31 and
17.00 respectively), which is considered an approximate rule-of-thumb
cutoff VIF (Guisan, Thuiller,
& Zimmermann, 2017). However, given that GF has some ability to
accommodate correlated variables
(Ellis et al., 2012) and
daylength is an important determinant of phenology and height growth
cessation in poplars (Soolanayakanahally et al. 2013), we opted to
retain both of these variables in our models. We aggregated the
individual-level environmental data to the level of populations (42
total) by taking the mean across individuals within each population
(mean = 8 individuals per population, min=2, max=13).
Environmental data for common garden locations : To characterize
climatic conditions at the two common garden experiments, we used DayMet
(Thornton et al., 2014) data
for 2014 and 2015, which spanned the relevant period from planting and
establishment of the clonal replicates to when the growth measurements
were collected. The Daymet dataset provides gridded estimates of
temperature and precipitation for North America on a daily time step at
~1 km x 1 km spatial resolution. We aggregated the daily
data for 2014 and 2015 to produce monthly averages for maximum and
minimum temperature and monthly sums for precipitation. We used these
monthly summaries and the ‘biovars’ function in the ‘dismo ’
package (Chamberlain, 2017)
to calculate the same set of bioclimatic predictor variables used to
characterize climate for the field collections.
Univariate outlier detection methods: Full details of the GEA
selection scans are described in
Chhatre et al. (2019), and
recapped again briefly here. Namely, we used two complementary
approaches: (1) the population-based method bayenv2(Günther & Coop, 2013) that
corrects for population relatedness using a variance-covariance matrix
of allele frequencies before testing for a correlation between SNP
allele frequencies and an environmental predictor; and (2) the
individual-based method lfmm(Frichot, Schoville,
Bouchard, & François, 2013) that controls for background genetic
structure by introducing latent factors into a linear mixed models while
testing for associations between the genotypic state at each SNP and an
environmental predictor. Both bayenv2 and lfmm are
inherently univariate, testing one environmental predictor’s association
with one SNP at a time. Therefore, to avoid inflating type I error
rates, we used as environmental predictors in each approach the first 2
PC’s from a principal components analysis on 19 bioclim variables
(Hijmans et al., 2005) plus source latitude.
We generated empirical P -values for assessing significance in GEA
tests, as advocated by
Lotterhos & Whitlock
(2014), using a subset (n =1,353) of non-coding intergenic SNPs
based on the Populus trichocarpa v3.0 genome annotation. We used
these intergenic SNPs to generate empirical null distributions of the
test statistics used in our selection scans (Bayes Factors forbayenv2 and 𝝀-adjusted z-scores for lfmm ), based on
ranking each test SNP within the empirical null distribution and
determining its quantile sensuLotterhos & Whitlock
(2014). Candidate SNPs with a test statistic equal to or exceeding that
of the empirical null distribution were designated as selection outliers
with a P ≤ 0.000739 (=1/[1353+1]).
GF background: Ellis
et al. (2012) provide details regarding the Gradient Forest algorithm
and Fitzpatrick & Keller
(2015) describe the application of GF to modeling genomic data and
predicting genetic offsets. In brief, GF is a flexible, non-parametric
extension of the machine learning approach known as Random Forests
(Breiman, 2001). GF uses
Random Forest to fit an ensemble of regression trees to model change in
allele frequencies across sites and derive monotonic, nonlinear
functions of environmental predictors. GF stands on the shoulders of
Random Forests and inherits its robust statistical measures of model
performance and variable importance. The predictive performance of the
GF model for each SNP is quantified using the proportion of out‐of‐bag
data variance explained (R 2), which is a robust
estimate of generalization error. These robust goodness-of-fitR 2 values allow ranking of SNPs by how well the
environmental gradients explain changes in SNP allele frequencies and
can inform the detection of statistical outliers by selecting SNPs withR 2 values that exceed an empirical threshold
(see below). The accuracy importance of predictors is quantified as the
decrease in performance when each predictor is randomly permuted. For
correlated variables, a conditional approach can be used (see
Ellis et al., 2012). Lastly,
from the ensemble of regression models, GF determines how well
partitions distributed at numerous “split values” along each
environmental variable explain changes in allele frequencies on either
side of a split. The amount of variation explained by each split is the
‘raw split importance’. The empirical, nonlinear turnover functions are
constructed by distributing the R 2 values from
all SNPs among the predictor gradients in proportion to their accuracy
importance and along each gradient according to the density of the raw
split importance values. The split importance values for all modeled
SNPs also are aggregated to an overall, genome-wide turnover function
for each variable using weightings based on predictor importance and the
goodness-of-fit for each SNP model. Portions of gradients where split
importance is high emerge as thresholds where genetic change is rapid
(as might be expected between genetic groups). Gradients strongly
associated with genetic variation will have more important splits
therefore greater overall cumulative importance than gradients with
little biological relevance. Important gradients will also have greater
contribution to predicted genetic offsets.
GF outlier detection – Simulations: In previous work
(Fitzpatrick & Keller,
2015; Fitzpatrick, Keller, & Lotterhos, 2018), we have advocated the
use of GF to calculate genetic offsets on sets of outlier loci
pre-ascertained using established methods for outlier detection that
have been shown to have low false positive rates for the presumed
demographic history of the sample. Here, we were interested in assessing
the behavior of GF for outlier detection, given its strengths of
incorporating multivariate predictors, interactions between predictors,
and non-linear allele frequency gradients, but also its weakness of not
controlling for demographic history. Rather than producing a
comprehensive test of GF on an array of different demographic histories,
we instead focused on quantifying statistical power and the rate of
false positives under a demographic model of isolation-by-distance. We
chose this based on our sample of Core region populations that show
minimal genetic structure (F ST = 0.008) but weak
isolation by distance along an east-west gradient
(Chhatre et al., 2019;
Meirmans et al., 2017). We simulated population genetic data using
CDPOP v1.2.20 (Landguth &
Cushman, 2010) under a simple linear stepping stone model of 30 equal
sized demes arrayed along a 1-dimensional spatial gradient, with each
deme consisting of 20 unisexual individuals of equal sex ratio (total
size = 600 inds). Each diploid individual consisted of 1,000 unlinked
biallelic loci. To generate starting allele frequencies, we used the
coalescent to simulate a site frequency spectrum (SFS) under a neutral
Wright-Fisher model in ms(Hudson, 2002), removed loci
with minor allele frequencies <0.1, and then randomly drew
1,000 loci from the SFS to seed the initial frequencies for each deme.
Gene flow was simulated with a migration function that allowed adults to
disperse to nearby demes prior to mating each generation. Probability of
migration was a negative linear function of distance from the source
deme, with probability declining to zero at a number of demes (u )
distant from the source. To evaluate different levels of migration, we
chose u values of 2, 4 or 8 demes away, which resulted in meanF ST levels of 0.15, 0.04, and 0.008,
respectively. After migration, mating occurred randomly among members
within a deme, and females produced a Poisson distributed number of
offspring with a mean and variance (𝜆 value) = 5. Offspring genotypes
were assembled based on Mendelian inheritance at each locus and linkage
equilibrium between loci, with a probability of de-novo mutation
= 1e-8. For each migration scenario, we designated locus 1 as
experiencing one of three strengths of selection (weak: s = 0.01;
moderate: s = 0.1; or strong: s = 0.2). Linear viability
selection was applied whereby the probability of mortality decreased
linearly from s to 0 for the dominant homozygote and heterozygote
and increased linearly from 0 to s for the recessive homozygote
across the 1D landscape. Under the non-linear scenario, the fitness of
genotypes decayed or increased by the same magnitude, but followed an
exponential function defined by (a) slope of mortality decay and (b) they -intercept (Supplementary Figure S1). We also included a neutral
scenario for each migration level in which locus 1 experienced no
selection (s = 0). Thus, there were a total of 12 simulated
scenarios (3 migration rates x 4 selection strengths), which we
replicated independently 100 times per scenario. For each simulation
replicate, we obtained allele frequencies per deme from the
100th simulated generation and used these in GF to
identify outlier status using the same settings applied to empirical
data (see below). For scenarios where s > 0, we
calculated power as the proportion out of 100 simulation replicates
where locus 1 was identified as an outlier in the distribution ofR 2 values from GF for a given significance
level (range of 𝛼: 0.001 - 0.1). For the neutral scenario (s =
0), we calculated the false positive rate (𝛼 = 0.05) as the proportion
out of 100 simulation replicates where locus 1 was in the 95%
quantile of R 2 values from GF.
GF outlier detection – Empirical: In addition tobayenv2 and lfmm , we also used GF to select outlier SNPs
using two types of allele frequency estimates: (1) “raw” or
uncorrected allele frequencies (hereafter termed “GF-Raw”), and (2)
standardized allele frequencies (X ) output from bayenv2(Günther & Coop, 2013)
corrected using the estimated population genetic (co)variance matrix
(hereafter termed “GF-X ”). GF-Raw used direct estimates of
population minor allele frequencies based on our sample of N individuals
per population (mean N = 8; range: 2 - 13). The GF-X approach was
included to explore a more robust approach to allele frequency
estimation that corrects for finite sampling and population relatedness
prior to feeding into GF for outlier detection. To select GF-Raw
outliers, we fit GF to the raw minor allele frequencies for the 107,309
SNPs and obtained an R 2 for each locus. We
compared the distribution of the resulting R 2values to the empirical P -value derived from the 1,353 intergenic
SNPs. To select GF-X outliers, we first obtained standardized
allele frequencies (X ) estimated using the fitted omega matrix inbayenv2 as described above (cf . Univariate outlier
detection methods) and using the ‘-f’ flag to output 190 MCMC draws ofX for each candidate SNP. Following advice given in thebayenv2 manual, we then fit GF models separately to each of the
190 MCMC draws and extracted the resulting 190R 2 values for each of the 107,309 SNPs. From
these values, we calculated the median R 2 for
each candidate SNP, and determined empirical P- values by
determining their rank within the distribution of the medianR 2 values from the 1,353 intergenic SNPs.
Post-outlier GF modeling: We fit GF to each of the outlier SNP
data sets (Bayenv, LFMM, GF-Raw, and GF-X ) and to the set of SNPs
identified as outliers by both Bayenv and LFMM (Bayenv-LFMM). For each
SNP dataset, we fit GF using 500 regression trees per SNP and a variable
correlation threshold of 0.5 to invoke conditional importance estimates
(Ellis et al., 2012). We
used default values for the number of predictor variables randomly
sampled as candidates at each split and for the proportion of samples
used for training and testing each tree. For comparison, we also fit GF
to 999 sets of 500 SNPs selected at random from the full set of 107,309
SNPs, which we combined into a single model using the
‘combinedGradientForest’ function. All models were fit using the
‘gradientForest ’ library
(Ellis et al., 2012) in R
(R Core Team, 2018).
GF genetic offsets: We used each of the GF models and the
‘predict.GradientForest’ function to transform environmental conditions
described by the seven predictor variables into common units of
compositional turnover (1) throughout the geographic range of balsam
poplar and for (2) each population and (3) the two common gardens. To
quantify genetic offsets resulting from transplanting each population
from its home environment to the common gardens, we calculated the
Euclidean distance between each population and the common gardens in the
resulting multidimensional transformed environmental space from GF. For
comparison, we also calculated the Mahalanobis distance
(Mahalanobis, 1936) between
each population and the common gardens using the raw (untransformed)
environmental predictors, which serves as a ‘naive’ climate transfer
distance uninformed by genomic patterns.
To visualize and compare genetic patterns in geographic and biological
space, we used Principal Components Analysis (PCA) to reduce the seven
transformed environmental variables into three factors. The PCA was
centered but not scale transformed to preserve differences in the
magnitude of genetic importance among the environmental variables.
Variation in genetic composition was visualized as (1) a bi-plot of the
first two principal components with labeled vectors indicating the
direction and magnitude of major environmental correlates and (2) by
mapping the patterns back to geographic space. Variation in genetic
composition was visualized by assigning the first three principal
components to an RGB color palette. The resulting color similarity
corresponds to the similarity of expected patterns of genetic
composition. For comparison, we repeated this process using the raw
(untransformed) environmental variables.
GF models fit to different sets of SNPs are expected to produce
different predicted patterns of genomic variation. To estimate and
visualize differences in expected geographic patterns for each of the
six sets of modeled SNPs, we used Procrustes superimposition on the PCA
ordinations, where the matrices were rotated to minimize the sum of
squares of the distances between the sites in genetic space
(Peres-Neto & Jackson,
2001). The Procrustes residuals, which in this case measure the
absolute distance between sites in genetic space and the rotated
ordination space, were mapped to visualize differences in the predicted
genetic composition patterns between all pairwise comparisons of the six
different GF models fit using the different sets of SNPs. For all
visualizations, we constrained predictions to within the geographic
range of balsam poplar as defined by
(Little, 1971).
Genetic offset predictions of growth in common gardens: To
experimentally test how well genetic offset predicts performance in the
field when populations experience a novel climate, we used genetic
offset values to predict population mean height growth increment in the
two common gardens. We first generated population BLUPs of height growth
separately for each garden (VT and IH) using linear mixed-effects models
of the form:
\begin{equation}
Y_{\text{ijk}}\ =\ \mu\ +B_{i}+\ P_{j}+{P(G)}_{\text{jk}}+\varepsilon_{\text{ijk}}\nonumber \\
\end{equation}where height growth (Y ) is modeled as a function of the overall
mean (\(\mu\)), the fixed effect of block or position within the garden
(\(B_{i}\)), the random effect of population (\(P_{j}\)), the random
effect of genotype nested within population (\({P(G)}_{\text{jk}}\)),
and a normally distributed residual error
(\(\varepsilon_{\text{ijk}}\)). The resulting population-level BLUPs
from each model were then merged across gardens and combined with their
corresponding garden-specific genetic offset predictions for use in a
second model in which height growth was predicted as a function of
genetic offset while controlling for overall differences between
gardens. To allow for potential non-linearity in the response, we
included a quadratic term, giving a final model of the form:
\begin{equation}
Y_{\text{ij}}\ =\ \mu\ +\ Go_{i}+\ \text{Go}_{i}^{2}+\text{Gd}_{j}+\varepsilon_{\text{ij}}\nonumber \\
\end{equation}where Go and Go 2 are the linear and
quadratic genetic offset terms, respectively, and Gd is the
effect of the jth garden (VT or IH).
We evaluated height growth for different estimates of genetic offset
derived from (a) a combined GF model of all outlier loci, (b) outliers
identified by GF-X , (c) the average of 999 random draws of 500
random SNPs from the genomic background, and (d) Mahalanobis distances
based on climate-only. When not significant, we dropped the quadratic
term in favor of the simple linear model. We report the percentage of
variance explained (R 2) as a metric of model
performance.