MATERIALS AND METHODS
Study species: Balsam poplar (Populus balsamifera L., Salicaceae) is a common deciduous tree in northern temperate and boreal forest ecosystems across North America. Its expansive geographic range encompasses broad climatic gradients in growing season length, and populations exhibit abundant clinal variation and adaptive population divergence in allele frequencies and phenotypic traits, particularly those related to phenology and ecophysiology (Fitzpatrick & Keller, 2015; Keller, Levsen, Ingvarsson, Olson, & Tiffin, 2011; Keller, Levsen, Olson, & Tiffin, 2012; Keller, Soolanayakanahally, et al., 2011; Olson et al., 2013; Soolanayakanahally, Guy, Silim, Drewes, & Schroeder, 2009; Soolanayakanahally, Guy, Silim, & Song, 2013).
Based on previous work, balsam poplar is known to exhibit relatively strong regional population structure, with multiple studies recognizing a clearly divergent subpopulation in the eastern part of its range in Atlantic Canada and the New England States (Chhatre et al., 2019; Keller, Olson, Silim, Schroeder, & Tiffin, 2010; Meirmans, Godbout, Lamothe, Thompson, & Isabel, 2017). The rest of the range consists of a large, relatively homogenous genetic subpopulation in the middle core of its range that is weakly differentiated (F ST = 0.008) but with signatures of isolation-by-distance along an axis of longitude (Chhatre et al., 2019; Meirmans et al., 2017). Lastly, some studies suggest a third major subpopulation in northwestern Canada and Alaska (Keller et al., 2010).
In the current study, we focused our sampling and analysis on the widespread but weakly structured “Core” region in the middle of the range (Chhatre et al., 2019). We did this to minimize the known confounding introduced by regional population structure to genome scans of selection (Lotterhos & Whitlock, 2014), and because the Core region still captures large variation in the climatic environment, making it ideal for studying adaptive contributions of genomic variation while reducing the likelihood of false positives due to demographic history (Chhatre et al., 2019). Our selection of individuals and populations was based on retaining individuals with ADMIXTURE ancestry scores >0.9 in the Core region based on Chhatre et al. (2019). This resulted in 336 individuals from 42 populations for analysis of genomic variation at 107,309 biallelic SNPs, with populations distributed across north central US and Canada, from Vermont, New York, and Quebec in the east to Manitoba and Saskatchewan in the west, and south as far as WY (Table S1).
Common gardens: We measured performance of poplar clones in two common garden locations with contrasting climates, “IH” near Indian Head, Saskatchewan, Canada (50.52 °N, −103.68 °W), and “VT” near Burlington, Vermont, USA (44.44 °N, −73.19 °W). Establishment of both common gardens was from clonally propagated stem cuttings (6-9 cm with at least 2 vegetative buds) taken from natural populations while dormant during winter, rooted in potting media in the greenhouse and grown for 1 year, and then outplanted directly into the ground at the respective garden site. Details of the IH site are provided in Soolanayakanahally et al. (2013), which describes the original garden established in 2005 consisting of 15 genotypes from each population planted in aggregate with 2×2 meter spacing. Each population group was then clonally replicated 5 times in random locations throughout the garden. In 2011, a second planting at IH was established adjacent to the original garden, consisting of additional populations planted with the same specifications as the original planting. At the VT site, planting consisted of a completely randomized design with a target replication of 3 ramets per genotype planted with 2×2 meter spacing. At both sites, plants were not fertilized, but were watered as needed during the year they were established and then received no supplemental water thereafter. Additional details of the VT site can be found in Fetter, Nelson, & Keller (2019).
After losses due to mortality during establishment, plants available for phenotyping totaled 357 unique genotypes (N=117 in IH; 337 in VT) representing 41 of the 42 target populations (N=9 in IH; 41 in VT; population ‘NIC’ was absent from both; Table S1). Most of these genotypes with phenotype data were the same individuals sequenced with GBS for the 107,309 SNPs (297 of 336 individuals with GBS; representing 41 of the 42 populations with GBS). As an overall metric of growth performance, we measured the yearly height increment (cm) gain between the apical bud and the previous year’s bud scar on the most dominant stem. Height increment was measured at the end of the 2015 growing season for both sites, after all plants had finished setting bud. This gives an overall measure of growth achieved during the 2015 growing season that integrates effects of genotypic variability in phenology and relative growth rate (Soolanayakanahally et al., 2013). Genotypes with height increment data were represented by a mean (SD) of 2.6 (1.5) ramets.
Environmental data for field locations : To characterize environmental conditions at the sampling locations of the source populations, we used a set of seven variables at 30-arcsecond resolution (~1 km x 1 km). We selected this set of seven variables from a larger (n =21) initial set composed of 19 bioclimatic variables and elevation from WorldClim v1.4 (representing averages for the period 1960-1990; www.worldclim.org; Hijmans, Cameron, Parra, Jones, & Jarvis, 2005), plus latitude. We used an iterative process of GF model fitting to assess variable importance combined with variance inflation factor (VIF) analysis to assess collinearity, with the goal of retaining a final set of the most important and interpretable and least collinear variables to the greatest extent possible. VIF estimates the potential impact of multicollinearity in regression-type models by quantifying the extent to which standard errors are inflated due to collinearity compared to when uncorrelated variables are used. To calculate VIFs, we used the ‘vifcor’ function with a correlation threshold of 0.75 in the ‘usdm ’ package (Naimi, Hamm, Groen, Skidmore, & Toxopeus, 2014) in R (R Core Team, 2018). The final set of seven variables included latitude (y), elevation (elev) and five bioclimatic variables: mean diurnal range (bio2), mean summer (bio10) and winter (bio11) temperature, and summer (bio18) and winter (bio19) precipitation. Latitude was included to characterize the gradient in daylength not otherwise captured by the bioclimatic variables. Of our seven retained variables, mean winter temperature (bio11) and latitude were found to have a VIF greater than 10 (10.31 and 17.00 respectively), which is considered an approximate rule-of-thumb cutoff VIF (Guisan, Thuiller, & Zimmermann, 2017). However, given that GF has some ability to accommodate correlated variables (Ellis et al., 2012) and daylength is an important determinant of phenology and height growth cessation in poplars (Soolanayakanahally et al. 2013), we opted to retain both of these variables in our models. We aggregated the individual-level environmental data to the level of populations (42 total) by taking the mean across individuals within each population (mean = 8 individuals per population, min=2, max=13).
Environmental data for common garden locations : To characterize climatic conditions at the two common garden experiments, we used DayMet (Thornton et al., 2014) data for 2014 and 2015, which spanned the relevant period from planting and establishment of the clonal replicates to when the growth measurements were collected. The Daymet dataset provides gridded estimates of temperature and precipitation for North America on a daily time step at ~1 km x 1 km spatial resolution. We aggregated the daily data for 2014 and 2015 to produce monthly averages for maximum and minimum temperature and monthly sums for precipitation. We used these monthly summaries and the ‘biovars’ function in the ‘dismo ’ package (Chamberlain, 2017) to calculate the same set of bioclimatic predictor variables used to characterize climate for the field collections.
Univariate outlier detection methods: Full details of the GEA selection scans are described in Chhatre et al. (2019), and recapped again briefly here. Namely, we used two complementary approaches: (1) the population-based method bayenv2(Günther & Coop, 2013) that corrects for population relatedness using a variance-covariance matrix of allele frequencies before testing for a correlation between SNP allele frequencies and an environmental predictor; and (2) the individual-based method lfmm(Frichot, Schoville, Bouchard, & François, 2013) that controls for background genetic structure by introducing latent factors into a linear mixed models while testing for associations between the genotypic state at each SNP and an environmental predictor. Both bayenv2 and lfmm are inherently univariate, testing one environmental predictor’s association with one SNP at a time. Therefore, to avoid inflating type I error rates, we used as environmental predictors in each approach the first 2 PC’s from a principal components analysis on 19 bioclim variables (Hijmans et al., 2005) plus source latitude.
We generated empirical P -values for assessing significance in GEA tests, as advocated by Lotterhos & Whitlock (2014), using a subset (n =1,353) of non-coding intergenic SNPs based on the Populus trichocarpa v3.0 genome annotation. We used these intergenic SNPs to generate empirical null distributions of the test statistics used in our selection scans (Bayes Factors forbayenv2 and 𝝀-adjusted z-scores for lfmm ), based on ranking each test SNP within the empirical null distribution and determining its quantile sensuLotterhos & Whitlock (2014). Candidate SNPs with a test statistic equal to or exceeding that of the empirical null distribution were designated as selection outliers with a P ≤ 0.000739 (=1/[1353+1]).
GF background: Ellis et al. (2012) provide details regarding the Gradient Forest algorithm and Fitzpatrick & Keller (2015) describe the application of GF to modeling genomic data and predicting genetic offsets. In brief, GF is a flexible, non-parametric extension of the machine learning approach known as Random Forests (Breiman, 2001). GF uses Random Forest to fit an ensemble of regression trees to model change in allele frequencies across sites and derive monotonic, nonlinear functions of environmental predictors. GF stands on the shoulders of Random Forests and inherits its robust statistical measures of model performance and variable importance. The predictive performance of the GF model for each SNP is quantified using the proportion of out‐of‐bag data variance explained (R 2), which is a robust estimate of generalization error. These robust goodness-of-fitR 2 values allow ranking of SNPs by how well the environmental gradients explain changes in SNP allele frequencies and can inform the detection of statistical outliers by selecting SNPs withR 2 values that exceed an empirical threshold (see below). The accuracy importance of predictors is quantified as the decrease in performance when each predictor is randomly permuted. For correlated variables, a conditional approach can be used (see Ellis et al., 2012). Lastly, from the ensemble of regression models, GF determines how well partitions distributed at numerous “split values” along each environmental variable explain changes in allele frequencies on either side of a split. The amount of variation explained by each split is the ‘raw split importance’. The empirical, nonlinear turnover functions are constructed by distributing the R 2 values from all SNPs among the predictor gradients in proportion to their accuracy importance and along each gradient according to the density of the raw split importance values. The split importance values for all modeled SNPs also are aggregated to an overall, genome-wide turnover function for each variable using weightings based on predictor importance and the goodness-of-fit for each SNP model. Portions of gradients where split importance is high emerge as thresholds where genetic change is rapid (as might be expected between genetic groups). Gradients strongly associated with genetic variation will have more important splits therefore greater overall cumulative importance than gradients with little biological relevance. Important gradients will also have greater contribution to predicted genetic offsets.
GF outlier detection – Simulations: In previous work (Fitzpatrick & Keller, 2015; Fitzpatrick, Keller, & Lotterhos, 2018), we have advocated the use of GF to calculate genetic offsets on sets of outlier loci pre-ascertained using established methods for outlier detection that have been shown to have low false positive rates for the presumed demographic history of the sample. Here, we were interested in assessing the behavior of GF for outlier detection, given its strengths of incorporating multivariate predictors, interactions between predictors, and non-linear allele frequency gradients, but also its weakness of not controlling for demographic history. Rather than producing a comprehensive test of GF on an array of different demographic histories, we instead focused on quantifying statistical power and the rate of false positives under a demographic model of isolation-by-distance. We chose this based on our sample of Core region populations that show minimal genetic structure (F ST = 0.008) but weak isolation by distance along an east-west gradient (Chhatre et al., 2019; Meirmans et al., 2017). We simulated population genetic data using CDPOP v1.2.20 (Landguth & Cushman, 2010) under a simple linear stepping stone model of 30 equal sized demes arrayed along a 1-dimensional spatial gradient, with each deme consisting of 20 unisexual individuals of equal sex ratio (total size = 600 inds). Each diploid individual consisted of 1,000 unlinked biallelic loci. To generate starting allele frequencies, we used the coalescent to simulate a site frequency spectrum (SFS) under a neutral Wright-Fisher model in ms(Hudson, 2002), removed loci with minor allele frequencies <0.1, and then randomly drew 1,000 loci from the SFS to seed the initial frequencies for each deme. Gene flow was simulated with a migration function that allowed adults to disperse to nearby demes prior to mating each generation. Probability of migration was a negative linear function of distance from the source deme, with probability declining to zero at a number of demes (u ) distant from the source. To evaluate different levels of migration, we chose u values of 2, 4 or 8 demes away, which resulted in meanF ST levels of 0.15, 0.04, and 0.008, respectively. After migration, mating occurred randomly among members within a deme, and females produced a Poisson distributed number of offspring with a mean and variance (𝜆 value) = 5. Offspring genotypes were assembled based on Mendelian inheritance at each locus and linkage equilibrium between loci, with a probability of de-novo mutation = 1e-8. For each migration scenario, we designated locus 1 as experiencing one of three strengths of selection (weak: s = 0.01; moderate: s = 0.1; or strong: s = 0.2). Linear viability selection was applied whereby the probability of mortality decreased linearly from s to 0 for the dominant homozygote and heterozygote and increased linearly from 0 to s for the recessive homozygote across the 1D landscape. Under the non-linear scenario, the fitness of genotypes decayed or increased by the same magnitude, but followed an exponential function defined by (a) slope of mortality decay and (b) they -intercept (Supplementary Figure S1). We also included a neutral scenario for each migration level in which locus 1 experienced no selection (s = 0). Thus, there were a total of 12 simulated scenarios (3 migration rates x 4 selection strengths), which we replicated independently 100 times per scenario. For each simulation replicate, we obtained allele frequencies per deme from the 100th simulated generation and used these in GF to identify outlier status using the same settings applied to empirical data (see below). For scenarios where s > 0, we calculated power as the proportion out of 100 simulation replicates where locus 1 was identified as an outlier in the distribution ofR 2 values from GF for a given significance level (range of 𝛼: 0.001 - 0.1). For the neutral scenario (s = 0), we calculated the false positive rate (𝛼 = 0.05) as the proportion out of 100 simulation replicates where locus 1 was in the 95% quantile of R 2 values from GF.
GF outlier detection – Empirical: In addition tobayenv2 and lfmm , we also used GF to select outlier SNPs using two types of allele frequency estimates: (1) “raw” or uncorrected allele frequencies (hereafter termed “GF-Raw”), and (2) standardized allele frequencies (X ) output from bayenv2(Günther & Coop, 2013) corrected using the estimated population genetic (co)variance matrix (hereafter termed “GF-X ”). GF-Raw used direct estimates of population minor allele frequencies based on our sample of N individuals per population (mean N = 8; range: 2 - 13). The GF-X approach was included to explore a more robust approach to allele frequency estimation that corrects for finite sampling and population relatedness prior to feeding into GF for outlier detection. To select GF-Raw outliers, we fit GF to the raw minor allele frequencies for the 107,309 SNPs and obtained an R 2 for each locus. We compared the distribution of the resulting R 2values to the empirical P -value derived from the 1,353 intergenic SNPs. To select GF-X outliers, we first obtained standardized allele frequencies (X ) estimated using the fitted omega matrix inbayenv2 as described above (cf . Univariate outlier detection methods) and using the ‘-f’ flag to output 190 MCMC draws ofX for each candidate SNP. Following advice given in thebayenv2 manual, we then fit GF models separately to each of the 190 MCMC draws and extracted the resulting 190R 2 values for each of the 107,309 SNPs. From these values, we calculated the median R 2 for each candidate SNP, and determined empirical P- values by determining their rank within the distribution of the medianR 2 values from the 1,353 intergenic SNPs.
Post-outlier GF modeling: We fit GF to each of the outlier SNP data sets (Bayenv, LFMM, GF-Raw, and GF-X ) and to the set of SNPs identified as outliers by both Bayenv and LFMM (Bayenv-LFMM). For each SNP dataset, we fit GF using 500 regression trees per SNP and a variable correlation threshold of 0.5 to invoke conditional importance estimates (Ellis et al., 2012). We used default values for the number of predictor variables randomly sampled as candidates at each split and for the proportion of samples used for training and testing each tree. For comparison, we also fit GF to 999 sets of 500 SNPs selected at random from the full set of 107,309 SNPs, which we combined into a single model using the ‘combinedGradientForest’ function. All models were fit using the ‘gradientForest ’ library (Ellis et al., 2012) in R (R Core Team, 2018).
GF genetic offsets: We used each of the GF models and the ‘predict.GradientForest’ function to transform environmental conditions described by the seven predictor variables into common units of compositional turnover (1) throughout the geographic range of balsam poplar and for (2) each population and (3) the two common gardens. To quantify genetic offsets resulting from transplanting each population from its home environment to the common gardens, we calculated the Euclidean distance between each population and the common gardens in the resulting multidimensional transformed environmental space from GF. For comparison, we also calculated the Mahalanobis distance (Mahalanobis, 1936) between each population and the common gardens using the raw (untransformed) environmental predictors, which serves as a ‘naive’ climate transfer distance uninformed by genomic patterns.
To visualize and compare genetic patterns in geographic and biological space, we used Principal Components Analysis (PCA) to reduce the seven transformed environmental variables into three factors. The PCA was centered but not scale transformed to preserve differences in the magnitude of genetic importance among the environmental variables. Variation in genetic composition was visualized as (1) a bi-plot of the first two principal components with labeled vectors indicating the direction and magnitude of major environmental correlates and (2) by mapping the patterns back to geographic space. Variation in genetic composition was visualized by assigning the first three principal components to an RGB color palette. The resulting color similarity corresponds to the similarity of expected patterns of genetic composition. For comparison, we repeated this process using the raw (untransformed) environmental variables.
GF models fit to different sets of SNPs are expected to produce different predicted patterns of genomic variation. To estimate and visualize differences in expected geographic patterns for each of the six sets of modeled SNPs, we used Procrustes superimposition on the PCA ordinations, where the matrices were rotated to minimize the sum of squares of the distances between the sites in genetic space (Peres-Neto & Jackson, 2001). The Procrustes residuals, which in this case measure the absolute distance between sites in genetic space and the rotated ordination space, were mapped to visualize differences in the predicted genetic composition patterns between all pairwise comparisons of the six different GF models fit using the different sets of SNPs. For all visualizations, we constrained predictions to within the geographic range of balsam poplar as defined by (Little, 1971).
Genetic offset predictions of growth in common gardens: To experimentally test how well genetic offset predicts performance in the field when populations experience a novel climate, we used genetic offset values to predict population mean height growth increment in the two common gardens. We first generated population BLUPs of height growth separately for each garden (VT and IH) using linear mixed-effects models of the form:
\begin{equation} Y_{\text{ijk}}\ =\ \mu\ +B_{i}+\ P_{j}+{P(G)}_{\text{jk}}+\varepsilon_{\text{ijk}}\nonumber \\ \end{equation}
where height growth (Y ) is modeled as a function of the overall mean (\(\mu\)), the fixed effect of block or position within the garden (\(B_{i}\)), the random effect of population (\(P_{j}\)), the random effect of genotype nested within population (\({P(G)}_{\text{jk}}\)), and a normally distributed residual error (\(\varepsilon_{\text{ijk}}\)). The resulting population-level BLUPs from each model were then merged across gardens and combined with their corresponding garden-specific genetic offset predictions for use in a second model in which height growth was predicted as a function of genetic offset while controlling for overall differences between gardens. To allow for potential non-linearity in the response, we included a quadratic term, giving a final model of the form:
\begin{equation} Y_{\text{ij}}\ =\ \mu\ +\ Go_{i}+\ \text{Go}_{i}^{2}+\text{Gd}_{j}+\varepsilon_{\text{ij}}\nonumber \\ \end{equation}
where Go and Go 2 are the linear and quadratic genetic offset terms, respectively, and Gd is the effect of the jth garden (VT or IH).
We evaluated height growth for different estimates of genetic offset derived from (a) a combined GF model of all outlier loci, (b) outliers identified by GF-X , (c) the average of 999 random draws of 500 random SNPs from the genomic background, and (d) Mahalanobis distances based on climate-only. When not significant, we dropped the quadratic term in favor of the simple linear model. We report the percentage of variance explained (R 2) as a metric of model performance.