INTRODUCTION
Climate change is expected to become a major threat to biodiversity this century (Sala et al., 2000; Urban, 2015), with cascading impacts on human well-being and ecosystem function (Pecl et al., 2017). Anticipating and mitigating these impacts requires actionable predictions of biological responses, which are likely to become increasingly difficult to make under the novel climates of the future (Fitzpatrick, Blois, et al., 2018; Urban et al., 2016). The adaptive capacity of species represents an important component of climate change vulnerability (Dawson, Jackson, House, Prentice, & Mace, 2011), yet few studies incorporate local adaptation into forecasting models, and even fewer have compared genomic predictions with actual organismal responses.
Recent technological advancements now provide access to massive quantities of data pertinent to biodiversity science and conservation (e.g., species occurrence databases, genome-scale DNA sequencing, high-resolution projections of future climate; Wüest et al., 2020). At the same time, sophisticated new machine learning methods have emerged that can take advantage of these data to identify conservation risks and opportunities under a changing climate. In particular, the application of machine learning to genomic studies of local adaptation represents an especially promising frontier, both for improving our understanding of biotic responses to climate change and for assessing climate vulnerability at the population level (Fitzpatrick & Keller, 2015; Gougherty, Keller, Chhatre, & Fitzpatrick, 2020; Savolainen, Lascoux, & Merilä, 2013).
Fitzpatrick & Keller (2015) described how a machine learning method known as Gradient Forests (GF; Ellis, Smith, & Pitcher, 2012) can be used to (1) analyze and map spatial variation in allele frequencies as a function of environmental gradients and (2) project patterns of genomic variation under future climate. GF derives monotonic, nonlinear functions that characterize compositional turnover in allele frequencies along each fitted environmental gradient. In addition to identifying the primary environmental drivers of genomic variation, these turnover functions reveal how genomic patterns vary along multiple environmental gradients, including where allele frequencies change rapidly or slowly across space. The turnover functions can also be used to transform (or rescale) the fitted environmental predictors from their arbitrary anthropogenic measurement units (e.g., ℃ of temperature or mm of precipitation) to common biological units of compositional turnover (Ellis et al., 2012). Transforming each predictor with its associated turnover function converts the multidimensional environmental space into a multidimensional genomic space that characterizes differences in the expected genetic makeup of populations in different environments. Applying the same turnover functions to scenarios of environmental change then projects expected genomic patterns under future climate. The Euclidean distance between a population's locations in the current and future genomic spaces characterizes the magnitude of change in genetic composition expected for that population, given the climate change projected at its location.
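The transformation and distance described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GF implementation: each turnover function is assumed to be a monotone, piecewise-linear curve through hypothetical knots (environmental value, cumulative importance), standing in for the curves a fitted GF model would supply. The two predictors, the knot values, the population climates, and the helper names `to_genomic_space` and `genetic_offset` are all invented for illustration.

```python
import numpy as np

# Hypothetical turnover functions, one per predictor, in column order.
# Each is a monotone curve given by knots: (environmental values, cumulative
# importance). They stand in for the curves a fitted GF model would supply.
TURNOVER = [
    # temperature (℃): steep between 5 and 10 ℃, nearly flat above 15 ℃
    (np.array([0.0, 5.0, 10.0, 15.0, 20.0]), np.array([0.0, 0.01, 0.05, 0.06, 0.065])),
    # precipitation (mm)
    (np.array([200.0, 600.0, 1000.0, 1400.0]), np.array([0.0, 0.02, 0.025, 0.03])),
]


def to_genomic_space(env):
    """Rescale each predictor column (n_pops x n_vars) from measurement units
    to units of compositional turnover. np.interp clamps values outside the
    knot range to the end points (an assumed extrapolation rule)."""
    return np.column_stack(
        [np.interp(env[:, k], xs, ys) for k, (xs, ys) in enumerate(TURNOVER)]
    )


def genetic_offset(env_current, env_future):
    """Euclidean distance between each population's current and future
    positions in the transformed (genomic) space."""
    diff = to_genomic_space(env_future) - to_genomic_space(env_current)
    return np.sqrt((diff ** 2).sum(axis=1))


# Two hypothetical populations, each warming by 3 ℃ with unchanged precipitation.
env_current = np.array([[8.0, 800.0], [16.0, 1200.0]])
env_future = np.array([[11.0, 800.0], [19.0, 1200.0]])
offsets = genetic_offset(env_current, env_future)
print(offsets)  # approximately 0.018 and 0.003
```

Both hypothetical populations experience the same 3 ℃ of warming, yet the first receives a much larger offset because its temperature change falls on the steep part of the assumed temperature turnover curve. This is the sense in which the rescaled space captures where allele frequencies are expected to change rapidly or slowly.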
Fitzpatrick & Keller (2015) termed this distance the “genetic offset”, which can be viewed as a metric of the degree of expected maladaptation when a population is exposed to rapid climate change, assuming no in situ adaptive evolution and no migration that would allow adaptive alleles to track the changing climate. Gougherty et al. (2020) recently extended the genetic offset concept to consider how climate maladaptation, migration, and the potential for novel future gene-climate associations contribute to the vulnerability of climatically adapted populations.
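Written out, using notation introduced here for illustration only, let $f_k$ denote the GF turnover function for environmental predictor $k$ ($k = 1, \dots, p$), and let $x_{ik}$ and $x'_{ik}$ be the current and future values of that predictor at the location of population $i$. The genetic offset of population $i$ is then the Euclidean distance between its current and future positions in the transformed space:

```latex
\[
  \mathrm{GO}_i \;=\; \sqrt{\sum_{k=1}^{p} \bigl[\, f_k(x'_{ik}) - f_k(x_{ik}) \,\bigr]^{2}}
\]
```

For a transplant to a common garden, $x'_{ik}$ is simply replaced by the garden's value of predictor $k$.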
Since the publication of Fitzpatrick & Keller (2015), a growing number of studies have used genetic offsets to estimate climate maladaptation in a variety of species, including trees (Gugger, Liang, Sork, Hodgskiss, & Wright, 2018; Ingvarsson & Bernhardsson, 2020; Jia et al., 2020; Martins et al., 2018), birds (Bay et al., 2018; Ruegg et al., 2018), and agricultural crops such as maize landraces in Mexico (Aguirre-Liguori, Ramírez-Barahona, Tiffin, & Eguiarte, 2019). However, like projections of species-level responses to climate change from species distribution models, genetic offsets are in essence derived from a correlative, space-for-time substitution approach (Blois, Williams, Fitzpatrick, Jackson, & Ferrier, 2013) that ignores the enormous complexity underlying the actual evolutionary responses of populations to environmental change, including interactions among selection, effective population size, and the evolutionary processes shaping adaptive variation (e.g., migration, mutation, recombination). Instead, the use of genetic offsets assumes that, after correcting for neutral population structure, correlations between allele frequencies and environmental gradients reflect current patterns of local selection and relative fitness, and that these spatial gene-environment associations can be used to project the magnitude of change in allele frequencies expected over time if those associations are to be maintained at their current state. Very few studies have tried to relate local adaptation analyses and their predictions to actual organismal responses. As such, genetic offsets lack empirical validation, and it remains unknown what utility, if any, the concept has for predicting the actual performance of populations in novel environments.
Here we use machine learning, population genomic data, and common garden experiments to provide an empirical space-for-time test of the extent to which genetic offsets predict the performance of populations in new environments. We measured the growth performance of trees collected from climatically diverse populations and clonally propagated in two common gardens. For these same populations, we also obtained genome-wide single nucleotide polymorphisms (SNPs), which we used in a series of genome scans for local adaptation, employing multiple methods to identify outlier loci associated with climate. We then fit GF to each set of candidate SNPs identified by the different outlier detection methods and used these models to (1) identify the primary environmental variables driving the signals of local climate adaptation in the genome, (2) fit flexible functions describing how genetic patterns vary along the gradients, and (3) predict the genetic offsets associated with transplanting individuals from their home climates to the climates they experienced at the common garden sites. Specifically, we aim to address the following questions:
  1. How do GF models fit to different sets of statistical outlier SNPs differ in terms of variable importance, turnover functions, and predicted spatial patterns?
  2. How well do genetic offsets predict the responses of populations transplanted to new common garden environments, and do they outperform naive ‘climate-only’ transfer distances?
  3. How sensitive is the predictive ability of genetic offsets to the composition of the SNP panel, whether derived from different outlier detection methods or randomly sampled from the genomic background?
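The contrast in the second question can be illustrated with a small sketch comparing the two distances for populations transplanted to a single garden. As with any toy example, the turnover functions, source and garden climates, and helper names (`genetic_offset_to_garden`, `climate_transfer_distance`) are hypothetical. The ‘climate-only’ distance is assumed here to be a Euclidean distance between source and garden climates after scaling each variable by its standard deviation across source populations; this is one plausible definition, not necessarily the one used in the study. In practice each distance would then be related to growth measured at the garden, which is not shown.

```python
import numpy as np

# Hypothetical turnover functions (knots: environmental values, cumulative
# importance) for temperature (℃) and precipitation (mm), in column order.
TURNOVER = [
    (np.array([0.0, 5.0, 10.0, 15.0, 20.0]), np.array([0.0, 0.01, 0.05, 0.06, 0.065])),
    (np.array([200.0, 600.0, 1000.0, 1400.0]), np.array([0.0, 0.02, 0.025, 0.03])),
]


def to_genomic_space(env):
    """Map an (n_pops x n_vars) climate array onto turnover units."""
    return np.column_stack(
        [np.interp(env[:, k], xs, ys) for k, (xs, ys) in enumerate(TURNOVER)]
    )


def genetic_offset_to_garden(source_env, garden_env):
    """Genetic offset between each source population and one garden climate."""
    garden = to_genomic_space(garden_env[np.newaxis, :])  # shape (1, n_vars)
    return np.linalg.norm(to_genomic_space(source_env) - garden, axis=1)


def climate_transfer_distance(source_env, garden_env):
    """Assumed 'climate-only' distance: Euclidean distance after scaling each
    variable by its standard deviation across source populations."""
    scale = source_env.std(axis=0)
    return np.linalg.norm((source_env - garden_env) / scale, axis=1)


# Hypothetical source populations A, B, C and one garden climate.
sources = np.array([[8.0, 600.0], [16.0, 1000.0], [12.0, 1400.0]])
garden = np.array([13.0, 800.0])

go = genetic_offset_to_garden(sources, garden)
cd = climate_transfer_distance(sources, garden)
print("genetic offset:", go.round(4))
print("climate-only:  ", cd.round(3))
```

In this toy case the two metrics disagree about which transfer is most extreme. Population C has the largest climate-only distance because of its large precipitation difference, but only a modest genetic offset because the assumed precipitation turnover curve is shallow above 600 mm; population A, whose temperature difference spans the steep part of the assumed temperature curve, has the largest genetic offset.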