4.3 Scans for the genomic signature of selection
WGR has more commonly contributed to reverse-genetic approaches, where whole-genome scans are used to identify loci that have been subject to selection without directly knowing the traits involved. In this way, inferences can be made about the genetic basis and evolutionary history of adaptation even when the ecology and life history of an invasive species are poorly understood. There are various ways to identify the signature of natural selection from genomic datasets. When studying a single invasive population, the footprint of a selective sweep can be identified from the site frequency spectrum of genetic variation (DeGiorgio, Huber, Hubisz, Hellmann, & Nielsen, 2016; Nielsen et al., 2005). Alternatively, comparisons between populations (e.g., between different timepoints during invasion or between native and invasive populations) can be used to identify regions of high divergence, using summary statistics such as FST or the population branch statistic (Yi et al., 2010). Another rarely exploited approach to measuring adaptation in invasive species is the use of sequence data collected in a time series, analogous to ‘evolve-and-resequence’ experiments carried out in laboratory populations (Long, Liti, Luptak, & Tenaillon, 2015; Schlötterer, Kofler, Versace, Tobler, & Franssen, 2015). Not only can this approach be used to rule out pre-invasion adaptation (see Part 1.1), it also provides a powerful framework in which to identify allele frequency shifts resulting from simple or polygenic adaptation (Buffalo & Coop, 2020; Otte & Schlötterer, 2020). Where samples are not readily available from early timepoints in the invasion, historical museum or herbarium samples can be used to infer past allele frequencies (Bi et al., 2019; McGaughran, 2020).
All the approaches mentioned above can be used with SNPs, transposable elements or structural variants, the latter two of which are readily detectable using WGR data but difficult to measure using other sequencing technologies (Bertolotti et al., 2020).
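As an illustration of the divergence-based statistics mentioned above, the population branch statistic can be sketched in a few lines of Python. FST between each pair of three populations is transformed to a branch length T = -log(1 - FST), and the length of the branch leading to the focal population is PBS = (T[focal, sister] + T[focal, outgroup] - T[sister, outgroup]) / 2 (Yi et al., 2010). The function names, the simple heterozygosity-based FST estimator, the equal weighting of populations and the cap applied to FST are assumptions made for this sketch; a real analysis would use a sample-size-corrected estimator computed from genotype data.

```python
import math


def heterozygosity(p):
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def window_fst(freqs_a, freqs_b):
    """FST for a window of SNPs, as (H_T - H_S) / H_T with sums taken across sites.

    Assumption for this sketch: populations are weighted equally and no
    correction for sample size is applied.
    """
    numerator = 0.0
    denominator = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        h_total = heterozygosity((p_a + p_b) / 2.0)
        h_within = (heterozygosity(p_a) + heterozygosity(p_b)) / 2.0
        numerator += h_total - h_within
        denominator += h_total
    return numerator / denominator if denominator > 0 else 0.0


def branch_length(fst, cap=0.999):
    """Transform FST into a divergence time, T = -log(1 - FST).

    The cap (an arbitrary choice here) avoids log(0) when FST equals 1.
    """
    return -math.log(1.0 - min(fst, cap))


def pbs(freqs_focal, freqs_sister, freqs_outgroup):
    """Population branch statistic for the focal population in one window."""
    t_focal_sister = branch_length(window_fst(freqs_focal, freqs_sister))
    t_focal_out = branch_length(window_fst(freqs_focal, freqs_outgroup))
    t_sister_out = branch_length(window_fst(freqs_sister, freqs_outgroup))
    return (t_focal_sister + t_focal_out - t_sister_out) / 2.0


if __name__ == "__main__":
    # Hypothetical allele frequencies for three SNPs in one genomic window.
    invasive = [0.90, 0.85, 0.10]
    native = [0.10, 0.20, 0.15]
    outgroup = [0.10, 0.15, 0.10]
    print(pbs(invasive, native, outgroup))
```

In a genome scan, a statistic of this kind would be computed in sliding windows, with windows in the upper tail of the PBS distribution for the invasive population treated as candidate regions rather than as confirmed targets of selection.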
There are already a handful of examples of selection scans being used in invasion biology. For example, a genome-wide scan for association with invasiveness in 16 invasive and 6 native populations of Drosophila suzukii identified SNPs in two genes associated with independent invasion routes (Olazcuaga et al., 2020). Using a similar approach that controlled for population structure, genome scans across the global distribution of P. xylostella identified three potentially novel insecticide resistance alleles (You et al., 2020), and signatures of positive selection were associated with sugar receptor genes in Hyphantria cunea (mulberry moth) (Wu et al., 2019). Other studies have made use of WGR data by identifying structural variants and transposable elements and investigating their effect on fitness in invasive populations. For example, again in Drosophila suzukii, fifteen putative adaptive transposable elements were identified, one of which was 399 bp from a SNP previously associated with invasion success in this species (Mérel et al., 2020). In this way, WGR can identify an otherwise invisible dimension of genetic variation.
It has long been realised that genome scans for selection need to account for background genomic processes that can lead to false positives for adaptive loci. These can include genetic drift caused by demographic changes and selective processes such as background selection. In some cases, the peculiar biology of invasive species makes them especially prone to such problems, as genetic bottlenecks can lead to signatures of reduced variation similar to those caused by selection (see Box 2). Furthermore, any summary statistic capturing the coalescent process will be influenced by variation in recombination rate (c) and changes in the effective population size (Ne) (Barton & Etheridge, 2004; Booker, Yeaman, & Whitlock, 2020; Brandvain & Wright, 2016). Ne and c can be estimated empirically with WGR data. Changes in Ne can be inferred using demographic inference methods (see Part 3.1), while recombination rate variation along the genome can be estimated by constructing a linkage map or with phased WGR data (Chan, Jenkins, & Song, 2012). User-friendly modelling tools, such as SLiM, can be used to explore the expected distribution of summary statistics under various combinations of Ne and c (Haller, Galloway, Kelleher, Messer, & Ralph, 2019; Haller & Messer, 2019). Tests for selection that explicitly incorporate demography and recombination can also be used (e.g., Luqman, Widmer, Fior, & Wegmann, 2020). Therefore, despite the confounding effects of recombination rate variation and demographic history on the summary statistics used in genome scans, it is now easier than ever to identify and account for these effects.
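The confounding effect of a bottleneck can be illustrated with a minimal sketch. In an idealised diploid Wright-Fisher population, expected heterozygosity declines by a factor of 1 - 1/(2Ne) each generation, so a brief founder event removes variation across the whole genome without any selection acting. The function name and the population sizes below are hypothetical and chosen only for illustration; the sketch ignores mutation, migration and recombination, and is not a substitute for the forward simulations cited above.

```python
def expected_heterozygosity(h0, ne_per_generation):
    """Expected heterozygosity through time under neutral drift alone.

    h0: starting heterozygosity.
    ne_per_generation: effective population size in each successive generation.
    Assumes an idealised diploid Wright-Fisher population with no mutation.
    """
    trajectory = [h0]
    h = h0
    for ne in ne_per_generation:
        h *= 1.0 - 1.0 / (2.0 * ne)  # per-generation loss due to drift
        trajectory.append(h)
    return trajectory


if __name__ == "__main__":
    # Hypothetical demographic histories spanning 20 generations.
    constant = [10_000] * 20
    bottleneck = [10_000] * 5 + [20] * 10 + [10_000] * 5

    print(expected_heterozygosity(0.3, constant)[-1])
    print(expected_heterozygosity(0.3, bottleneck)[-1])
```

Because the loss under the bottleneck scenario affects all loci, a local reduction in diversity is only informative about selection when it is judged against the genome-wide expectation for the inferred demographic history.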