Box 3: Maximising the advantage of whole genome sequencing with
haplotype data
All sequencing technologies allow allele frequencies to be measured. One
of the key advantages of whole-genome resequencing over other
technologies is the opportunity to exploit additional information, such
as the haplotypes on which physically linked alleles are coinherited.
Haplotype data enable the use of several powerful analytical methods
(reviewed by Leitwein, Duranton, Rougemont, Gagnaire, & Bernatchez,
2020) that are relevant to invasion genomics.
Because recombination and mutation reconfigure haplotypes over time, the
size and frequency of haplotypes convey evolutionary information – a
phenomenon that Moorjani et al. (2016) refer to as the
‘recombination clock’. For example, a haplotype on which a beneficial
allele arises is swept to fixation faster than recombination can break
it down to its expected size under neutrality. Therefore, a signature of
selection is left by unusually large stretches of haplotype homozygosity
(i.e., linkage extends further from the selected locus than
expected), and by the unexpectedly high frequency of a core haplotype
(Sabeti et al., 2002). This is the basis for tests of extended haplotype
homozygosity, used to scan the genome for signatures of selection (see
Parts 1 and 4). Haplotype data are also useful for reconstructing
population size change through time (Part 3). By analysing long
haplotypes identical by descent (that have not yet been broken down by
recombination), Browning and Browning (2015) were able to accurately
reconstruct changes in human population size in the recent past (4 to 50
generations before present). This approach holds great potential for
invasion genetics, where it is often difficult to reconstruct recent
demography (see Part 3.1).
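As a minimal illustration of the extended haplotype homozygosity idea described above, the following Python sketch computes EHH as the probability that two randomly chosen haplotypes carrying a given core allele are identical at every site between the core and a chosen marker. The function name `ehh`, the string encoding of haplotypes and the toy data are assumptions made for illustration only; they are not taken from Sabeti et al. (2002) or from any particular software package.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, allele, upto):
    """Extended haplotype homozygosity for carriers of `allele` at site `core`.

    haplotypes: list of equal-length strings, one character per site (phased).
    core:       index of the core (putatively selected) site.
    allele:     character identifying the core allele of interest.
    upto:       index of the marker out to which homozygosity is measured.
    """
    lo, hi = min(core, upto), max(core, upto)
    # Keep only haplotypes carrying the core allele, trimmed to the interval.
    carriers = [h[lo:hi + 1] for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # EHH is undefined for fewer than two carriers
    # Count pairs of carriers that are identical over the whole interval.
    identical_pairs = sum(comb(c, 2) for c in Counter(carriers).values())
    return identical_pairs / comb(n, 2)


# Hypothetical toy data: six phased haplotypes, five biallelic sites.
toy_haplotypes = [
    "00100",
    "00100",
    "00100",
    "00101",
    "10010",
    "01001",
]

if __name__ == "__main__":
    for site in range(5):
        derived = ehh(toy_haplotypes, core=2, allele="1", upto=site)
        ancestral = ehh(toy_haplotypes, core=2, allele="0", upto=site)
        print(site, derived, ancestral)
```

In this toy example, homozygosity among carriers of allele "1" at the core stays higher, further from the core, than among carriers of allele "0", which is the pattern that haplotype-based selection scans look for. Real analyses would also account for genetic distance between markers and would be affected by phasing errors.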
Haplotype data show most promise in recently admixed populations (see
Part 5). Any analysis of hybridization using haplotype data will require
the ancestry of an introgressed haplotype (‘ancestry tract’ or ‘ancestry
block’) to be inferred (for a review of approaches to ancestry
assignment see Leitwein et al., 2020). Duranton et al. (2019)
studied the introgression of Atlantic sea bass (Dicentrarchus
labrax) into Mediterranean populations of the same species. By
modelling the diffusion of introgressed haplotypes through space (by
gene flow) as they are broken down over time (by recombination), they
estimated the average per-generation dispersal distance. This
approach is likely to be useful for reconstructing the spatial extent of
introgression in invasive species (see Parts 2 and 5). Finally, adaptive
introgression can be accurately detected using haplotype data (see
Shchur, Svedberg, Medina, Corbett-Detig, & Nielsen, 2020). In summary,
haplotype data open many possibilities in invasion genetics research,
representing one of the key advantages of using WGR to study invasive
species.
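The link between ancestry tract length and time can be made concrete with a deliberately simplified sketch. It assumes a single pulse of admixture, no selection and no drift, so that tracts from the introgressing population are approximately exponentially distributed with a mean length of 1 / (t(1 - m)) Morgans, where t is the number of generations since admixture and m is the introgressed ancestry fraction. This approximation, the function names and the parameter values are assumptions chosen for illustration; it is not the spatial diffusion model of Duranton et al. (2019).

```python
def expected_tract_length_morgans(generations, introgressed_fraction=0.0):
    """Approximate mean length (Morgans) of introgressed ancestry tracts.

    Assumes a single admixture pulse `generations` ago. A crossover ends a
    tract only when it joins a segment of the other ancestry, which happens
    with probability (1 - introgressed_fraction).
    """
    rate = generations * (1.0 - introgressed_fraction)
    if rate <= 0:
        raise ValueError("need generations > 0 and introgressed_fraction < 1")
    return 1.0 / rate


def generations_since_admixture(mean_tract_morgans, introgressed_fraction=0.0):
    """Invert the approximation above to date a single admixture pulse."""
    if mean_tract_morgans <= 0 or introgressed_fraction >= 1:
        raise ValueError("need mean_tract_morgans > 0 and introgressed_fraction < 1")
    return 1.0 / (mean_tract_morgans * (1.0 - introgressed_fraction))


if __name__ == "__main__":
    # Hypothetical values: tracts shorten as the admixture event gets older.
    for t in (5, 50, 500):
        print(t, expected_tract_length_morgans(t, introgressed_fraction=0.1))
```

Under these assumptions, longer introgressed tracts point to more recent admixture. Real populations with ongoing gene flow, selection or inaccurate recombination maps depart from this expectation.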
However, haplotype information cannot be directly extracted from WGR
data generated using short reads. Therefore, until long-read sequencing
becomes scalable, direct or indirect methods for inferring gametic phase
(i.e., the two DNA sequences on which alleles occur, in the case
of diploids) need to be used to leverage haplotype information from WGR
data.
Indirect or statistical phasing methods can be applied to whole-genome
datasets obtained with short-read sequencing technology (reviewed by
Rhee et al., 2016). The accuracy of these methods depends on factors such
as the number of samples and the density of nucleotide polymorphisms
(Browning & Browning, 2007). Phasing errors can affect the downstream
biological interpretations made by analysing haplotypes. Direct phasing
methods, on the other hand, record chromosomal haplotypes during the
generation of sequence data. Linked-read sequencing is a newly developed
family of direct phasing technologies that results in fewer errors than
indirect statistical approaches (Amini et al., 2014; Choi, Chan,
Kirkness, Telenti, & Schork, 2018).
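One common way to quantify phasing errors is the switch error rate: the proportion of adjacent pairs of heterozygous sites at which the inferred phase flips relative to a trusted reference phase. The sketch below is a minimal illustration, assuming that a reference phase is available (for example from direct phasing) and that each haplotype is encoded as a list of 0/1 alleles at heterozygous sites. The function name and the toy data are hypothetical and do not come from any of the studies cited above.

```python
def switch_error_rate(true_hap, inferred_hap):
    """Fraction of adjacent heterozygous-site pairs with a phase switch.

    Both arguments list the allele (0 or 1) carried by one haplotype of a
    diploid individual at each heterozygous site, in genomic order. Which of
    the two inferred haplotypes is passed in does not matter, because only
    changes in agreement between neighbouring sites are counted.
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    if len(true_hap) < 2:
        return float("nan")  # no adjacent pairs to compare
    # True where the inferred haplotype carries the same allele as the truth.
    agrees = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A switch occurs wherever agreement changes between neighbouring sites.
    switches = sum(a != b for a, b in zip(agrees, agrees[1:]))
    return switches / (len(agrees) - 1)


if __name__ == "__main__":
    # Hypothetical toy data: six heterozygous sites in one individual.
    truth = [0, 1, 1, 0, 1, 0]
    inferred = [0, 1, 0, 1, 0, 0]
    print(switch_error_rate(truth, inferred))
```

A metric of this kind makes it possible to compare statistical and direct phasing on the same individuals before haplotypes are used in downstream analyses.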
Though linked-read sequencing approaches show great promise in
population genomics (e.g., Lutgen et al., 2020), many platforms
are currently prohibitively expensive. One notable exception is
haplotagging, a recent low-cost linked-read sequencing method (Meier et
al., 2020). Through haplotagging, kilobase-length DNA fragments are
tagged with unique barcodes as they wrap around unique microbeads in
solution.