Title: Extensive introgression despite Haldane’s rule: Insights from grasshopper hybrid zones
Running title: introgression despite Haldane’s rule
Authors: Linda Hagberg 1, Enrique Celemín1,2, Iker Irisarri 3,4, Oliver Hawlitschek 5,6, José L Bella 7,8, Tamí Mott 9, Ricardo J Pereira 1*
1- Division of Evolutionary Biology, Faculty of Biology II, Ludwig-Maximilians-Universität München, Grosshaderner Strasse 2, 82152 Planegg-Martinsried, Germany.
2- Unit of Evolutionary Biology/Systematic Zoology, Institute of Biochemistry and Biology, Universität Potsdam, Karl-Liebknecht-Strasse 24-25, 14476 Potsdam, Germany.
3- University of Goettingen, Institute for Microbiology and Genetics, Department of Applied Bioinformatics, Goldschmidtstr. 1, 37077 Göttingen, Germany.
4- Campus Institute Data Science (CIDAS), Göttingen, Germany.
5- Leibniz Institute for the Analysis of Biodiversity Change (LIB), Zoological Museum, Martin-Luther-King-Platz 3, 20146 Hamburg, Germany.
6- Zoologische Staatssammlung (SNSB-ZSM), Münchhausenstr. 21, 81247 Munich, Germany.
7- Departamento de Biología (Genética), Facultad de Ciencias, Universidad Autónoma de Madrid, 28049 Madrid, Spain.
8- Centro de Investigación en Biodiversidad y Cambio Global (CIBC-UAM), Universidad Autónoma de Madrid, 28049 Madrid, Spain.
9- Instituto de Ciências Biológicas e da Saúde, Universidade Federal de Alagoas, 57072-900, Maceió, Alagoas, Brazil.
* Corresponding author: Ricardo J Pereira; email:ricardojn.pereira@gmail.com

ABSTRACT

Although the process of species formation is notoriously idiosyncratic, the observation of pervasive patterns of reproductive isolation across species pairs suggests that generalities, or “rules”, underlie species formation in all animals. Haldane’s rule states that whenever a sex is absent, rare or sterile in a cross between two taxa, that sex is usually the heterogametic sex. Yet, understanding how Haldane’s rule first evolves and whether it is associated to genome wide barriers to gene flow remains a challenging task because this rule is usually studied in highly divergent taxa that no longer hybridise in nature. Here, we address these questions using the meadow grasshopperPseudochorthippus parallelus where populations that readily hybridise in two natural hybrid zones show hybrid male sterility in laboratorial crosses. Using mitochondrial data, we infer that such populations have diverged some 100,000 years ago, surviving multiple glacial periods in isolated Pleistocenic refugia. Nuclear data shows that secondary contact has led to extensive introgression throughout the species range, including between populations showing hybrid male sterility. We find repeatable patterns of genomic differentiation across the two hybrid zones, yet such patterns are consistent with shared genomic constraints across taxa rather than their role in reproductive isolation. Together, our results suggest that Haldane’s rule can evolve relatively quickly within species, particularly when associated to strong demographic changes. At such early stages of species formation, hybrid male sterility still permits extensive gene flow, allowing future studies to identify genomic regions associated with reproductive barriers.
KEYWORDS: speciation, hybridisation, sterility,Chorthippus , Pseudochorthippus

1. INTRODUCTION

Species formation is largely dependent on specific selective regimes, spatial-temporal contingencies, and natural history of the organism (Gould, 1990). Yet, the observation of globally recurring patterns, or “rules”, of reproductive isolation across species pairs suggests that generalities underlie speciation across animals (Presgraves, 2010) and plants (Brothers & Delph, 2010). As described by the Bateson (1909), Dobzhansky (1936) and Muller (1942) model, mutations that are adaptive or nearly neutral in their own genomic background, can be functionally incompatible with alleles that are present in a foreign genomic background, causing hybrid sterility or inviability (Presgraves, 2010). One of such “rules of speciation” (Coyne & Orr, 1989) was termed “Haldane’s rule”, and states that whenever a sex is absent, rare or sterile in a cross between two taxa, that sex is usually the heterogametic sex. This empirical observation (Haldane, 1922) has been elucidated by laboratorial interspecific crosses showing that alleles involved in incompatibilities are generally recessive (Masly & Presgraves, 2007), so that their deleterious effects in hybrids are masked in the diploid chromosomes (i.e. autosomes and sexual chromosomes in the homogametic sex, XX females and ZZ males), but are exposed in the haploid sexual chromosomes in the heterogametic sex (i.e. XY and X0 males or ZW females). Although such a rule has now been reported across organisms that differ in sexual determination system (Coyne & Orr, 2004; Schilthuizen, Giesbers, & Beukeboom, 2011), understanding how Haldane’s rule evolves and how it restricts gene flow between emergent species remains an important task in evolutionary biology (Coyne, 2018; Payseur, Presgraves, & Filatov, 2018; Payseur & Rieseberg, 2016; Presgraves, 2018).
Most of the current knowledge on Haldane’s rule stems from experimental crosses between highly divergent species, namely drosophilids (Masly & Presgraves, 2007; Payseur et al., 2018; Presgraves, 2018; Presgraves & Meiklejohn, 2021) that speciated between 4,000,000 and 300,000 years ago (Kliman et al., 2000; Wang & Hey, 1996). With some exceptions (Ebdon et al., 2021; Matute, 2010; Teeter et al., 2007), these species rarely hybridise in nature and laboratorial hybrids are often lethal or inviable (Matute & Cooper, 2021). Thus, incompatibilities manifested in these systems may have arisen after reproductive isolation has been established (Coyne, 2018; Matute, 2010), questioning their role in limiting gene flow between incipient species and hence in facilitating species formation.
Hybrid zones provide a direct insight into the phenotypes and underlying genes that maintain genetic barriers between emerging species, and thus that presumably facilitate species formation (Barton & Hewitt, 1985; Harrison, 1993). Hybrid zones are broadly defined as areas in which incipient species meet, mate, and produce at least some offspring of mixed ancestry (Harrison, 1990), and can be observed between recently diverged taxa, such as incipient species, subspecies, or even differentiated populations. Unlike experimental hybridisation that is necessarily limited to few generations of recombination, hybrid zones result from thousands of generations of recombination, reducing physical linkage among loci involved in early incompatibilities and neutral loci, and allowing for high resolution to map genes underlying reproductive isolation (Rieseberg, Baird, & Gardner, 2000). Moreover, incompatibilities expressed in hybrid zones are usually associated to mildly maladaptive phenotypes such as partial sterility (Nachman & Payseur, 2012; Nürnberger, Lohse, Fijarczyk, Szymura, & Blaxter, 2016) that are thought to play a more important role at early stages of species formation, relative to full sterility and inviability. Hybrid zones exhibiting the same reproductive barriers are particularly useful as they allow testing for replicated patterns of genomic differentiation linked to barriers to gene flow or to shared genomic constrains (Poelstra et al., 2014; Ravinet et al., 2016; Westram, Faria, Johannesson, & Butlin, 2021), thus offering insights into predictability of evolution at the gene level (Christin, Weinreich, & Besnard, 2010; Stern & Orgogozo, 2009).
The meadow grasshopper Pseudochorthippus parallelus (Zetterstedt, 1821), formerly known as Chorthippus parallelus (Defaut, 2012), is a suitable system to shed light on how Haldane’s rule evolves and its contribution in limiting gene flow between emerging species. Like most species in Europe (Hewitt, 2000), this species survived in isolated southern refugia in the Iberian, Italic and Balkan peninsulas, as permanent ice covered northern latitudes during the Pleistocene ice ages (>15,000 years ago; Butlin, 1998; Hewitt, 1993; Ivy-Ochs et al., 2008). Geographic isolation resulted in the formation of two subspecies, P. p. erythropus in Iberia and P. p. parallelus in the remaining peninsulas, which differ in morphology, behaviour, cytogenetics, and in molecular markers (Butlin & Hewitt, 1985; Butlin, Ritchie, & Hewitt, 1991; Cooper, Ibrahim, & Hewitt, 1995; Gosálvez, López-Fernández, Bella, Butlin, & Hewitt, 1988; Serrano et al., 1996). After the retraction of the ice coverage the two subspecies expanded and established a hybrid zone in the Pyrenees, when the ice last disappeared some 8,000 years ago (Hewitt, 1993), which corresponds to 8,000 generations since secondary contact. About the same elevation and presumably at the same time in the Alps (Palacios, de Andrés, Gómez-Ortiz, & García-Ruiz, 2017), two evolutionary lineages assigned to the subspecies P. p. parallelus established a second hybrid zone (Cooper et al., 1995). Limiting molecular data based on one mitochondrial and one nuclear gene (Korkmaz, Lunt, Çıplak, Değerli, & Başıbüyük, 2014; Lunt, Ibrahim, & Hewitt, 1998) could not establish the phylogenetic relationships between these three lineages (Cooper et al., 1995). Although today there are no obvious signs of hybrid male sterility near the centres of the two hybrid zones, experimental crosses using parental populations show that F1 males (X0) have severe testis dysfunction, disrupted meiosis and are sterile, but F1 females (XX) are fertile and fully fit, conforming to Haldane’s rule in both hybrid zones (Flanagan, 1997; Hewitt, Butlin, & East, 1987). Backcrosses of F1 females show some recovery of male sterility, suggesting that fitness recovery is possible through recombination and selection against incompatibilities (Virdee & Hewitt, 1992, 1994). Other reproductive barriers, such as premating (Ritchie, Butlin, & Hewitt, 1989) and postmating-prezygotic barriers (Bella, Butlin, Ferris, & Hewitt, 1992), have also evolved between parental populations. Yet, non-coincident clines for traits associated to male signalling and female preference suggest a breakdown of behavioural isolation (Butlin & Ritchie, 1991). In contrast, clines for hybrid male sterility remain narrow and located at the centre of the hybrid zone (Virdee & Hewitt, 1994), consistent with an effective barrier to gene flow. Nothing is known regarding patterns of gene flow, besides limiting evidence from one allozymic marker (Butlin, 1998), two mitochondrial markers (Pereira et al., 2021) and from heterochromatin banding of the X chromosome (Serrano et al., 1996).
Although the P. parallelus hybrid zone was early recognised as “a window into the evolutionary process” (Harrison 1993) and “a natural laboratory for evolutionary studies” (Hewitt, 1993), its large (14.72 Gb) and highly repetitive genome has prevented genetic studies using traditional Sanger and whole genome sequencing methods. Using RNA sequencing methods, we use the P. parallelus system to provide insights into the tempo and mode of the evolution of Haldane’s rule, whether it may restrict genetic introgression, and whether it is associated to repeatable patterns of genomic differentiation.

2. MATERIALS AND METHODS

2.1 Sampling and sequencing

To understand the phylogeographic history of Pseudochorthippus parallelus , we sampled 10 localities covering the known evolutionary lineages within this species (Fig. 1A; Table S1). For the subspeciesP. p. erythropus , we have sampled a population near its putative refugium in the Iberian Central System, a second in Portugal, and a third neighbouring the Pyrenean hybrid zone. For the subspecies P. p. parallelus , we have sampled the lineage that putatively originated from a range expansion from a Balkan refugium (Lunt et al., 1998) in two populations within the central European range (Austria and Slovenia), plus two populations neighbouring the Pyrenean and Alpine hybrid zones. We have also sampled the parallelus that putatively originated from the Italic refugium, south of the Alpine hybrid zone. Additionally, we sampled populations near the putative centres of the Pyrenean and Alpine hybrid zones, following earlier cytogenetic studies (Flanagan, Mason, Gosálvez, & Hewitt, 1999; Zabal-Aguirre, Arroyo, & Bella, 2010). In every locality we sampled five male individuals in close proximity. In some cases, two nearby sub-localities were sampled to capture a representative amount of diversity (see Table S1; NCBI BioProject PRJNA——). All specimens were sampled for this study, with the exception of the reference parental population neighbouring the Pyrenean hybrid zone (PAR, ERY), which were published in a previous study (Nolen et al., 2020; NCBI BioProject PRJNA665162). Additionally, we used published data from five individual males ofChorthippus biguttulus sampled in Germany, used as an outgroup (Berdan, Mazzoni, Waurick, Roehr, & Mayer, 2015; NCBI BioProject PRJNA284873).
In the field, we preserved whole body tissue in RNAlater (Qiagen) to sample the largest variety of transcripts possible. We excluded the head and upper digestive tract to avoid gut contamination and overrepresentation of eye pigments. In the laboratory, we homogenised the samples using ceramic beads (1.4/2.8 mm, Precellys) and the standard Tri-Reagent protocol (Sigma). We resuspended and purified RNA pellets with RNAeasy Mini columns (Qiagen), followed by a quantity and integrity check using an Agilent 2100 BioAnalyser. The BGI group performed mRNA enrichment, library construction and paired-end Illumina HiSeq2500 sequencing. The three populations from the Pyrenees had a sequencing average of 59,391,201 paired reads (standard deviation: 6,908,681.337) of 150 bp, while the remaining seven populations had a sequencing average of 49,410,055 (standard deviation: 1,718,007.777) paired reads of 100 bp.

2.2 Mapping and filtering

We used the BAM pipeline implemented in Paleomix (Schubert et al., 2014) for processing raw data. This process first removed adapters and trimmed low-quality bases (min. quality-offset for Phred scores 33) with AdapterRemoval (Schubert, Lindgreen, & Orlando, 2016). Overlapping pairs were collapsed into one consensus read. All reads were then mapped with BWA-MEM, discarding poorly mapped reads under a quality threshold of 30 (Li & Durbin, 2009) against the reference transcriptome previously assembled and annotated by Nolen et al. (2020). These transcripts are organised in four partitions consisting of: (1) 16,969 single-copy genes, whereof 12,735 with identified open reading frames (ORFs) and 4,235 without; (2) 4,263 multi-copy genes; (3) 18,623 genes found only in some individuals; and (4) the complete mitochondrial genome. We visualised the number of nucleotides retained and coverage in partition 1 and 4 to assess the differences in coverage across localities (Fig. S1).
As phylogenetic inference requires genotype calling, we produced haplotypes for the ORFs of the 12,735 single-copy nuclear genes and for the 13 mitochondrial genes, calling the most frequent base at each position with a minimum coverage threshold of 10×, using ANGSD v0.921 (Korneliussen, Albrechtsen, & Nielsen, 2014). As most of the population genetics inference can handle uncertainty of genotypes, which is particularly important with RNAseq data where coverage varies with gene expression, we estimated genotype likelihoods using ANGSD. For this dataset, we used the 12,735 single-copy nuclear genes with identified ORFs, excluding the first and second codon positions that often correspond to nonsynonymous substitutions and hence are more affected by selection. We excluded reads with mapping quality < 15 after adjustment, base read quality < 20, or containing multiple hits. Additionally, we only considered sites present in min. 80% of individuals, and with a coverage depth greater than twice the total number of individuals (see https://github.com/lindington/ChorthPar for the detailed commands).

2.3 Population structure and admixture

To assess population structure and identify ongoing hybridisation, we used the population genetics dataset. First, we performed a principal component analysis (PCA) with ngsPopgen (Fumagalli, Vieira, Linderoth, & Nielsen, 2014) to test if genetic variation reflects the spatial distribution of the samples. We considered only variable sites with a SNP p -value < 1.0 × 10-2 (i.e. 1,494,335 SNPs), which has been shown to accurately reflect the site frequency spectrum based on all sites (Nolen et al., 2020). We expect most of the variance (PC1) to reflect subspecies and subsequent principal components to reflect known evolutionary lineages within subspecies. Second, we estimated admixture proportions in each individual, using the clustering algorithm implemented in NGSadmix (Skotte, Korneliussen, & Albrechtsen, 2013), which maximises Hardy Weinberg and linkage equilibria within K ancestral clusters. We considered K from two to 11, the number of sampling localities (10) plus the outgroup, with 50 replicates, selecting the replicate with the highest likelihood for each. Because of the inclusion of the outgroup, this analysis is based on 708,874 SNPs. Under a mutation-drift equilibrium, we expect each subspecies or sampling locality to form a cluster, with possible admixture near the centre of the hybrid zones.

2.4 Mitochondrial molecular clock

To infer the timing of diversification of the species we estimated a mitochondrial time tree. For this, we considered the 13 protein-coding genes to assure for site homology and avoid assembly and mapping errors in non-coding parts of the mitogenome. First, we aligned the 55 complete mitogenomes, annotated them using MITOS (Bernt et al., 2013), and extracted the alignments for the 13 protein-coding genes using a custom script (https://github.com/lindington/ChorthPar/). Second, we verified the absence of internal stop codons along the sequence with MEGA v10.1.6 (Tamura, Dudley, Nei, & Kumar, 2007), and concatenated the genes. Third, we imported the concatenated alignment into IQ-TREE v1.6.3 (Nguyen, Schmidt, von Haeseler, & Minh, 2015), estimated best-fit models of evolution for each gene using the BIC score as implemented in ModelFinder (Kalyaanamoorthy, Minh, Wong, Haeseler, & Jermiin, 2017), and assessed branch support using 1,000 replicates of ultrafast bootstrapping (Hoang, Chernomor, von Haeseler, Minh, & Vinh, 2018). The resulting maximum-likelihood (ML) tree carries the uncertainty of the estimated topology (Fig. S2). Finally, we inferred divergence times using BEAST v1.10.4 (Suchard et al., 2018). We used the uncorrelated relaxed clock model (Drummond, Ho, Phillips, & Rambaut, 2006), with a normally distributed prior for the substitution rate with mean 0.0115 and standard deviation 0.001 substitutions per million year (Brower, 1994), which is appropriate given the remarkably conserved mitochondrial rates in insects (e.g. 0.0115 in butterflies and 0.0133 in beetles; Brower, 1994; Papadopoulou, Anastasiou, & Vogler, 2010, respectively). The tree topology was fixed to the previously inferred ML topology, and we used a birth-death prior for speciation (Gernhard, 2008). BEAST runs were initialised with a chronogram based on the ML tree and the divergence time between P. p. erythropus and P. p. parallelus was set to a minimum age of 15 kya (end of the last glacial maximum; Ivy-Ochs et al., 2008) and a maximum age of 33.9 mya (the oldest Acrididae fossil; Song, Mariño-Pérez, Woller, & Cigliano, 2018), using the R package ape (Paradis & Schliep, 2019). To facilitate convergence, tree topology, clock model, and substitution model (GTR) were linked across genes. Three independent MCMC chains were run for 100 million generations, after a burn-in of 10 million, and sampling every 10,000 generations. We checked for convergence a posteriori using Tracer v1.7.1 (Rambaut, Drummond, Xie, Baele, & Suchard, 2018), and that all ESS values were above 200. We merged the independent chains using LogCombiner and estimated divergence times using TreeAnnotator after discarding the initial 10 million generations as burn-in.

2.5 Nuclear species tree

To assess the evolutionary relationships between sampling populations, we inferred a nuclear species tree under the multispecies coalescent model, which accommodates incomplete lineage sorting (ILS) expected for rapid radiations (Degnan & Rosenberg, 2009). First, we extracted the coding regions of the 12,735 single-copy nuclear genes for which ORFs were identified to avoid biases caused by assembly errors or under-expressed areas of the transcriptome, such as UTRs, which could increase the uncertainty of the estimated gene trees. Second, we retained all genes that had more than 300 called positions, and that were present in a minimum of three out of the five individuals per population. Third, we inferred a ML tree for every individual nuclear gene using IQ-TREE v1.6.3 (Nguyen et al., 2015) using BIC-selected models, 1,000 replicates of ultrafast bootstrapping (UFBoot), and SH-like approximate likelihood ratio tests (SH-aLRT; Guindon et al., 2010; Hoang et al., 2018). To account for the uncertainty in gene tree estimation, we collapsed branches with < 50% UFBoot support. Fourth, we assessed the information content of each nuclear gene tree and compared it to the mitochondrial tree (Strimmer & Haeseler, 1997), performing a likelihood quartet mapping test in IQ-TREE. Finally, we estimated the species tree that maximises the topology agreement between independent nuclear gene trees (Mirarab & Warnow, 2015), using ASTRAL v5.6.1 (Mirarab et al., 2014). From this analysis, we output the main species tree topology, posterior probabilities for this main topology, and branch lengths in coalescent units (T/Ne; Degnan & Rosenberg, 2009). We used this approach to estimate 1) an “individual species tree” where each individual is a terminal branch, and 2) a “population species tree” where individuals sampled in the 10 sampling localities are merged (as in Table S1). We expect the subspecies form two reciprocally monophyletic groups with hybrid individuals or populations having an intermediate position in the species tree.

2.6 Gene flow between populations

To test for the presence and magnitude of gene flow between all sampled populations, we used Patterson’s D -statistics (Durand, Patterson, Reich, & Slatkin, 2011). This test weights the fraction of biallelic sites that have a different topology from the previously estimated species tree. Using C. biguttulus as the outgroup carrying the ancestral A allele, we measured the proportion of ABBA and BABA sites for all possible combinations of three P. parallelus taxa that conformed to our estimated species tree (i.e. in 120 comparisons). We used Abbababa2 as implemented in ANGSD, which extends this analysis to comparisons among populations (Soraggi, Wiuf, & Albrechtsen, 2018). We restricted the analysis to sites with a SNP p -value < 1.0 × 10-6 and estimated significance using ap -value calculated from 10 windows of ~240,000 relevant sites. A similar proportion of ABBA and BABA (D = 0) sites is expected under the null hypothesis of ILS driving discordance, while significantly different proportions (D ≠ 0 with p -value < 0.05) must be explained by gene flow between two populations. We visualised Patterson’s D -statistics using a matrix of all possible pairwise comparisons (i.e. excluding sister taxa), using the highest reached D-value for each comparison. If hybrid male sterility does not confer a genome wide barrier to introgression, we expect to find high D -statistics between populations located in either side of the Pyrenean and Alpine hybrid zones.

2.7 The demographic history of hybrid zones

To quantify the relative amount of divergence and gene flow across the two hybrid zones, we estimated the demographic history of divergence of pairs of parental taxa at each zone separately (ERY vs PAR for the Pyrenees, and TAR vs GOM for the Alps). We considered four nested demographic models: (1) divergence without gene flow (“No Migration”, with three parameters: time since population splitT1 , and effective population sizeNe for each population); (2) divergence with continuous gene flow (“Symmetric Migration”, with a fourth migration rate parameter m ); (3) gene flow after secondary contact (“Secondary Contact”, with the fifth parameter time since secondary contact T2 , after which migration starts); and (4) divergence with asymmetric gene flow after secondary contact (“Secondary Contact with Asymmetric Migration”, with the sixthm2 parameter). First, we used ANGSD to build two-dimensional site frequency spectra (2D-SFS) for each of the 16,969 single copy nuclear genes and summed them into a single complete 2D-SFS per population pair. Second, the complete 2D-SFS was fit to all four demographic models using the diffusion approximation methods implemented in δaδi v2.0.5 (Gutenkunst, Hernandez, Williamson, & Bustamante, 2009), as described in Nolen et al. (2020; see https://github.com/zjnolen/chorthippus_radiation for commands), with four technical replicates to guarantee convergence of the estimated parameter values. Finally, we compared nested models using a likelihood ratio test (LRT), and estimated parameter uncertainties with the Godambe Information Matrix (Coffman, Hsieh, Gravel, & Gutenkunst, 2016; Godambe, 1960), which uses 100 nonparametric bootstrap SFSs to account with physical linkage of SNPs within the same transcript. This procedure selects the simplest model that captures most of the demographic signal contained in the 2D-SFS, and thus we chose a model that is highly supported in both hybrid zones in order to compare demographic parameters. The demographic parameters T and 2Nm were estimated in reference to the constant mutation rate μ , asμ is likely the same for such closely related populations.

2.8 Heterogeneity of gene flow

To estimate the relative effective population size for each sampling locality, we calculated per-gene Watterson’s θ (Watterson, 1975) and π (Nei & Li, 1979). We also estimated deviation from a neutral model of constant population size by calculating per-gene Tajima’s D (TD) (Tajima, 1989). For each summary statistics, we first built the one-dimensional SFS for each population and then calculated θ, π, and TD for each site in ANGSD. We used a custom script to average values across linked sites within each gene and plotted their distributions for the genes sampled across all populations (https://github.com/lindington/ChorthPar). We expect to find TD closer to zero in populations sampled near the putative glacial refugia, and negative values in populations sampled near the edge of post-glacial range expansions.
To test if the same genes show highest differentiation across the two hybrid zones, we estimated pairwise FST (Reynolds, Weir, & Cockerham, 1983) between pairs of parental populations interacting at each hybrid zone (ERY vs PAR in the Pyrenees, and TAR vs GOM in the Alps). We used the 2D-SFSs from demographic analyses to calculate a per-site and per-gene FST as implemented for other summary statistics. Using Fisher’s exact test (Fisher, 1922) as implemented in the R package GeneOverlap (Shen, 2020), we tested if there is significant overlap between genes with the highest 5% FST in the two hybrid zones, as expected in conformity with repeatable patterns of genetic differentiation. Because high FST can be caused either by gene-specific selection against gene flow or evolutionary constraints (Nachman & Payseur, 2012), we estimated dXY (Nei & Li, 1979). We used ANGSD and a script adapted from Peñalba (https://github.com/mfumagalli/ngsPopGen/blob/master/scripts/calcDxy.R) to estimate per-site and per-gene dXY as described above. We estimated the direction and significance of the correlation between FST and dXY estimates using a linear model. If high FST is caused by selection counteracting gene flow, we expect a positive correlation, while if high FST is caused by strong drift caused by shared genomic constraints, we expect a negative correlation.

3. RESULTS

3.1 Mapping and filtering

Overall, trimming and mapping efficiency was equivalent across populations (Table S2). The percentage of retained nucleotides after filtering ranged from 79% to 89%, and mapping efficiency of reads ranged from 84% to 90% (Fig. S1). The mitochondrial genome had a coverage between 16,000 to 52,000 x, reflecting its expected high transcription levels. The single-copy nuclear genes with identified ORF had a coverage between 46× to 125× (Fig. S1), largely reflecting sequencing effort, which was larger in the Pyrenean populations (ERY, PHZ, and PAR).

3.2 Population structure

A relatively large fraction of the genetic variance (11.9 %) is explained by Principal Component (PC) 1, which separates individuals of the two subspecies, reflecting a West-East gradient (Fig. 1B). PC2 explains 3.85% of the variance and distinguishes individuals the two evolutionary lineages known within the subspecies P. p. parallelus , reflecting a North-South gradient. PC3 and PC4 explain 3% and 2.8% of the variance, respectively, and distinguish the Austrian samples (DOB) from all others (Fig. S3).
Results from our admixture analysis show a continuous increasing likelihood with increasing K (Table S3). The first split within P. parallelus distinguishes the two subspecies, with some individuals from the Pyrenean hybrid zone showing admixture (Fig. 2B). Next, the European and Italian lineages of P. p. parallelus separate, with admixture found in the Alpine hybrid zone and the two locations closer to the Balkans (DOB and SLO). The Austrian population (DOB) becomes distinct at K = 5 and does not show admixture with any of the neighbouring localities (Fig. 2B). At K = 11 each sampling locality forms its own cluster (Fig. S4), with admixture found only in two individuals collected closer to the reference population of P. p. erythropusof the Pyrenees (Table S1).

3.2 Mitochondrial molecular clock

The maximum likelihood mitochondrial tree (Fig. S2) shows a well-supported topology (Bootstrap (BP) values > 95), where the two subspecies and most of the populations are not reciprocally monophyletic. The mitochondrial lineages form six major clades (A-F), most of them containing lineages found in a single subspecies. The exception are clades A and F, which contain internal nodes that separateP. p. parallelus and P. p. erythropus . We used those internal nodes to estimate the mean divergence time between subspecies.
The time to most recent common ancestor (TMRCA) of all P. parallelus is estimated to be around 390 ka (95% highest posterior density: 432-322). The diversification of the six major clades ranges from 218 ka to 388 ka (Table S4). The TMRCA between subspecies in clades A and F occurred simultaneously, around 100 ka (128-66).

3.3 Nuclear species tree

After filtering, we obtained 5,929 genes that were used for species trees estimation. The likelihood mapping analysis shows that most gene trees have enough power to resolve 60-80% of the quartets, reflecting high information content of nuclear gene trees (Fig. S5). Individual gene trees show a wide range of sorting between alleles from each subspecies, from high gene tree discordance similarly to the mitochondrial genome, to two reciprocally monophyletic clades (Fig. S6).
The individual species tree (Fig. S7) shows that P. p. erythropusand P. p. parallelus are reciprocally monophyletic. Most populations form monophyletic clades (posterior probability, PP = 1) with only three exceptions. Individuals from the two sub-localities of the Pyrenean hybrid zone, sampled only 700 m apart, cluster separately: la Troya individuals form a sister clade to the referenceerythropus of the Pyrenees (ERY: sub-localities Biescas and Escarrilla, sampled 12 km apart), and Corral de Mulas individuals group as another paraphyletic grade at the base of the P. p. erythropusclade. Individuals from the two sub-localities of the parallelusfrom the Pyrenees (PAR: Gabas and Arudy, sampled 25 km apart), form a paraphyletic grade to a clade containing all individuals from West Austria (TAR). Finally, individuals from the Alpine hybrid zone (AHZ) also form a paraphyletic grade to a clade containing all Italian individuals (GOM).
The relationships between populations are well established in the “population species tree”, where all branches have a support with PP > 0.95 (Fig. 2A). In agreement with the “individual species tree”, populations of P. p. erythropus and P. p. parallelus are reciprocally monophyletic (PP > 0.95). In the erythropus clade, the Pyrenean hybrid zone (PHZ) branches off first, followed by the Portuguese population, and the referenceerythropus of the Pyrenees (ERY) being sister to the population located in the putative glacial refugium at the Iberian Central System (CSY). The parallelus clade shows an early split of the East Austrian population (DOB), followed by a split between a clade containing the north European populations (PAR and TAR) from another clade comprising the southern European populations (SLO, GOM), including the Alpine hybrid zone (AHZ).

3.4 Gene flow between populations

Out of the 360 possible comparisons, 120 conform to our nuclear species tree (Fig. 2A). 114 have D-statistics that are significantly different from zero (p -values < 0.01574; Table S5), consistent with gene flow between 34 population pairs (Fig. 4). Six populations pairs show no significant deviations from the null expectation of no gene flow, and three populations pairs could not be tested because they are sister taxa. All populations experience gene flow with at least one non-sister taxon. Notably, populations closer to the Pyrenean hybrid zone show the highest values of D -statistics across comparisons, i.e. the reference parallelus population in France (PAR) consistently shows high gene flow with all populations oferythropus from Iberia, and conversely the Pyrenean hybrid population (PHZ) consistently shows high gene flow with all populations of parallelus . The Slovenian population (SLO) also shows high values of D -statistics with the Italian population (GOM and AHZ), also consistent with gene flow.

3.5 The demographic history of hybrid zones

In our demographic models, all technical replicates converged to similar likelihoods and parameter estimations. The model with the highest likelihood is the most parameter rich, i.e. the “SC with Asymmetric Migration” model. Yet, when penalising the likelihoods for an increased number of parameters using the adjusted likelihood ratio test, we find that the simplest model that can explain the observed SFS is the “Symmetric Migration” model for the Pyrenean hybrid zone, and in the “Secondary Contact” model for the Alpine hybrid zone (Table S6).
To allow comparisons across hybrid zones, we thus chose the simpler “Symmetric Migration” model for parameter estimation. EstimatedNe s of parental populations are similar in the Alpine hybrid zone, but they show a two-fold difference with non-overlapping standard deviations in the Pyrenean hybrid zone (parallelus > erythropus ). The parameter estimation converged on older divergence time T(~1.3 times) and lower 2Nm (mean ~0.7 times) for the Pyrenean hybrid zone relative to the Alpine hybrid zone, but with overlapping standard deviations (Table S7).

3.6 Heterogeneity of gene flow

Distributions of per-gene Watterson’s θ and π (Fig. S8) were centred close to zero with populations differing in the length of the tail of the distribution and in their mean. Mean π varied between 0.0026 (CSY) and 0.0033 (PAR), with higher values found in the Pyrenean and Alpine hybrid zones, in the admixed population of Slovenia, and in the population of parallelus from the Pyrenees (PAR). A similar trend was observed in means of per-gene Watterson’s θ (varying between 6.594 (CSY) and 9.330 (PAR)), with the population of Portugal also showing high diversity.
The distributions of per-gene Tajima’s D were generally centred around negative values in all populations (Fig. 5B), consistent with a general demographic expansion. Populations located in proximity to putative refugia (CSY for erythropus , and GOM for the Italianparallelus ) have means closest to zero, suggesting a smaller deviation from neutrality and thus relative more demographic stability. Notably, in the eastern Austrian population (DOB) and in the southern Pyrenean population (PAR) we find means close to zero, also conforming to expectations of relative demographic stability. We found the most negative Tajima’s D in populations further away from the known refugia (POR for erythropus , and PAR for the northern Europeanparallelus ), consistent with a stronger range expansion. Notably, the Slovenian population SLO has a similarly negative Tajima’s D, also conforming to expectations of demographic range expansion.
The distributions of per-gene FST across the two hybrid zones are unimodal with a positive skew. Genes in the highest 5% per-gene FST have values above 0.63 for the lineages ofparallelus hybridizing in the Alps, and above 0.74 for the subspecies hybridizing in the Pyrenees. Out of these 575 genes, 117 overlap, which is not expected to occur by chance (p -value = 1.1 × 10-41). Per-gene dXY distributions are also unimodal and with positive skews in both hybrid zones (Fig. S9). We found a significant negative correlation (slope = -0.0150551;p -value < 2 × 10-16) between dXY and FST (Fig. 6B) in both hybrid zones, consistent with shared genomic constraints across populations.

4. DISCUSSION

The evolution of hybrid sterility in the heterogametic sex, or Haldane’s rule is one of the few recurrent patterns in species formation (Haldane, 1922). Yet, to understand the conditions under which this rule evolves, its consequences for reducing gene flow between emerging species, and its association to recurrent patterns of genomic differentiation, it is required to study taxa at early stages of species formation. We provide some answers to these questions using lineages of the grasshopper species Pseudochorthippus parallelus , which express hybrid male sterility in laboratorial crosses (Flanagan, 1997; Hewitt et al., 1987) but have been interacting in two hybrid zones for thousands of generations (Butlin, 1998; Hewitt, 1993).

4.1 The phylogeographic history ofPseudochorthippus parallelus

Pseudochorthippus parallelus has been fundamental for our current understanding of biogeographic patterns in Europe (Hewitt, 2000) and has early been recognised as an important window into species formation (Barton & Hewitt, 1985; Futuyma, Shapiro, & Harrison, 1995). Early studies based on small mitochondrial DNA fragments presented a relatively simple scenario of divergence in three southern glacial refugia (Iberia, Italy, Balkans; Cooper et al., 1995) and the evolution of multiple reproductive barriers, including hybrid male sterility (Hewitt et al., 1987; Shuker, Underwood, King, & Butlin, 2005). However, understanding how reproductive barriers evolved requires estimating the time of divergence between these lineages and their phylogeographic history.
Our estimated mitochondrial time tree based on the complete mitochondrial genome places the time for the most recent common ancestor of all P. parallelus in the mid Pleistocene, around 389,000 years ago (95% HPD: 432,200 – 322,600; Fig. 3, Table S4). Similarly to what we found in thousands of independent nuclear gene trees (Fig. S6) the mitochondrial tree shows extensive allele sharing between subspecies (Fig. 3; Fig. S2). Such patterns of allele sharing are often observed in other insects characterised by large effective population sizes and interspecific gene flow (Ebdon et al., 2021; Pollard, Iyer, Moses, & Eisen, 2006), including in grasshopper species of the related genusChorthippus (Nolen et al., 2020), suggesting that probably both ILS and gene flow underlie such patterns. Two out of the six mitochondrial clades have sub-lineages that are fixed among subspecies (nodes within clades A and F, Fig. 3), allowing us to estimate a minimum time of divergence between the two subspecies P. p. parallelusand P. p. erythropus . In both clades, divergence is estimated to have occurred around 100,000 years ago (127,900 – 65,900), placing divergence time at the beginning of the last glacial cycle in Europe (Ganopolski & Brovkin, 2017).
Using information from nuclear genes we were able to establish a well-supported phylogeographic scenario for the diversification ofP. parallelus in Europe. As expected, the oldest split occurred between the two subspecies (Fig. 2A, Fig. S7), reflecting the oldest vicariant event between the Iberian Peninsula, where erythropusdiverged and remains isolated, and the ancestor of parallelus . The population near the centre of the Pyrenean hybrid zone is at the base of the erythropus clade, probably due to introgression fromparallelus that is not accounted for in the multispecies coalescent, but that is clear in the individual patterns of admixture (Fig. 2B). Within the pure erythropus , we find evidence for an early split between a Portuguese (POR) and a Spanish lineage, followed by the colonisation of the southern slope of the Pyrenees (ERY) from the Spanish side of the Iberian Central System (CSY). Patterns of genetic diversity (Tajima’s D; Fig. 5) suggest demographic stability at both Spanish localities, but with a stronger range expansion in the Portuguese side of the Iberian Central System. These observations are consistent with two sub-refugia within Iberia, a recurrent phylogeographic pattern across Iberian plant and animal species that has been popularised as “refugia within refugium” (Feliner, 2011; Gómez & Lunt, 2007). Earlier cytogenetic work in erythropus (Bella, Hewitt, & Gosálvez, 1990) has shown the existence of two variants of the X chromosome that differ in the location of the active nucleolar organising regions (NORs). Because NORs are associated to rDNA activity during meiosis, this finding suggests functional divergence of the sexual chromosomes of the Portuguese and Spanish erythropus . Future experimental crosses or study of a cryptic hybrid zone along the Iberian Central System can test whether such divergence has resulted in some degree of reproductive isolation within this subspecies.
Within the parallelus clade, we do not find evidence for an earlier split between the Italian and the Balkan lineages, as suggested by previous studies (Cooper et al., 1995), but rather for the divergence of the Austrian lineage located in the East of the Alps (DOB). This lineage is distinct from all other known lineages, including from the Balkans lineage detected in Slovenia (SLO). Notably, these individuals form a monophyletic clade that has diverged around 100,000 years ago (95% HPD: 123,400 – 71,100) from the remaining parallelus(Table S4, Fig. 3; split of DOB within clade A), does not hybridise with nearby populations (Fig. 2B, S4), and has a Tajima’s D consistent with demographic stability (TD= -0.217; Fig. 5). These observations are consistent with this population representing an old refugium for parallelus , suggesting that some relictual populations have survived the last glacial period in a micro-refugium within the Alps, despite this large mountain system being largely glaciated until 15,000 years ago (Ivy-Ochs et al., 2008). Small isolated refugia have been demonstrated for alpine plants (Stehlik, Blattner, Holderegger, & Bachmann, 2002), beetles (Lohse, Nicholls, & Stone, 2011) and in jumping bristletails (Wachter et al., 2012). The existence of species of grasshoppers micro-endemic to isolated regions of the Alps, e.g. Podismopsis keisti (Nadig, 1989) and Podismopsis styriaca (Koschuh, 2008), suggests that such Alpine refugia might have played an important role for current biogeographic patterns in Europe. The next split within parallelus refers to the vicariant event between the broadly distributed central European lineage and the Italian lineage. Previous studies have shown that these lineages have diverged in one NOR at the distal end of the X chromosome (Flanagan et al., 1999) and give rise to sterile hybrid males in laboratory crosses (Flanagan, 1997). This suggests genetic incompatibilities leading to hybrid male sterility can evolve fairly rapidly in this system.
In addition to relatively short divergence times, demographic history ofP. parallelus strongly deviates from neutral expectations (Fig. 5). Notably, our estimates of Tajima’s D are negative in populations on the northern slopes of mountain ranges known to harbour important glaciers during the Pleistocene, such as the Iberian Central System (POR, TD = -0.169), the Pyrenees (PAR, TD = -0.183), the Alps (TAR, TD = -0.088), and Slovenia (SLO, TD = -0.217), consistent with demographic range expansions onto northern slopes after the ice sheets retreated. This molecular signature of demographic expansion is strongest in the European lineage of parallelus in the French Pyrenees (PAR), which is genetically similar to the Austrian population collected ~990 km away (Fig. PCA) and shows the longest coalescent time (T/ Ne ) in the species tree (Fig. 2A). The effect of strong genetic drift before secondary contact is further reflected in the relatively high FST values observed between taxa that established secondary contact, both in the Pyrenean (global FST = 0.395) and in the Alpine (global FST = 0.286) hybrid zones.
Together, this phylogeographic scenario show that hybrid male sterility has evolved at least between three different lineages of P. parallelus that diverged relatively recently, within the last 389,000 years, corresponding to the same number of generations in this species. The evolution of hybrid male sterility is thus significantly faster relative to what has been estimated in well studied systems, such as Drosophilids (that diverged more than 300,000 years or more than 2.4 million generations ago; Kliman et al., 2000) or the house mice (that diverged more than 500,000 years or 1 million generations ago; Geraldes et al., 2008). Strong periods of genetic drift, not only caused by the allopatric divergence during the glacial periods, but also due to a strong range expansion during the current interglacial period, probably facilitated the fixation of incompatibilities that cause hybrid male sterility.

4.2 Gene flow betweenPseudochorthippus parallelus subspecies and populations

Although the reference populations sampled here in the Pyrenean hybrid zone are known to produce sterile F1 males (Flanagan, 1997; Hewitt et al., 1987), backcrossing through fertile F1 females leads to males with some, but largely reduced, fertility (Virdee & Hewitt, 1992). Strong hybrid male sterility is observed even between populations that are 4-6 km apart (Virdee & Hewitt, 1994), suggesting that it represents a strong barrier to gene flow in natural populations. Here, we test if long-term recombination in the hybrid zones (~8,000 generations ago; Hewitt, 1988) have allowed for genetic introgression despite Haldane’s rule.
Our analyses show that the samples collected near the putative centres of the two hybrid zones are indeed admixed (Fig. 1B, Fig 2B), consistent with ongoing backcrossing in natural populations. Both hybrid populations were collected in the southern slopes of the mountains, and accordingly have most of their ancestry in their respective parental southern lineage, showing that the centres of the hybrid zones are located closer to the top of the mountain ridges.
Demographic models of divergence between pairs of parental populations of both hybrid zones significantly rejected models of divergence without gene flow in favour of a model with symmetric gene flow (p-value = 0). Biologically more realistic models (e.g. secondary contact with symmetric or asymmetric migration) had higher likelihoods but were not significantly better, reflecting a limited power to estimate extra parameters (Table S6). Consistent with our estimated species tree (Fig. 2A) and their levels of genetic differentiation (Table S8), we estimate that the lineages interacting in the Pyrenean hybrid zone are approximately more divergent (1.3 times) and experience less (0.7 times) gene flow than the lineages interacting in the Alpine hybrid zone. Given that the two hybrid zones are at the same altitude, likely have the same age (Palacios et al., 2017), and that the parental populations were sampled at a similar geographic distance (50 km), this suggests that barriers to gene flow are stronger between the subspeciesparallelus and erythropus establishing secondary contact in the Pyrenees than between the two lineages of parallelusestablishing contact in the Alps. Yet, caution is needed when interpreting these results, since the confidence intervals of the parameter estimates are overlapping (Table S7). Our estimates of the population migration rate 2Nm (Pyrenees: 0.143 and 0.288; Alps: 0.294 and 0.333) indicate that genetic drift is stronger than gene flow (2Nm <1; Slatkin, 1985). Yet, these estimates are likely a underrepresentation of the true effective migration rates experienced at the centre of the hybrid zone because these parental populations are geographically isolated and estimates of 2Nm are averaged over the entire divergence time (Hamilton et al., 2005).
Using the ABBA-BABA-test, we infer the amount of introgression across all pairs of populations. We observe that, except for 6 population pairs, the remaining 36 possible pairs show significant gene flow (p -values < 0.01574, Fig. 4). Notably, populations close to the hybrid zones show the highest level of introgression, even though these populations likely have the strongest total reproductive isolation (Virdee & Hewitt, 1994). For example, the pureparallelus population of the French Pyrenees (PAR) shows abundant introgression with all erythropus populations (D>0.18), which generally decreases with geographic distance. Conversely, the hybrid population of the Pyrenees shows strong introgression with all parallelus populations, also largely decreasing with geographic distance.
Numerous studies have found patterns of repeatable genomic differentiation associated to the repeated evolution of several forms of reproductive isolation, e.g. ecological incompatibilities in stick insects (Riesch et al., 2017) and marine snails (Westram et al., 2021), and behavioural incompatibilities in crows (Poelstra et al., 2014) and newts (Zieliński et al., 2019). While high values of FSTcan in fact reflect barriers to gene flow in the genome, because FST depends on heterozygosity within populations, high FST can also reflect low divergence at those specific loci. Although we find overlap between genes exhibiting high values of FST in the two hybrid zones, we also find a negative correlation between FST and dXY in both hybrid zones (p-values < 2 × 10-16). This suggests that the repeatable patterns of genomic differentiation in the two hybrid zones may be unrelated to locus-specific barriers to gene flow, but may simply reflect genomic constraints, such as low mutation, recombination rates, or linked selection and/or background selection experience in the ancestral population (Nachman & Payseur, 2012).
Together, these results suggest that, while hybrid male sterility imposes a strong fitness cost in early generations of hybrids, the rates of effective recombination that are observed in natural hybrid zones are enough to permit introgression in some parts of the genome. The observation that the amount of introgression decreases with the geographic distance to the current hybrid zone suggests that introgression is at least partially explained by gene flow after the recent secondary contact. The observation of extensive rates of introgression in spite of strong reproductive barriers has been described in multiple hybridising systems, such as in birds (Poelstra et al., 2014; Semenov et al., 2021), butterflies (Heliconius Genome Consortium, 2012), and mice (Teeter et al., 2007). Future studies of differential patterns of introgression across the genome may help in pinpointing the genes that resist to such prevailing pattern of genome wide introgression (Rafati et al., 2018; Turner & Harr, 2014), and therefore that are involved in barriers to gene flow (Barton & Hewitt, 1989; Harrison & Larson, 2014).

5. ACKNOWLEDGEMENTS

We thank the Spanish and French authorities from the Gobierno de Aragón and the French Parc National des Pyrénées for the permission to sample the individuals used for this study. RJP was funded by the European Union’s Horizon 2020 research and innovation programme, under the Marie Sklodowska-Curie grant agreement No. 658706. OH was funded by DFG grant HA7255/2-1. This project benefitted from the sharing of expertise within the DFG priority program SPP 1991 Taxon-Omics.

6. REFERENCES

Barton, N. H., & Hewitt, G. M. (1985). Analysis of Hybrid Zones.Annual Review of Ecology and Systematics , 16 (1), 113–148. doi: 10.1146/annurev.es.16.110185.000553
Barton, N. H., & Hewitt, G. M. (1989). Adaptation, speciation and hybrid zones. Nature , 341 (6242), 497–503. doi: 10.1038/341497a0
Bateson, W. (1909). Heredity and variation in modern lights.Darwin and Modern Science .
Bella, J. L., Butlin, R. K., Ferris, C., & Hewitt, G. M. (1992). Asymmetrical homogamy and unequal sex ratio from reciprocal mating-order crosses between Chorthippus parallelus subspecies. Heredity ,68 (4), 345–352. doi: 10.1038/hdy.1992.49
Bella, J. L., Hewitt, G. M., & Gosálvez, J. (1990). Meiotic imbalance in laboratory-produced hybrid males of Chorthippus parallelus parallelus and Chorthippus parallelus erythropus. Genetics Research ,56 (1), 43–48. doi: 10.1017/S001667230002886X
Berdan, E. L., Mazzoni, C. J., Waurick, I., Roehr, J. T., & Mayer, F. (2015). A population genomic scan in Chorthippus grasshoppers unveils previously unknown phenotypic divergence. Molecular Ecology ,24 (15), 3918–3930. doi: https://doi.org/10.1111/mec.13276
Bernt, M., Donath, A., Jühling, F., Externbrink, F., Florentz, C., Fritzsch, G., … Stadler, P. F. (2013). MITOS: Improved de novo metazoan mitochondrial genome annotation. Molecular Phylogenetics and Evolution , 69 (2), 313–319. doi: 10.1016/j.ympev.2012.08.023
Brothers, A. N., & Delph, L. F. (2010). Haldane’s Rule Is Extended to Plants with Sex Chromosomes. Evolution , 64 (12), 3643–3648. doi: 10.1111/j.1558-5646.2010.01095.x
Brower, A. V. (1994). Rapid morphological radiation and convergence among races of the butterfly Heliconius erato inferred from patterns of mitochondrial DNA evolution. Proceedings of the National Academy of Sciences , 91 (14), 6491–6495. doi: 10.1073/pnas.91.14.6491
Butlin, R. K. (1998). What do hybrid zones in general, and the Chorthippus parallelus zone in particular, tell us about speciation.Endless Forms: Species and Speciation. Oxford Univ. Press, New York , 367–378.
Butlin, R. K., & Hewitt, G. M. (1985). A hybrid zone between Chorthippus parallelus parallelus and Chorthippus parallelus erythropus (Orthoptera: Acrididae): Behavioural characters. Biological Journal of the Linnean Society , 26 (3), 287–299.
Butlin, R. K., & Ritchie, M. G. (1991). Variation in female mate preference across a grasshopper hybrid zone. Journal of Evolutionary Biology , 4 (2), 227–240. doi: 10.1046/j.1420-9101.1991.4020227.x
Butlin, R. K., Ritchie, M. G., & Hewitt, G. M. (1991). Comparisons among morphological characters and between localities in the Chorthippus parallelus hybrid zone (Orthoptera: Acrididae). Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences , 334 (1271), 297–308. doi: 10.1098/rstb.1991.0119
Christin, P.-A., Weinreich, D. M., & Besnard, G. (2010). Causes and evolutionary significance of genetic convergence. Trends in Genetics , 26 (9), 400–405.
Coffman, A. J., Hsieh, P. H., Gravel, S., & Gutenkunst, R. N. (2016). Computationally Efficient Composite Likelihood Statistics for Demographic Inference. Molecular Biology and Evolution ,33 (2), 591–593. doi: 10.1093/molbev/msv255
Cooper, S. J. B., Ibrahim, K. M., & Hewitt, G. M. (1995). Postglacial expansion and genome subdivision in the European grasshopper Chorthippus parallelus. Molecular Ecology , 4 (1), 49–60. doi: https://doi.org/10.1111/j.1365-294X.1995.tb00191.x
Coyne, J. A. (2018). “Two Rules of Speciation” revisited.Molecular Ecology , 27 (19), 3749–3752. doi: https://doi.org/10.1111/mec.14790
Coyne, J. A., & Orr, H. A. (1989). Patterns of speciation in Drosophila. Evolution , 43 (2), 362–381.
Coyne, J. A., & Orr, H. A. (2004). Speciation (Vol. 37). Sinauer Associates Sunderland, MA.
Defaut, B. (2012). Implications taxonomiques et nomenclaturales de publications récentes en phylogénie moléculaire: 1. Les Gomphocerinae de France (Orthoptera, Acrididae). Matériaux Orthoptériques et Entomocénotiques , 17 , 15–20.
Degnan, J. H., & Rosenberg, N. A. (2009). Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends in Ecology & Evolution , 24 (6), 332–340. doi: 10.1016/j.tree.2009.01.009
Dobzhansky, T. (1936). Studies on hybrid sterility. II. Localization of sterility factors in Drosophila pseudoobscura hybrids. Genetics ,21 (2), 113.
Drummond, A. J., Ho, S. Y., Phillips, M. J., & Rambaut, A. (2006). Relaxed phylogenetics and dating with confidence. PLoS Biol ,4 (5), e88.
Durand, E. Y., Patterson, N., Reich, D., & Slatkin, M. (2011). Testing for ancient admixture between closely related populations.Molecular Biology and Evolution , 28 (8), 2239–2252.
Ebdon, S., Laetsch, D. R., Dapporto, L., Hayward, A., Ritchie, M. G., Dincӑ, V., … Lohse, K. (2021). The Pleistocene species pump past its prime: Evidence from European butterfly sister species.Molecular Ecology , mec.15981. doi: 10.1111/mec.15981
Feliner, G. N. (2011). Southern European glacial refugia: A tale of tales. TAXON , 60 (2), 365–372. doi: 10.1002/tax.602007
Fisher, R. A. (1922). On the interpretation of χ 2 from contingency tables, and the calculation of P. Journal of the Royal Statistical Society , 85 (1), 87–94.
Flanagan, Mason, Gosálvez, & Hewitt, G. M. (1999). Chromosomal differentiation through an Alpine hybrid zone in the grasshopperChorthippus parallelus . Journal of Evolutionary Biology ,12 (3), 577–585. doi: 10.1046/j.1420-9101.1999.00049.x
Flanagan, N. S. (1997). Incipient speciation in the meadow grasshopper, Chorthippus parallelus (Orthoptera: Acrididae). (PhD Thesis). University of East Anglia.
Fumagalli, M., Vieira, F. G., Linderoth, T., & Nielsen, R. (2014). ngsTools: Methods for population genetics analyses from next-generation sequencing data. Bioinformatics , 30 (10), 1486–1487.
Futuyma, D. J., Shapiro, L. H., & Harrison, R. G. (1995). Hybrid Zones and the Evolutionary Process. Evolution , 49 (1), 222. doi: 10.2307/2410309
Ganopolski, A., & Brovkin, V. (2017). Simulation of climate, ice sheets and CO2 evolution during the last four glacial cycles with an Earth system model of intermediate complexity. Climate of the Past ,13 (12), 1695–1716. doi: 10.5194/cp-13-1695-2017
Geraldes, A., Basset, P., Gibson, B., Smith, K. L., Harr, B., Yu, H.-T., … Nachman, M. W. (2008). Inferring the history of speciation in house mice from autosomal, X-linked, Y-linked and mitochondrial genes.Molecular Ecology , 17 (24), 5349–5363. doi: 10.1111/j.1365-294X.2008.04005.x
Gernhard, T. (2008). The conditioned reconstructed process.Journal of Theoretical Biology , 253 (4), 769–778.
Godambe, V. P. (1960). An Optimum Property of Regular Maximum Likelihood Estimation. The Annals of Mathematical Statistics , 31 (4), 1208–1211. JSTOR. doi: 10.1214/aoms/1177705693
Gómez, A., & Lunt, D. H. (2007). Refugia within Refugia: Patterns of Phylogeographic Concordance in the Iberian Peninsula. In S. Weiss & N. Ferrand (Eds.), Phylogeography of Southern European Refugia: Evolutionary perspectives on the origins and conservation of European biodiversity (pp. 155–188). Dordrecht: Springer Netherlands. doi: 10.1007/1-4020-4904-8_5
Gosálvez, J., López-Fernández, C., Bella, L. J., Butlin, R. K., & Hewitt, G. M. (1988). A hybrid zone between Chorthippus parallelus parallelus and Chorthippus parallelus erythropus (Orthoptera: Acrididae): chromosomal differentiation. Genome , 30 (5), 656–663. doi: 10.1139/g88-111
Guindon, S., Dufayard, J.-F., Lefort, V., Anisimova, M., Hordijk, W., & Gascuel, O. (2010). New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0.Systematic Biology , 59 (3), 307–321. doi: 10.1093/sysbio/syq010
Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H., & Bustamante, C. D. (2009). Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet ,5 (10), e1000695.
Haldane, J. B. S. (1922). Sex ratio and unisexual sterility in hybrid animals. Journal of Genetics , 12 (2), 101–109. doi: 10.1007/BF02983075
Hamilton, G., Currat, M., Ray, N., Heckel, G., Beaumont, M., & Excoffier, L. (2005). Bayesian Estimation of Recent Migration Rates After a Spatial Expansion. Genetics , 170 (1), 409–417. doi: 10.1534/genetics.104.034199
Harrison, R. G. (1990). Hybrid zones: Windows on evolutionary process.Oxford Surveys in Evolutionary Biology , 7 , 69–128.
Harrison, R. G. (1993). Hybrids and hybrid zones: Historical perspective. Hybrid Zones and the Evolutionary Process , 3–12.
Harrison, R. G., & Larson, E. L. (2014). Hybridization, introgression, and the nature of species boundaries. Journal of Heredity ,105 (S1), 795–809.
Heliconius Genome Consortium. (2012). Butterfly genome reveals promiscuous exchange of mimicry adaptations among species.Nature , 487 (7405), 94–98. doi: 10.1038/nature11041
Hewitt, G. M. (1988). Hybrid zones-natural laboratories for evolutionary studies. Trends in Ecology & Evolution , 3 (7), 158–167. doi: 10.1016/0169-5347(88)90033-X
Hewitt, G. M. (1993). After the ice: Parallelus meets erythropus in the Pyrenees. Hybrid Zones and the Evolutionary Process , 140–164.
Hewitt, G. M. (2000). The genetic legacy of the Quaternary ice ages.Nature , 405 (6789), 907–913. doi: 10.1038/35016000
Hewitt, G. M., Butlin, R. K., & East, T. M. (1987). Testicular dysfunction in hybrids between parapatric subspecies of the grasshopper Chorthippus parallelus. Biological Journal of the Linnean Society , 31 (1), 25–34. doi: 10.1111/j.1095-8312.1987.tb01978.x
Hoang, D. T., Chernomor, O., von Haeseler, A., Minh, B. Q., & Vinh, L. S. (2018). UFBoot2: Improving the Ultrafast Bootstrap Approximation.Molecular Biology and Evolution , 35 (2), 518–522. doi: 10.1093/molbev/msx281
Ivy-Ochs, S., Kerschner, H., Reuther, A., Preusser, F., Heine, K., Maisch, M., … Schlüchter, C. (2008). Chronology of the last glacial cycle in the European Alps. Journal of Quaternary Science , 23 (6–7), 559–573. doi: 10.1002/jqs.1202
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., Haeseler, A. von, & Jermiin, L. S. (2017). ModelFinder: Fast model selection for accurate phylogenetic estimates. Nature Methods , 14 (6), 587–589. doi: 10.1038/nmeth.4285
Kliman, R. M., Andolfatto, P., Coyne, J. A., Depaulis, F., Kreitman, M., Berry, A. J., … Hey, J. (2000). The Population Genetics of the Origin and Divergence of the Drosophila simulans Complex Species.Genetics , 156 (4), 1913–1931. doi: 10.1093/genetics/156.4.1913
Korkmaz, E. M., Lunt, D. H., Çıplak, B., Değerli, N., & Başıbüyük, H. H. (2014). The contribution of Anatolia to European phylogeography: The centre of origin of the meadow grasshopper, Chorthippus parallelus.Journal of Biogeography , 41 (9), 1793–1805. doi: 10.1111/jbi.12332
Korneliussen, T. S., Albrechtsen, A., & Nielsen, R. (2014). ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinformatics ,15 (1), 356. doi: 10.1186/s12859-014-0356-4
Koschuh, A. (2008). Podismopsis styriaca nov. Sp.(Orthoptera, Acridinae) ein Endemit im Ostalpenraum . na.
Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics , 25 (14), 1754–1760. doi: 10.1093/bioinformatics/btp324
Lohse, K., Nicholls, J. A., & Stone, G. N. (2011). Inferring the colonization of a mountain range—Refugia vs. Nunatak survival in high alpine ground beetles. Molecular Ecology , 20 (2), 394–408. doi: 10.1111/j.1365-294X.2010.04929.x
Lunt, D. H., Ibrahim, K. M., & Hewitt, G. M. (1998). MtDNA phylogeography and postglacial patterns of subdivision in the meadow grasshopper Chorthippus parallelus. Heredity , 80 (5), 633–641. doi: 10.1046/j.1365-2540.1998.00311.x
Masly, J. P., & Presgraves, D. C. (2007). High-resolution genome-wide dissection of the two rules of speciation in Drosophila. PLoS Biol , 5 (9), e243.
Matute, D. R. (2010). Reinforcement can overcome gene flow during speciation in Drosophila. Current Biology , 20 (24), 2229–2233.
Matute, D. R., & Cooper, B. S. (2021). Comparative studies on speciation: 30 years since Coyne and Orr. Evolution ,75 (4), 764–778. doi: https://doi.org/10.1111/evo.14181
Mirarab, S., Reaz, R., Bayzid, M. S., Zimmermann, T., Swenson, M. S., & Warnow, T. (2014). ASTRAL: Genome-scale coalescent-based species tree estimation. Bioinformatics , 30 (17), i541–i548.
Mirarab, S., & Warnow, T. (2015). ASTRAL-II: Coalecent-based species tree estimation with many hundreds of taxa and thousands of genes.Bioinformatics , 31 (12), i44–i52. doi: 10.1093/bioinformatics/btv234
Muller, H. (1942). Isolating mechanisms, evolution, and temperature . 6 , 71–125.
Nachman, M. W., & Payseur, B. A. (2012). Recombination rate variation and speciation: Theoretical predictions and empirical results from rabbits and mice . 13.
Nadig, A. (1989). Eine aus den Alpen bisher unbekannte Untergattung in der Schweiz: Chrysochraon (Podismopsis) keisti sp. N.(Saltatoria, Acrididae). Mitteilungen Der Schweizerischen Entomologischen Gesellschaft , 62 (1–2), 79–86.
Nei, M., & Li, W.-H. (1979). Mathematical model for studying genetic variation in terms of restriction endonucleases. Proceedings of the National Academy of Sciences , 76 (10), 5269–5273.
Nguyen, L.-T., Schmidt, H. A., von Haeseler, A., & Minh, B. Q. (2015). IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Molecular Biology and Evolution ,32 (1), 268–274. doi: 10.1093/molbev/msu300
Nolen, Z. J., Yildirim, B., Irisarri, I., Liu, S., Groot Crego, C., Amby, D. B., … Pereira, R. J. (2020). Historical isolation facilitates species radiation by sexual selection: Insights fromChorthippus grasshoppers. Molecular Ecology ,29 (24), 4985–5002. doi: 10.1111/mec.15695
Nürnberger, B., Lohse, K., Fijarczyk, A., Szymura, J. M., & Blaxter, M. L. (2016). Para-allopatry in hybridizing fire-bellied toads (Bombina bombina and B. variegata): Inference from transcriptome-wide coalescence analyses. Evolution , 70 (8), 1803–1818.
Palacios, D., de Andrés, N., Gómez-Ortiz, A., & García-Ruiz, J. M. (2017). Evidence of glacial activity during the Oldest Dryas in the mountains of Spain. Geological Society, London, Special Publications , 433 (1), 87–110.
Papadopoulou, A., Anastasiou, I., & Vogler, A. P. (2010). Revisiting the Insect Mitochondrial Molecular Clock: The Mid-Aegean Trench Calibration. Molecular Biology and Evolution , 27 (7), 1659–1672. doi: 10.1093/molbev/msq051
Paradis, E., & Schliep, K. (2019). ape 5.0: An environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics ,35 (3), 526–528. doi: 10.1093/bioinformatics/bty633
Payseur, B. A., Presgraves, D. C., & Filatov, D. A. (2018). Sex chromosomes and speciation. Molecular Ecology , 27 (19), 3745.
Payseur, B. A., & Rieseberg, L. H. (2016). A genomic perspective on hybridization and speciation. Molecular Ecology , 25 (11), 2337–2360. doi: https://doi.org/10.1111/mec.13557
Pereira, R. J., Ruiz-Ruano, F. J., Thomas, C. J. E., Pérez-Ruiz, M., Jiménez-Bartolomé, M., Liu, S., … Bella, J. L. (2021). Mind the numt: Finding informative mitochondrial markers in a giant grasshopper genome. Journal of Zoological Systematics and Evolutionary Research , 59 (3), 635–645. doi: 10.1111/jzs.12446
Poelstra, J. W., Vijay, N., Bossu, C. M., Lantz, H., Ryll, B., Müller, I., … Wolf, J. B. W. (2014). The genomic landscape underlying phenotypic integrity in the face of gene flow in crows. Science ,344 (6190), 1410–1414. doi: 10.1126/science.1253226
Pollard, D. A., Iyer, V. N., Moses, A. M., & Eisen, M. B. (2006). Widespread Discordance of Gene Trees with Species Tree in Drosophila: Evidence for Incomplete Lineage Sorting. PLoS Genetics ,2 (10), e173. doi: 10.1371/journal.pgen.0020173
Presgraves, D. C. (2010). The molecular evolutionary basis of species formation. Nature Reviews Genetics , 11 (3), 175–180. doi: 10.1038/nrg2718
Presgraves, D. C. (2018). Evaluating genomic signatures of “the large X-effect” during complex speciation. Molecular Ecology ,27 (19), 3822–3830. doi: 10.1111/mec.14777
Presgraves, D. C., & Meiklejohn, C. D. (2021). Hybrid Sterility, Genetic Conflict and Complex Speciation: Lessons From the Drosophila simulans Clade Species. Frontiers in Genetics , 12 , 669045. doi: 10.3389/fgene.2021.669045
Rafati, N., Blanco-Aguiar, J. A., Rubin, C. J., Sayyab, S., Sabatino, S. J., Afonso, S., … Carneiro, M. (2018). A genomic map of clinal variation across the European rabbit hybrid zone. Molecular Ecology , 27 (6), 1457–1478. doi: 10.1111/mec.14494
Rambaut, A., Drummond, A. J., Xie, D., Baele, G., & Suchard, M. A. (2018). Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7. Systematic Biology , 67 (5), 901–904. doi: 10.1093/sysbio/syy032
Ravinet, M., Westram, A., Johannesson, K., Butlin, R. K., André, C., & Panova, M. (2016). Shared and nonshared genomic divergence in parallel ecotypes of Littorina saxatilis at a local scale. Molecular Ecology , 25 (1), 287–305. doi: 10.1111/mec.13332
Reynolds, J., Weir, B. S., & Cockerham, C. C. (1983). Estimation of the Coancestry Coefficient: Basis for a Short-Term Genetic Distance.Genetics , 105 (3), 767–779.
Riesch, R., Muschick, M., Lindtke, D., Villoutreix, R., Comeault, A. A., Farkas, T. E., … Nosil, P. (2017). Transitions between phases of genomic differentiation during stick-insect speciation. Nature Ecology & Evolution , 1 (4), 1–13. doi: 10.1038/s41559-017-0082
Rieseberg, L. H., Baird, S. J., & Gardner, K. A. (2000). Hybridization, introgression, and linkage evolution. Plant Molecular Evolution , 205–224.
Ritchie, M. G., Butlin, R. K., & Hewitt, G. M. (1989). Assortative mating across a hybrid zone in Chorthippus parallelus (Orthoptera: Acrididae). Journal of Evolutionary Biology , 2 (5), 339–352. doi: 10.1046/j.1420-9101.1989.2050339.x
Schilthuizen, M., Giesbers, M., & Beukeboom, L. (2011). Haldane’s rule in the 21st century. Heredity , 107 (2), 95–102.
Schubert, M., Ermini, L., Sarkissian, C. D., Jónsson, H., Ginolhac, A., Schaefer, R., … Orlando, L. (2014). Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. Nature Protocols , 9 (5), 1056–1082. doi: 10.1038/nprot.2014.063
Schubert, M., Lindgreen, S., & Orlando, L. (2016). AdapterRemoval v2: Rapid adapter trimming, identification, and read merging. BMC Research Notes , 9 (1), 88. doi: 10.1186/s13104-016-1900-2
Semenov, G. A., Linck, E., Enbody, E. D., Harris, R. B., Khaydarov, D. R., Alström, P., … Taylor, S. A. (2021). Asymmetric introgression reveals the genetic architecture of a plumage trait. Nature Communications , 12 (1), 1019. doi: 10.1038/s41467-021-21340-y
Serrano, L., Vega, C. G. de la, Bella, J. L., López-Fernández, C., Hewitt, G. M., & Gosálvez, J. (1996). A hybrid zone between two subspecies of Chorthippus parallelus. X-chromosome variation through a contact zone. Journal of Evolutionary Biology , 9 (2), 173–184. doi: https://doi.org/10.1046/j.1420-9101.1996.9020173.x
Shuker, D. M., Underwood, K., King, T. M., & Butlin, R. K. (2005). Patterns of male sterility in a grasshopper hybrid zone imply accumulation of hybrid incompatibilities without selection.Proceedings of the Royal Society B: Biological Sciences ,272 (1580), 2491–2497.
Skotte, L., Korneliussen, T. S., & Albrechtsen, A. (2013). Estimating Individual Admixture Proportions from Next Generation Sequencing Data.Genetics , 195 (3), 693–702. doi: 10.1534/genetics.113.154138
Slatkin, M. (1985). Gene Flow in Natural Populations. Annual Review of Ecology and Systematics , 16 (1), 393–430. doi: 10.1146/annurev.es.16.110185.002141
Song, H., Mariño-Pérez, R., Woller, D. A., & Cigliano, M. M. (2018). Evolution, Diversification, and Biogeography of Grasshoppers (Orthoptera: Acrididae). Insect Systematics and Diversity ,2 (4), 3. doi: 10.1093/isd/ixy008
Soraggi, S., Wiuf, C., & Albrechtsen, A. (2018). Powerful Inference with the D-Statistic on Low-Coverage Whole-Genome Data. G3: Genes, Genomes, Genetics , 8 (2), 551–566. doi: 10.1534/g3.117.300192
Stehlik, I., Blattner, F., Holderegger, R., & Bachmann, K. (2002). Nunatak survival of the high Alpine plant Eritrichium nanum (L.) Gaudin in the central Alps during the ice ages. Molecular Ecology ,11 (10), 2027–2036.
Stern, D. L., & Orgogozo, V. (2009). Is genetic evolution predictable?Science , 323 (5915), 746–751.
Strimmer, K., & Haeseler, A. von. (1997). Likelihood-mapping: A simple method to visualize phylogenetic content of a sequence alignment.Proceedings of the National Academy of Sciences , 94 (13), 6815–6819. doi: 10.1073/pnas.94.13.6815
Suchard, M. A., Lemey, P., Baele, G., Ayres, D. L., Drummond, A. J., & Rambaut, A. (2018). Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evolution , 4 (1). doi: 10.1093/ve/vey016
Tajima, F. (1989). Statistical Method for Testing the Neutral Mutation Hypothesis by DNA Polymorphism. Genetics , 123 (3), 585–595.
Tamura, K., Dudley, J., Nei, M., & Kumar, S. (2007). MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) Software Version 4.0.Molecular Biology and Evolution , 24 (8), 1596–1599. doi: 10.1093/molbev/msm092
Teeter, K. C., Payseur, B. A., Harris, L. W., Bakewell, M. A., Thibodeau, L. M., O’Brien, J. E., … Tucker, P. K. (2007). Genome-wide patterns of gene flow across a house mouse hybrid zone.Genome Research , 18 (1), 67–76. doi: 10.1101/gr.6757907
Turner, L. M., & Harr, B. (2014). Genome-wide mapping in a house mouse hybrid zone reveals hybrid sterility loci and Dobzhansky-Muller interactions. ELife , 3 , e02504. doi: 10.7554/eLife.02504
Virdee, S. R., & Hewitt, G. M. (1992). Postzygotic isolation and Haldane’s rule in a grasshopper. Heredity , 69 (6), 527–538.
Virdee, S. R., & Hewitt, G. M. (1994). Clines for hybrid dysfunction in a grasshopper hybrid zone. Evolution , 48 (2), 392–407. doi: 10.1111/j.1558-5646.1994.tb01319.x
Wachter, G. A., Arthofer, W., Dejaco, T., Rinnhofer, L. J., Steiner, F. M., & Schlick‐Steiner, B. C. (2012). Pleistocene survival on central Alpine nunataks: Genetic evidence from the jumping bristletail Machilis pallida. Molecular Ecology , 21 (20), 4983–4995.
Wang, R. L., & Hey, J. (1996). The Speciation History of Drosophila Pseudoobscura and Close Relatives: Inferences from DNA Sequence Variation at the Period Locus. Genetics , 144 (3), 1113–1126.
Watterson, G. A. (1975). On the number of segregating sites in genetical models without recombination. Theoretical Population Biology ,7 (2), 256–276. doi: 10.1016/0040-5809(75)90020-9
Westram, A. M., Faria, R., Johannesson, K., & Butlin, R. K. (2021). Using replicate hybrid zones to understand the genomic basis of adaptive divergence. Molecular Ecology , 30 (15), 3797–3814. doi: 10.1111/mec.15861
Zabal-Aguirre, M., Arroyo, F., & Bella, J. (2010). Distribution of Wolbachia infection in Chorthippus parallelus populations within and beyond a Pyrenean hybrid zone. Heredity , 104 (2), 174–184.
Zieliński, P., Dudek, K., Arntzen, J. W., Palomar, G., Niedzicka, M., Fijarczyk, A., … Babik, W. (2019). Differential introgression across newt hybrid zones: Evidence from replicated transects.Molecular Ecology , 28 (21), 4811–4824. doi: https://doi.org/10.1111/mec.15251

7. DATA ACCESSIBILITY

Raw reads were deposited in the NCBI Sequence Read Archive (BioProject XXX; BioSample Acc. XXX). The dryad archive (doi: XXX) contains: per gene estimates of FST, dXY, pi, theta and Tajima’s D; the mitochondrial and nuclear alignments for the phylogenetic analyses, and the infiles for the admixture and demographic analyses. Bioinformatic scripts are available in a public GitHub repository https://github.com/lindington/ChorthPar.

8. AUTHOR CONTRIBUTIONS

RJP conceived the project, cofounded by RJP and TM. RJP, OH and JLB performed the sampling and RJP produced the data. RJP, LH and EC analysed the data and interpreted the results. RJP, LH, and EC wrote the first version of the manuscript, and all authors contributed to the final version.

9. FIGURE CAPTIONS

Figure 1: Divergence of Pseudochorthippus parallelus.A. Sampling localities (dots) are coloured by their presumed evolutionary lineage according to subspecies and chromosomal race. Sampling localities in Eastern Central Europe (grey) and in the hybrid zones (black) cannot be attributed a priori to any lineage.B. Principal component analysis based on 1,494,335 polymorphisms separates subspecies in PC1 and the chromosomal races ofP. p. parallelus in PC2.
Figure 2: Divergence and secondary contact inPseudochorthippus parallelus. A. Population species tree based on 5,929 nuclear genes. Black dots on branches represent posterior probabilities higher than 95. Populations are coloured according to Fig. 1. Branch lengths are in coalescent units of 𝑇/𝑁𝐸.B. Admixture analyses showing three levels of populations structure: subspecies level (K=3), chromosomal races (K=4) and population level (K=10).
Figure 3: Time-calibrated tree based on 13 mitochondrial genes. Evolutionary lineages are coloured in terminal branches following Fig. 1. Branch lengths are scaled in millions of years (mya), and boxes on nodes represent 95% confidence intervals. The most recent common ancestor dates to 0.338 mya, subsequently splitting into six major clades (A-F). Subspecies do not form monophyletic clades, with the exception of subclades A and F, which split around 100 kya (nodes marked as “par-ery”; Table S4; Fig. S2).
Figure 4: Extensive interpopulation gene flow throughoutPseudochorthippus parallelus . The heat map depicts the highest D-statistic per population pair, estimated from 120 possible comparisons. All shaded comparisons were significant (p -value < 0.05); white comparisons were not significant; and gray comparisons cannot be tested (Table S5).
Figure 5: Deviations from demographic neutrality. Violin plots represent the distribution of Tajima’s D calculated for 16,969 genes sampled across the 10 populations of Pseudochorthippus parallelus , and the white dot represents the gene-mean across the transcriptome.
Figure 6: Repeatable patterns of genomic differentiation across two hybrid zones manifesting hybrid male sterility. A. Density of per-gene estimates FST in 11,503 genes samples both in the Pyrenean (red) and in the Alpine (blue) hybrid zone, showing a significant overlap of 117 genes within the 95th percentile (p -value = \(1.1\times 10^{-41}\)). B. In both hybrid zones, per-gene differentiation (FST) is negatively correlated with per-gene divergence (dXY;p -values < \(2.2\times 10^{-16}\)), consistent with a strong contribution of linked selection in the repeatable patterns of genomic differentiation.