Introgression analyses
For ADMIXTURE (Alexander et al. 2009) analyses, sites with
>80% missing data were excluded. The greatest delta (Δ) in
cross-validation (cv) score identified the most appropriate number of
populations (K); cv scores were calculated by iteratively leaving a
sample out and reexamining the partitioning of genetic structure among
the remaining samples. ADMIXTURE results were visualized in R v3.3.4. Populations
identified by ADMIXTURE were used in F-statistics
analyses.
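The missing-data filter described above can be sketched as follows. This is a minimal illustration rather than the pipeline used here: the function name `filter_sites`, the genotype encoding (0/1/2 alternate-allele counts, with `None` for a missing call), and the toy matrix are all assumptions made for the example.

```python
def filter_sites(genotypes, max_missing=0.8):
    """Return indices of sites whose missing fraction is <= max_missing.

    genotypes: list of per-sample lists; each entry is 0, 1, or 2
    (alternate-allele count) or None for a missing call (assumed encoding).
    """
    n_samples = len(genotypes)
    n_sites = len(genotypes[0])
    kept = []
    for site in range(n_sites):
        n_missing = sum(1 for sample in genotypes if sample[site] is None)
        # Sites with >80% missing data are excluded; exactly 80% is retained.
        if n_missing / n_samples <= max_missing:
            kept.append(site)
    return kept


# Toy data: 5 samples x 3 sites (hypothetical values).
toy = [
    [0, None, None],
    [1, None, None],
    [2, None, None],
    [0, None, None],
    [None, 1, None],
]
kept_sites = filter_sites(toy)  # sites 0 and 1 pass; site 2 is 100% missing
```

In this toy matrix the second site is missing in exactly 80% of samples and is therefore retained, because only sites exceeding the threshold are dropped.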
F-statistics were run in AdmixTools (Patterson et
al. 2012) using M. zibellina as an outgroup. F3-statistics
[Target: Source-1, Source-2] explicitly test for admixture (3PopTest);
we considered all permutations in which
Source-1, Source-2, and the Target samples came from different
populations. While a significantly negative F3 score (Z
< -5) denotes admixture in the Target sample, a positive F3 value does
not necessarily indicate the absence of admixture
(Peter 2016). To determine the generational status (e.g., F1, F2, B1,
B2) of each hybrid individual, we used R code available from
Lavretsky et al. (2016) to simulate multi-generational hybrids
based on unadmixed M. americana (‘POP1’) and M. caurina (‘POP2’)
parental populations. We contrasted admixture proportions of
empirical hybrids against proportions output for simulated
multi-generational hybrids to estimate generational status. Last,
Treemix (Pickrell & Pritchard 2012) was used to infer historical
relationships among populations with 2, 3 or 4 mixture events.
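The logic of the F3 admixture test described above can be illustrated with a short sketch. It is not the AdmixTools implementation: it takes population allele frequencies as given, omits the finite-sample bias correction, and uses an unweighted delete-one-block jackknife over contiguous blocks of sites that are assumed to be of equal size. The function names and the toy frequencies are hypothetical.

```python
from math import sqrt


def f3(target, source1, source2):
    """Mean over sites of (t - s1) * (t - s2) on population allele frequencies."""
    products = [(t - a) * (t - b) for t, a, b in zip(target, source1, source2)]
    return sum(products) / len(products)


def f3_z_score(target, source1, source2, block_size):
    """Z-score of f3 from a delete-one-block jackknife over contiguous sites."""
    n_sites = len(target)
    leave_one_out = []
    for start in range(0, n_sites, block_size):
        # Drop one block of sites and recompute f3 on the remainder.
        keep = [i for i in range(n_sites)
                if not (start <= i < start + block_size)]
        leave_one_out.append(
            f3([target[i] for i in keep],
               [source1[i] for i in keep],
               [source2[i] for i in keep])
        )
    n_blocks = len(leave_one_out)
    mean_loo = sum(leave_one_out) / n_blocks
    # Standard jackknife variance for equally sized blocks.
    variance = (n_blocks - 1) / n_blocks * sum(
        (est - mean_loo) ** 2 for est in leave_one_out
    )
    return f3(target, source1, source2) / sqrt(variance)


# Hypothetical allele frequencies at six sites; the target is roughly
# intermediate between the two sources, as expected for an admixed population.
src1 = [0.1, 0.9, 0.2, 0.7, 0.3, 0.6]
src2 = [0.9, 0.1, 0.8, 0.3, 0.6, 0.2]
tgt = [0.5, 0.5, 0.5, 0.5, 0.45, 0.4]

f3_value = f3(tgt, src1, src2)
z_value = f3_z_score(tgt, src1, src2, block_size=2)
```

Because the target frequencies fall between those of the two sources at every site, each per-site product is negative and so is the resulting statistic; with real data the Z-score would be compared against the significance threshold.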
To characterize the backcrossing history of each hybrid sample, we used
F4-statistics, similar to D-statistics or ABBA/BABA (Kulathinal
et al. 2009; Green et al. 2010; Durand et al. 2011), in AdmixTools with
block-jackknifing accommodating
non-independence between loci. Although F-statistics alone cannot
deduce the direction of gene flow in a system, admixture graph fitting
can test whether a proposed evolutionary model fits the data well
(Lipson et al. 2013; Martin et al. 2015). AdmixtureGraph (Mailund et al.
2016) in R iteratively fit
hybrid individuals into two non-admixed tree topologies
(maximum-likelihood topology and the same topology collapsed into K = 6
populations; Supplemental Information 2-3) by estimating the minimal
error placement from F4 results. We tested all population
permutations excluding hybrids identified through F3-statistics.
We then tested hybrids against individuals from ‘pure’ populations
(e.g., (Outgroup, Hybrid; continental americana, insular caurina)) to
decipher the backcrossing histories of hybrid
samples and characterize patterns of gene flow across populations.
F4-statistics [W, X; Y, Z] are negative (Z-score ≤ -5) if
there is more allelic overlap between X and Y than between X and Z, and
positive (Z-score ≥ 5) if there has been more recent allele sharing
between X and Z than between X and Y. To estimate the timing of
introgression, we converted drift unit branch lengths (D) output
from MixMapper to absolute time (years) using the formula
$D \approx 1 - e^{-t/(2N_e)}$ (solved for t generations; Lipson
et al. 2013; Puckett et al. 2015) and a generation time of
5 years (Buskirk et al. 2012). Small sample sizes and the absence
of a Martes linkage map prevent linkage disequilibrium-based
estimates of $N_e$ and more refined dating of admixture
events.
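The drift-to-time conversion can be made explicit. Solving D ≈ 1 - exp(-t / (2 Ne)) for t gives t = -2 Ne ln(1 - D) generations, which is then multiplied by the generation time. The sketch below assumes this rearrangement; the function name `drift_to_years` and the example values of D and Ne are hypothetical and are not estimates from this study.

```python
from math import exp, log1p


def drift_to_years(drift, ne, generation_time=5.0):
    """Invert D = 1 - exp(-t / (2 * Ne)) for t and scale to years."""
    if not 0.0 <= drift < 1.0:
        raise ValueError("drift must lie in [0, 1)")
    # log1p(-drift) computes ln(1 - D) accurately for small D.
    generations = -2.0 * ne * log1p(-drift)
    return generations * generation_time


# Hypothetical inputs: Ne = 1000 and a drift length that corresponds to
# exactly 100 generations, i.e. about 500 years at 5 years per generation.
example_drift = 1.0 - exp(-100 / (2 * 1000))
example_years = drift_to_years(example_drift, ne=1000)
```

Because t scales linearly with Ne, any uncertainty in Ne carries over proportionally to the estimated date.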