Introgression analyses
For ADMIXTURE (Alexander et al. 2009) analyses, sites with >80% missing data were excluded. The greatest delta (Δ) in cross-validation (cv) score identified the most appropriate number of populations (K) by iteratively leaving a sample out and reexamining the partitioning of genetic structure among the remaining samples.ADMIXTURE results were visualized in R v3.3.4. Populations identified by ADMIXTURE were used in F- statistics analyses.
F -statistics were run in AdmixTools (Patterson et al. 2012) using M. zibellina as an outgroup.F3- statistics [Target: Source-1, Source-2] explicitly test for admixture (3PopTest ) and considered all permutations where Source-1, Source-2, and the Target samples came from different populations. While a significantly negative F3 score (Z < 5) denotes admixture in the Target sample, a positiveF3 value does not necessarily indicate the absence of admixture (Peter 2016). To determine the generational status (e.g., F1, F2, B1, B2, etc.) of each hybrid individual, we use R code available from Lavretsky et al. (2016) to simulate multi-generational hybrids based on unadmixed M. americana (‘POP1’) and M. caurina(‘POP2’) parental populations. We contrasted admixture proportions of empirical hybrids against proportions output for simulated multi-generational hybrids to estimate generational status. Last,Treemix (Pickrell & Pritchard 2012) was used to infer historical relationships among populations with 2, 3 or 4 mixture events.
To characterize the backcrossing history of each hybrid sample, we usedF4- statistics, similar to D-statistics or ABBA/BABA (Kulathinalet al. 2009; Green et al. 2010; Durand et al.2011), in AdmixTools with block-jackknifing accommodating non-independence between loci. Although F -statistics alone cannot deduce the direction of gene flow in a system, admixture graph fitting can test whether a proposed evolutionary model fits the data well (Lipson et al. 2013; Martin et al. 2015).AdmixtureGraph (Mailund et al. 2016) in R iteratively fit hybrid individuals into two non-admixed tree topologies (maximum-likelihood topology and the same topology collapsed into K = 6 populations; Supplemental Information 2-3) by estimating the minimal error placement from F4 results. We tested all population permutations excluding hybrids identified through F3- statistics. We then tested hybrids against individuals from ‘pure’ populations (e.g., (Outgroup, Hybrid; continental americana , insularcaurina )) to decipher the backcrossing histories of hybrid samples and characterize patterns of gene flow across populations.F4 -statistics [W, X ; Y, Z] are negative (Z-score ≤ -5) if there is more allelic overlap between X and Y than between X and Z, and positive (Z score ≥ 5) if there has been more recent allele sharing between X and Z than between X and Y. To estimate the timing of introgression, we converted drift unit branch lengths (D ) output from MixMapper to absolute time (years) using the formula D ≈ 1- e-t/2Ne (solved for t generations; Lipsonet al. 2013; Puckett et al. 2015) and a generation time of 5 years (Buskirk et al. 2012). Small sample sizes and the absence of a Martes linkage map prevents linkage disequilibrium-based estimates of Ne and more refined dating of admixture events.