Testing refugia models with TMRCA of paired populations
The inferred ancestral clusters from sNMF and conStruct were further tested as refugial populations by comparing the divergence times of paired populations. Under a drainage refugia hypothesis, we expect to find older divergence times between populations from different drainages systems than within drainage systems, as the two populations retreated to separate glacial refugia and were isolated for a longer period of time. For the preferred ancestral clustering model (K=6), we assigned populations to major drainage basins in the Sierra Nevada based on the locations of their collection sites (Table 2 ). If populations had nearly equal distance from two different drainages, they were assigned to both drainages. As several alternative ancestral clustering models (K=4 and K=5) from sNMF were difficult to distinguish from K=6, we examined the effect of this analysis under these alternative clustering models (Figure S1 , Table 1 ). We grouped populations from Tuolumne and Merced drainages into the same refugium (K=5), and further grouped populations from the Kern and Kaweah drainages into single refugium (K=4). Finally, we considered a three refugia model, where populations were grouped within the same morphotype.
Paired populations were chosen by grouping collecting sites with their nearest four to seven geographical neighbors, including sites from the same and/or different drainages or morphotypes. The time of most recent common ancestor (TMRCA) of the paired populations was estimated using the joint site frequency spectrum in δaδi version 2.0.3 (Gutenkunst, Hernandez, Williamson, & Bustamante, 2009). Since there is a potential bias in using filtered genotype data (Warmuth and Ellegren 2019), we estimated the two-dimensional allele frequency spectrums (2d-sfs) directly from the GBS reads. The 2d-sfs estimation from GBS reads requires a reference genome, so we generated a de novo assembly ofN. ingens from a single male individual using 130x coverage Illumina NovaSeq pair-end reads. We used SPAdes version 3.14.1 (Bankevich et al., 2012) to assemble the genome (more details regarding genome assembly are described in the Supplementary Methods). The GBS reads were mapped to the reference genome using STAR version 2.7.3a and used to calculate the optimized 2d-sfs using angsd version 0.931, after filtering out the low quality reads using default quality control settings (Dobin et al., 2013, Korneliussen, Albrechtsen, & Nielsen, 2014). The TMRCA, as well as the migration rate and effective population size of each population, were approximated with the 2d-sfs using dadi_pipeline version 3.1.4 (Gutenkunst et al., 2009, Portik et al., 2017). In the dadi_pipeline program, we selected the divergence with continuous symmetric migration model and ran this 10 times in a 4-run (10, 20, 30, 40 replications) or 5-run (10, 20, 30, 40, 50 replications) optimization, with grid size starting from the number that was rounded up from the larger allele size to the nearest ten, until the likelihoods and optimized parameters reached a stable state. The parameters from the optimization with highest likelihood score were selected to represent the best optimization. The standard deviations of each set of parameters was estimated using 100 nonparametric bootstraps of the 2d-sfs generated with δaδi, and then the Godambe Information Matrix (GIM) was used to analyze the uncertainty of the parameters based on the 100 bootstrap 2d-sfs (Coffman, Hsieh, Gravel, & Gutenkunst, 2016).
The TMRCA estimations of paired populations were used as dependent variable, with grouping status (from the same or different drainage, cluster, or morphotype) used as a categorical independent variable and geographical distance as a covariate in an analysis of covariance (ANCOVA). For each ANCOVA regression model, AIC and BIC were calculated, and Vuong’s test was used to compare those models using the R package nonnest2 version 0.5-5 (Vuong, 1989; Merkle & You, 2018).