Testing refugia models with TMRCA of paired populations
The inferred ancestral clusters from sNMF and conStruct were further
tested as refugial populations by comparing the divergence times of
paired populations. Under a drainage refugia hypothesis, we expect to
find older divergence times between populations from different drainages
systems than within drainage systems, as the two populations retreated
to separate glacial refugia and were isolated for a longer period of
time. For the preferred ancestral clustering model (K=6), we assigned
populations to major drainage basins in the Sierra Nevada based on the
locations of their collection sites (Table 2 ). If populations
had nearly equal distance from two different drainages, they were
assigned to both drainages. As several alternative ancestral clustering
models (K=4 and K=5) from sNMF were difficult to distinguish from K=6,
we examined the effect of this analysis under these alternative
clustering models (Figure S1 , Table 1 ). We grouped
populations from Tuolumne and Merced drainages into the same refugium
(K=5), and further grouped populations from the Kern and Kaweah
drainages into single refugium (K=4). Finally, we considered a three
refugia model, where populations were grouped within the same
morphotype.
Paired populations were chosen by grouping collecting sites with their
nearest four to seven geographical neighbors, including sites from the
same and/or different drainages or morphotypes. The time of most recent
common ancestor (TMRCA) of the paired populations was estimated using
the joint site frequency spectrum in δaδi version 2.0.3 (Gutenkunst,
Hernandez, Williamson, & Bustamante, 2009). Since there is a potential
bias in using filtered genotype data (Warmuth and Ellegren 2019), we
estimated the two-dimensional allele frequency spectrums (2d-sfs)
directly from the GBS reads. The 2d-sfs estimation from GBS reads
requires a reference genome, so we generated a de novo assembly ofN. ingens from a single male individual using 130x coverage
Illumina NovaSeq pair-end reads. We used SPAdes version 3.14.1
(Bankevich et al., 2012) to assemble the genome (more details regarding
genome assembly are described in the Supplementary Methods). The GBS
reads were mapped to the reference genome using STAR version 2.7.3a and
used to calculate the optimized 2d-sfs using angsd version 0.931, after
filtering out the low quality reads using default quality control
settings (Dobin et al., 2013, Korneliussen, Albrechtsen, & Nielsen,
2014). The TMRCA, as well as the migration rate and effective population
size of each population, were approximated with the 2d-sfs using
dadi_pipeline version 3.1.4 (Gutenkunst et al., 2009, Portik et al.,
2017). In the dadi_pipeline program, we selected the divergence with
continuous symmetric migration model and ran this 10 times in a 4-run
(10, 20, 30, 40 replications) or 5-run (10, 20, 30, 40, 50 replications)
optimization, with grid size starting from the number that was rounded
up from the larger allele size to the nearest ten, until the likelihoods
and optimized parameters reached a stable state. The parameters from the
optimization with highest likelihood score were selected to represent
the best optimization. The standard deviations of each set of parameters
was estimated using 100 nonparametric bootstraps of the 2d-sfs generated
with δaδi, and then the Godambe Information Matrix (GIM) was used to
analyze the uncertainty of the parameters based on the 100 bootstrap
2d-sfs (Coffman, Hsieh, Gravel, & Gutenkunst, 2016).
The TMRCA estimations of paired populations were used as dependent
variable, with grouping status (from the same or different drainage,
cluster, or morphotype) used as a categorical independent variable and
geographical distance as a covariate in an analysis of covariance
(ANCOVA). For each ANCOVA regression model, AIC and BIC were calculated,
and Vuong’s test was used to compare those models using the R package
nonnest2 version 0.5-5 (Vuong, 1989; Merkle & You, 2018).