Disentangling sources of population structure
The population structure of a species reflects both historical divergence processes and contemporary gene flow among subpopulations, and it requires careful analysis to disentangle these evolutionary processes (Nielsen, & Slatkin, 2007; Schönswetter & Schneeweiss 2019). Summary statistics, such as Wright’s F ST, are useful for describing genetic drift and gene flow under equilibrium models, but provide little information about deeper evolutionary events that shape patterns of genetic variation. Clustering based algorithms, which infer individual ancestry components according to a model, are also challenging to interpret when population divergence is continuous in space (François & Durand, 2010; but see Bradburd et al. 2018). Coalescent approaches leverage the distribution of gene genealogies to separately estimate the population parameters, including migration rate and divergence time, and can be used to distinguish models of migration from divergence (Nielsen & Wakeley, 2001). However, coalescent approaches are often difficult to implement when evolutionary history is spatially complex, as the number of free parameters grows rapidly (Beerli, 2009).
In this study, we leveraged aspect of each approach to disentangle the evolutionary history of the N. ingens complex. First, we used the clustering algorithm sNMF to identify population structure, though we found that the log likelihood increased with an increasing number of clusters. A large shift in K occurred as different morphotypes were distinguished genetically (K=3), but models K=4 to K=6 had improved log likelihood scores and revealed further population subdivision reflecting major drainage systems in the Sierra Nevada. More subtle patterns of admixture among drainages continued to emerge as the number of clusters increases (Figure S3 ), but this appears to be the product of gene flow and uneven sampling (continuous population structure). Indeed, by explicitly comparing spatial and non-spatial cluster models using conStruct, we found that the spatial models are generally more accurate than the non-spatial models (Figure 5 ), indicating the presence of isolation by distance among sample sites. Analysis of pairwiseF ST values among populations also strongly supports gradual genetic divergence with geographical distances (R2 = 0.35; Figure 7 ).
We further assessed patterns of population divergence using a model fitting framework based on coalescence theory (Rougemont & Bernatchez, 2018). We compared ANCOVA regressions on the paired populations’ TMRCA for different levels of population structure. For TMRCA, K=6 and K=5 show the best AIC, BIC, and results from the Vuong’s test. Although K=6 and K=5 are indistinguishable owing to the lack of multiple population samples from the Merced drainage, both support the model of drainage associated glacial refugia (Table 1 ).