3.1 | Bioinformatics and population grouping
The six plates yielded a total of 2,983 million raw Illumina sequencing reads, an average of 497 million per plate. The proportion of reads with a correct barcode and restriction enzyme cut site ranged from 69% to 83% per plate (mean = 76%). Alignment to the S. nigrocinctus reference genome gave an overall alignment rate of 79%, with the percentage of aligned reads per sample ranging from 56% to 77% (mean = 71%). After removing individuals with high percentages of missing genotypes (> 15%) and SNPs with low genotyping rates (< 20%), the final data set comprised 398 individual fish (321 from 2014 and 77 from 2015) and 11,146 SNPs.
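The missing-data filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the genotype coding (0/1/2 with None for a missing call), the function names, the default thresholds as interpreted here, and the order of the two filtering steps are all assumptions.

```python
# Genotypes are coded as alternate-allele counts (0, 1 or 2); None marks a
# missing call. This coding is an assumption made for illustration only.

def missing_fraction(calls):
    """Return the fraction of calls that are missing (None)."""
    return sum(c is None for c in calls) / len(calls)


def filter_genotypes(genotypes, max_ind_missing=0.15, min_snp_call_rate=0.20):
    """Return (kept_individual_indices, kept_snp_indices).

    genotypes: list of rows, one per individual, each a list of per-SNP calls.
    Individuals are filtered first, then SNPs are evaluated among the retained
    individuals; this order is an assumption, not taken from the study.
    """
    kept_inds = [i for i, row in enumerate(genotypes)
                 if missing_fraction(row) <= max_ind_missing]
    kept_snps = []
    for j in range(len(genotypes[0])):
        column = [genotypes[i][j] for i in kept_inds]
        # Genotyping (call) rate is one minus the missing fraction.
        if column and 1 - missing_fraction(column) >= min_snp_call_rate:
            kept_snps.append(j)
    return kept_inds, kept_snps
```

Filtering individuals before SNPs keeps poorly genotyped samples from lowering the per-SNP call rates; reversing the order can retain a different set of markers.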
The ancestry analysis revealed four discrete spawning populations. sNMF analysis in LEA identified K=4 populations based on the cross-entropy criterion (Figure 2a-2b), and PCA supported the putative K=4 population clusters derived from sNMF (Figure 2c). STRUCTURE (Pritchard et al. 2000) analysis also supported K=4 populations, but with greater admixture of populations 2 and 3 than was estimated by the sNMF algorithm. All four of these populations were represented in both the 2014 and 2015 collections (Figure 3). Pairwise FST values among putative population-year combinations revealed consistent genetic differentiation between populations in each year (Table 1). This differentiation was also conserved across years: within a population, little differentiation as measured by FST was observed between years. These findings support the results of the ancestry analysis and provide evidence that the 2014 and 2015 collections are composed of similar mixtures of discrete spawning populations.
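For readers unfamiliar with pairwise FST, the following sketch shows one common estimator, Hudson's FST combined across SNPs as a ratio of averages. The source does not state which estimator was used, so the choice of estimator, the function name, and the input format are assumptions made purely for illustration.

```python
def hudson_fst(freqs_a, freqs_b, n_a, n_b):
    """Hudson's FST between two groups, as a ratio of sums over SNPs.

    freqs_a, freqs_b: per-SNP sample allele frequencies in groups A and B.
    n_a, n_b: number of sampled allele copies per group (twice the number of
              diploid individuals, assuming no missing data).
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_a, freqs_b):
        # Squared frequency difference, corrected for finite sample size.
        numerator += ((p1 - p2) ** 2
                      - p1 * (1 - p1) / (n_a - 1)
                      - p2 * (1 - p2) / (n_b - 1))
        # Probability that one allele drawn from each group differs.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator if denominator > 0 else float("nan")
```

Under this estimator, two groups with identical sample frequencies give a value slightly below zero because of the sample-size correction, which is the behaviour expected when the same population is compared between years.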
Relatedness analysis detected no related individuals (up to half siblings) in the collections, indicating that the discrete sNMF-derived populations are not simply groups of closely related individuals. The analysis also confirmed that no related individuals were included in the subsequent genotype-environment association models; including relatives is thought to increase false positive rates because the samples are not independent (Newman et al. 2001; Voight and Pritchard 2005).
Fewer private alleles were detected in 2015 than in 2014, and this pattern remained significant after adjusting for the smaller sample size in 2015 (Table 2). The analysis was done separately for each sNMF-derived population, and we detected private alleles that were common to all four populations (Table 3), indicating that the same suite of alleles was not detected in 2015. This suggests that selection was stronger in 2015 than in 2014, leading to the loss of deleterious alleles in the 2015 cohort, which is consistent with the more anomalous oceanic conditions observed in 2015 than in 2014 (Cavole et al. 2016; Gentemann et al. 2017; Jones et al. 2018).
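One way to adjust a private-allele comparison for unequal sample sizes is to repeatedly subsample the larger collection down to the size of the smaller one. The sketch below illustrates that idea only; the source does not describe the adjustment actually used, and the genotype coding, function names, and number of replicates are assumptions.

```python
import random


def alleles_present(genotypes, n_snps):
    """Set of (snp_index, allele) pairs observed in a group.

    Genotypes are alternate-allele counts (0/1/2); None is a missing call.
    """
    present = set()
    for row in genotypes:
        for j in range(n_snps):
            g = row[j]
            if g is None:
                continue
            if g < 2:
                present.add((j, 0))  # at least one reference copy
            if g > 0:
                present.add((j, 1))  # at least one alternate copy
    return present


def private_allele_count(group, other, n_snps):
    """Number of alleles observed in `group` but absent from `other`."""
    return len(alleles_present(group, n_snps) - alleles_present(other, n_snps))


def subsampled_private_counts(large, small, n_snps, n_reps=100, seed=0):
    """Private-allele counts after subsampling `large` to len(small).

    Returns one (private_to_large_subsample, private_to_small) pair per
    replicate, so the two groups are compared at equal sample size.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_reps):
        sub = rng.sample(large, len(small))
        results.append((private_allele_count(sub, small, n_snps),
                        private_allele_count(small, sub, n_snps)))
    return results
```

Comparing the distribution of the first count with that of the second across replicates shows whether a difference in private alleles persists once both years are represented by the same number of fish.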