Marker Performance
Counts of Mendelian incompatibilities (MI) between true parent-offspring trios, while generally not zero as a result of genotyping error, were distinctly less than those in comparisons including one or two non-parents. Similarly, the distributions of LOD for correct and incorrect single parent assignments were distinct regardless of whether both, one, or neither of the parents were included in the candidate set (Figure 2). Estimates of relatedness among offspring from these crosses were very near theoretical expectations, although slightly downwardly biased for full- and half-sibling relationships (Figure 3). However, the ranges of both sibling types and unrelated individuals overlapped, making relationship estimation from this relatedness measure informative but imprecise.
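As an illustration of how Mendelian incompatibilities can be tallied for a candidate trio, the following minimal Python sketch assumes biallelic diploid genotypes coded as alternate-allele counts (0, 1, 2), with None for missing calls. This coding is a simplifying assumption made only for illustration: the polyploid, pseudo-dominant genotypes analyzed here would require a ploidy-aware compatibility rule, and the function names are hypothetical rather than part of our pipeline.

```python
from itertools import product


def gametes(genotype):
    """Alleles (0 = reference, 1 = alternate) a diploid parent can transmit.

    Assumes the genotype is coded as an alternate-allele count of 0, 1, or 2.
    """
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def trio_incompatible(sire, dam, offspring):
    """True if the offspring genotype cannot arise from one gamete per parent."""
    possible = {a + b for a, b in product(gametes(sire), gametes(dam))}
    return offspring not in possible


def count_mi(sire, dam, offspring):
    """Return (incompatible loci, loci compared) for one trio.

    Loci with a missing call (None) in any trio member are skipped.
    """
    n_mi = 0
    n_compared = 0
    for s, d, o in zip(sire, dam, offspring):
        if None in (s, d, o):
            continue
        n_compared += 1
        n_mi += trio_incompatible(s, d, o)
    return n_mi, n_compared
```

Dividing the first returned value by the second gives a percent MI comparable across trios with different amounts of missing data.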
Relationships estimated by Colony2 were accurate to a degree, but not always comprehensive. When predicted genotyping error was relatively high (0.05), the number of full sibling families and dyad full sibships was estimated accurately; completeness of dyad half sibships, although short of 100% (99.9%), was also at its highest, and the number of contributing parents (Ns) was estimated correctly. However, the probability of sibship was undesirably low for both full (mean 0.51; range 0.224-0.511) and half (0.31; 0.001-0.489) sibships, which may result from the use of a pseudo-dominant data format. As allowed genotyping error decreased (0.01 to 0.001), the number of full sibling families increased (29 to 36), with commensurate decreases in exclusion probability, and some full siblings were assigned as half siblings (1-3%), though their dyad probability of sibship was still more similar to that of full than half siblings (0.43; 0.214-0.489). With increased stringency in genotyping error, completeness of dyad half sibships also declined from 99% to 89%. Not surprisingly, the offspring segregated into new families with parents inferred to be absent from the input set tended to have moderately higher rates of MI with true parents. For example, the mean percent MI of offspring accurately assigned to an included male was 1.4% (range 0 to 3.3%), while the mean for his offspring assigned to a male inferred by the program was 2.4% (range 0.04 to 5.5%) (Supplemental Figure 3). Reducing the loci utilized to only those 230 loci with read ratio scores of 1 or 2 did not improve results; in fact, the completeness of half-sibships allowing 5% genotyping error declined slightly (from 99.9% to 99.3%), and one pair of samples that were inadvertent replicates (clones) was identified as a separate full-sibling family even after being identified as clones by the program. This suggests that, despite some noise, the excluded loci provide important discriminatory power for identifying relationships.
Providing parents as separate sexes or as an ordered list (sires then dams) did not affect the outcome from Colony2. Interestingly, however, when a single, unordered set of parents was provided as both potential sires and dams, results became erratic between single runs, with additional inferred (hypothetical) parents, reversed sexes, and extra full sibling families, but no inaccurate assignments. Across 3+ combined runs, though, parents in an unordered list were assigned correctly with the exception that the inferred sex of dams and sires as a group was occasionally reversed, and notwithstanding the aforementioned effects of genotyping error rates on full sibling families and sibship. We thus recommend that multiple combined runs be made to ensure accurate results. Moreover, providing parents in any form appeared to result in improved identification, with increased sibship completeness for concomitant rates of genotyping error (e.g. 0.01 error: 99.3% with parents mixed; 92.8% with no parents). Importantly, however, in none of these analyses were unrelated individuals ever identified as siblings, and the estimated number of contributing parents (Ns) was, excluding the clone family, always correct (11). In addition, it is worth noting that these data consist of many related individuals with only a minimum of 80% genotype completeness, making the estimation of global allele frequencies, and thus relationship probability, more challenging than in datasets with fewer offspring per family (Colony2 manual). Although we did not explore it, increasing the prior for sibship size (default of 1), specifying allele frequencies estimated from a less related set, and achieving greater genotype completeness may improve the precision of relationships estimated for polyploid organisms with pseudo-dominant data in this program.
Inference of parentage, sibship, and relatedness is an active area in sturgeon conservation because many of the conservation management plans for the most recruitment-limited populations call for supplementation through hatchery spawning and/or rearing (Hildebrand et al., 2016). While these plans exhibit great potential, they must be implemented with care because of the potential for genetic swamping of the wild population by alleles from just a few breeding individuals (Thorstensen, Bates, Lepla, & Schreier, 2019). This has been recognized for some time, however, and most ex situ spawning programs address this where possible by making factorial crosses of wild parents that are only spawned in a single brood year (Jager, 2005). Variance in survival across families can undermine these factorial and normalized supplementation designs, decreasing the genetic diversity reintroduced by hatchery offspring. For example, using parentage alongside PIT tag recordings, A. Schreier, Stephenson, Rust, and Young (2015) found that several year classes of offspring that were surviving after 3 years did not reconstitute the genetic diversity of the brood stock, not to mention the adult population at large.
These observations have reinforced the push to monitor both the variability in recruitment success and long-term genetic effects of hatchery supplementation, objectives that depend on determining the relationship or number of contributing spawners in supplemented and/or wild fish. Because the number of broodstock in most locations will not themselves be an adequate representation of overall population genetic variation, programs that collect naturally produced eggs and larvae for hatchery rearing followed by repatriation as juveniles may capture the offspring of more spawning adults and therefore better represent standing genetic diversity (Thorstensen et al., 2019). While promising, repatriation techniques are only effective in situations where recruitment limitation results from survivorship in life stages promoted by tenure in the hatchery, spawning sites and times can be identified effectively and the number of adults spawning there exceeds broodstock constraints of nearby hatcheries, and survivorship variation among families is stochastic or reflects natural patterns (e.g. maternal health). For example, using sibship to estimate the number of spawners, Jay et al. (2014) identified strong variation in the number of spawners among spawning locations and dates, meaning repatriation programs would be well served to collect in multiple sites and times. Nonetheless, these authors estimated numbers of contributing spawners that would be difficult to reproduce with practical limitations on hatchery broodstock numbers (see also Blankenship, Schumer, Van Eenennaam, & Jackson, 2017). In any event, supplementation and repatriation programs operate on the assumption that relatedness in stocked offspring does not diminish genetic diversity or promote inbreeding depression in small populations, a conjecture that is more easily tested using the markers and techniques we have demonstrated here.
Supplementation and repatriation programs also presume that fitness of stocked offspring (i.e. fecundity and survival of their progeny) is similar to that of in situ individuals, although it is as yet unclear how variation in ploidy in white sturgeon, which may be exacerbated by human intervention, affects this parameter (J. P. Van Eenennaam et al., 2019). Our pipeline, thanks to integration of funkyPloid, allows simultaneous ploidy estimation and ploidy-aware genotyping. However, coverage of at least 100k reads per sample is recommended to accurately score heterozygous genotypes and inform ploidy estimation. Confidence in ploidy estimates (minimum alternate LLR) is correlated with sequencing depth (R2 = 0.64, p<0.001, Figure 4a). Minimum alternate LLR appears to be more closely tied to genotype completeness (R2 = 0.73, p<0.001) than to heterozygosity (R2 = 0.56, p<0.001), although both factors are influential (Figure 4a, Supplemental Figure 4). Similarly, genotype completeness appears to be more closely correlated with sequence coverage (R2 = 0.51, p<0.001) than is heterozygosity (R2 = 0.39, p<0.001). Together, these observations indicate that multiplexing protocols should be optimized for genotyping completeness, which indirectly provides more accurate estimates of heterozygosity, in order to bolster confidence in ploidy estimates. For our samples, genotyping at 90% completeness generally required a minimum of ~100k on-target reads per sample (Figure 4b), or on average ~300 reads per marker for each individual (the 5th and 10th percentiles of samples >90% complete were 96.4k and 119.2k reads, respectively). In addition to more accurate estimates of ploidy, by comparing 142 samples genotyped twice or more, we observed that after achieving ≥90% completeness, mean genotyping error (number of incorrect genotypes/number of typed loci, excluding missing data) was no more than 1.1%.
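The genotyping error measure described above (discordant genotypes divided by loci typed, excluding missing data) can be sketched for a pair of replicate genotypings as follows. This is a minimal illustration under stated assumptions: genotypes are represented as arbitrary comparable labels (hypothetical tetra-allelic strings in the usage example), missing calls are coded as None, and a discordance between replicates is counted as one incorrect genotype. The function name is hypothetical and is not part of funkyPloid or our pipeline.

```python
def replicate_error_rate(rep_a, rep_b):
    """Proportion of discordant genotypes between two replicates of one sample.

    Only loci typed in both replicates are counted; loci with a missing
    call (None) in either replicate are excluded from the denominator.
    Returns None when no locus is typed in both replicates.
    """
    typed = [(a, b) for a, b in zip(rep_a, rep_b)
             if a is not None and b is not None]
    if not typed:
        return None
    mismatches = sum(a != b for a, b in typed)
    return mismatches / len(typed)


# Usage with made-up genotype labels: one mismatch among three jointly typed loci.
first = ["AAAB", "AABB", None, "BBBB"]
second = ["AAAB", "ABBB", "AAAA", "BBBB"]
rate = replicate_error_rate(first, second)
```

Averaging this rate over all replicated samples gives a mean genotyping error of the kind reported here; how samples genotyped more than twice are paired is left open in this sketch.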
For several years, it has been known that individuals with nuclear DNA content indicative of dodecaploidy (12N), or sometimes 16N, were present in white sturgeon hatchery populations (Drauch Schreier et al., 2011; A. D. Schreier et al., 2013). Similar variants were also reported in Siberian sturgeon culture (Acipenser baerii; Havelka, Bytyutskyy, Symonová, Ráb, & Flajšhans, 2016). These elevated ploidies appear to result from retention of the second polar body after fertilization (Gille, Famula, May, & Schreier, 2015), and may be promoted by handling for hatchery spawning or rearing (J. P. Van Eenennaam et al., 2019). Autopolyploid white sturgeon have not been observed to show diminished survivorship or fertility, and their backcross offspring, which most often show the expected ploidy (10N), are often viable (Drauch Schreier et al., 2011; Gille et al., 2015; Leal, Clark, Van Eenennaam, Schreier, & Todgham, 2018; Leal, Van Eenennaam, Schreier, & Todgham, 2020). This has raised several additional important questions regarding the fertility of these backcross fish and the ploidy of the gametes and offspring they produce. Although 12N fish appear to suffer no immediate fitness loss, and indeed some autopolyploid sturgeon exhibit increased vigor (Beyea, Benfey, & Kieffer, 2005), it seems likely that these 10N (and likely pentasomic) backcross fish suffer reduced fertility, and their aneuploid offspring, if viable, reduced vigor and fertility (J. P. Van Eenennaam et al., 2019). If so, this presents a problem for conservation aquaculture programs should hatchery spawning techniques increase the incidence of autopolyploids, and there is no indication that repatriation programs are immune to this phenomenon either. The rate of natural ploidy variation in white sturgeon, which would be the standard for comparison, is unclear and an active area of research.
One constraint in addressing this question is that current methods for ploidy estimation rely on fresh tissue samples (Fiske et al., 2019), generally precluding the use of archived tissues. The pipeline presented here, however, provides for simultaneous genotyping and ploidy estimation using any form of DNA-bearing tissue. Although beyond the scope of this study, we did find and exclude several putative autopolyploid and backcross individuals among the Yakama offspring and in situ samples using this technique (results not shown).
In addition to their utility in determining relationships and estimating ploidy, we expect these SNP markers to be useful for identifying population structure and dispersal between population segments (Ogden et al., 2013; Roques, Chancerel, Boury, Pierre, & Acolas, 2019). Although these SNPs were initially identified as those with a minor allele frequency (MAF) >0.05 in the ascertainment panel, several of them exhibited a mean MAF in our filtered in situ samples below this value (Figure 5). However, most of these low-mean-MAF SNPs exhibited variation in MAF among localities that should make them useful for discriminating different populations. Notably, many loci exhibited as much variation in MAF among reaches within the Columbia basin as between sites in the Columbia and those in the Fraser and Sacramento River basins.
Of all the comparisons of linkage disequilibrium between loci in populations with sufficient sample size, only 0.12% and 0.25% of comparisons were significant at FDR of 0.05 and 0.1, respectively. Among those significant comparisons, there did not appear to be a relationship between frequency of linkage association and allele ratio score (Figure 6a). The top five loci most involved in significant associations were Atr_72251-33 (1.19%), Atr_14917-56 (1.09%), Atr_36485-28 (0.99%), Atr_40343-66 (0.99%), and Atr_65359-46 (0.99%). In contrast, 34% of loci significantly deviated from HWE at an FDR of 0.05. However, there was a moderate and significant correlation (R2 = 0.47; p<0.001) between the number of populations in which a locus was out of HW equilibrium and the allele ratio score, indicating that deviation from HWE was likely exacerbated by genotyping inaccuracy at a locus (Figure 6b). In addition, it is possible that filtering for related individuals in this dataset, either too strongly or too weakly, could also affect the incidence of significant HW tests.
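To make the FDR thresholds above concrete, the following Python sketch applies the Benjamini-Hochberg step-up procedure to a list of p-values. The text does not state which FDR procedure was used, so the choice of Benjamini-Hochberg is an assumption made for illustration, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Flag tests significant under the Benjamini-Hochberg step-up procedure.

    Returns a list of booleans in the same order as p_values.
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    # Every test ranked at or below that k is declared significant.
    significant = [False] * m
    for idx in order[:max_rank]:
        significant[idx] = True
    return significant
```

The proportion of significant comparisons is then `sum(flags) / len(flags)`, computed once for the pairwise linkage disequilibrium tests and once for the per-locus HWE tests.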
All of the reaches with sufficient sample size exhibited median individual estimates of inbreeding that were above zero, although the median value and ranges of the estimates varied by reach, suggesting that these SNP loci will be useful for investigating trends of recruitment, population viability, and potential for inbreeding depression (Supplemental Figure 5). As observed previously (A. Drauch Schreier, Mahardja, & May, 2013), sub-populations in the Columbia basin appear to exhibit an isolation by distance pattern in which the fishes in the uppermost reaches (upper Snake, upper Columbia) exhibit the strongest divergence (Table 2). Notably, the precision on these estimates of FST was more strongly affected by sample size of individuals than by variation introduced by sampling different subsets of loci.
One of the challenges for supplementation programs in the most severely diminished population segments is obtaining enough unrelated brood stock from in situ populations so as to not further reduce standing genetic variation by swamping. In these cases, the use of translocated adults or hatchery reared young from other breeding stocks has been suggested but remains controversial because of the uncertainty of population structure among population segments (Hildebrand et al., 2016). While adult white sturgeon, before the creation of the hydropower barriers, likely would have been able to migrate throughout much of the Columbia Basin, fish in less impounded river systems (e.g. Fraser River) appear to show moderate site fidelity, an observation reinforced by population genetic structure (Andrea Drauch Schreier, Mahardja, & May, 2012). Thus, even where movement patterns were not historically restricted for feeding or overwintering in Columbia River sturgeon, there may have been cryptic barriers or spawning site fidelity that reduced gene flow over longer distances (A. Drauch Schreier et al., 2013). It remains to be seen whether there is detectable population structure over shorter distances, or whether the detected entrainment of young fish has been sufficient to reduce population divergence following dam construction. Moreover, where translocation of adults or young is implemented, it will need to be closely monitored in studies for outbreeding depression, standing genetic variation, and potential local adaptation, which will be aided by the efficient genotyping of genetic markers described herein.