Marker Performance
Counts of Mendelian incompatibilities (MI) between true parent-offspring
trios, while generally not zero as a result of genotyping error, were
distinctly lower than those in comparisons including one or two
non-parents. Similarly, the distributions of LOD scores for correct and
incorrect single parent assignments were distinct regardless of whether
both, one, or neither of the parents were included in the candidate set
(Figure 2). Estimates of relatedness among offspring from these crosses
were very near theoretical expectations, although slightly downwardly
biased for full- and half-sibling relationships (Figure 3). However, the
ranges of both sibling types and unrelated individuals overlapped,
making relationship estimation from this relatedness measure informative
but imprecise.
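The sketch below shows one way to count Mendelian incompatibilities for a trio and for a single parent-offspring pair. It assumes biallelic SNPs coded as diploid alternate-allele counts (0, 1, 2), which simplifies our polyploid genotypes. The function names are hypothetical and do not come from the software we used. Genotyping error can produce an occasional MI even in a true trio, so the useful diagnostic is the contrast in counts between true and false parents, not a count of zero.

```python
from itertools import product
from typing import List, Optional, Set

# Hypothetical encoding: diploid alt-allele counts (0, 1, 2); None marks missing data.
Genotype = Optional[int]

def gametes(g: int) -> Set[int]:
    """Alt-allele counts a diploid parent can transmit in a single gamete."""
    return {0, 1} if g == 1 else {g // 2}

def count_mi_trio(child: List[Genotype], sire: List[Genotype], dam: List[Genotype]) -> int:
    """Count loci where the child's genotype cannot arise from this sire and dam."""
    mi = 0
    for c, s, d in zip(child, sire, dam):
        if c is None or s is None or d is None:
            continue  # skip loci with missing data in any trio member
        possible = {a + b for a, b in product(gametes(s), gametes(d))}
        if c not in possible:
            mi += 1
    return mi

def count_mi_pair(child: List[Genotype], parent: List[Genotype]) -> int:
    """Single-parent check: only opposite homozygotes (0 vs 2) are detectable MI."""
    return sum(
        1 for c, p in zip(child, parent)
        if c is not None and p is not None and abs(c - p) == 2
    )
```

For example, a child typed `[0, 1, 2, 1, None]` with sire `[0, 0, 2, 2, 1]` and dam `[0, 0, 2, 0, 1]` has one trio MI: at the second locus a heterozygous child cannot come from two homozygous-reference parents.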
Relationships estimated by Colony2 were generally accurate but not
always complete. When the allowed genotyping error was relatively high
(0.05), the numbers of full-sibling families and dyad full sibships were
estimated accurately. The completeness of dyad half sibships, although
not total (99.9%), was also at its highest, and the number of
contributing parents (Ns) was estimated correctly.
However, the probability of sibship was undesirably low for both full
(mean 0.51; range 0.224-0.511) and half (0.31; 0.001-0.489) sibships,
which may result from the use of a pseudodominant data format. As the
allowed genotyping error decreased (0.01 to 0.001), the number of
full-sibling families increased (29 to 36), with commensurate decreases
in exclusion probability. Some full siblings were also assigned as half
siblings (1-3%), although their dyad probability of sibship was still
closer to that of full than of half siblings (0.43; 0.214-0.489). With
more stringent genotyping error rates, the completeness of dyad half
sibships also declined, from 99% to 89%. Not surprisingly, offspring
segregated into new families with parents inferred to be absent from
the input set tended to have moderately higher rates of MI with their
true parents. For example, the mean MI of offspring correctly assigned
to an included male was 1.4% (range 0-3.3%), whereas the mean MI of his
offspring assigned to a male inferred by the program was 2.4% (range
0.04-5.5%) (Supplemental Figure 3).
Restricting the analysis to the 230 loci with read ratio scores of 1 or
2 did not improve results. In fact, the completeness of half sibships
allowing 5% genotyping error declined slightly (from 99.9% to 99.3%).
In addition, one pair of inadvertent replicates (clones) was identified
as a separate full-sibling family even though the program had identified
them as clones. This suggests that, despite some noise, the excluded
loci provide important discriminatory power for identifying
relationships.
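The completeness values above are proportions of true sibling dyads that were recovered. The sketch below shows that bookkeeping. It assumes each offspring has a known family label from the controlled crosses and an inferred label taken from the Colony2 output. Parsing that output is not shown, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Hashable, Sequence, Set, Tuple

def sib_dyads(labels: Sequence[Hashable]) -> Set[Tuple[int, int]]:
    """Unordered index pairs (i < j) whose members share a family label."""
    return {(i, j) for i, j in combinations(range(len(labels)), 2)
            if labels[i] == labels[j]}

def dyad_completeness(true_labels: Sequence[Hashable],
                      inferred_labels: Sequence[Hashable]) -> float:
    """Fraction of true sibling dyads that are also siblings in the inferred partition."""
    true_pairs = sib_dyads(true_labels)
    if not true_pairs:
        return float("nan")
    return len(true_pairs & sib_dyads(inferred_labels)) / len(true_pairs)

def dyad_false_positives(true_labels: Sequence[Hashable],
                         inferred_labels: Sequence[Hashable]) -> int:
    """Number of inferred sibling dyads that are not siblings in truth."""
    return len(sib_dyads(inferred_labels) - sib_dyads(true_labels))
```

Only pairs are compared, so the arbitrary family names assigned by the program never need to match the true labels. Splitting one true family into two inferred families lowers completeness but creates no false-positive dyads, which matches the pattern described above.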
Providing parents as separate sexes or as an ordered list (sires then
dams) did not affect the Colony2 output. Interestingly, however, when a
single, unordered set of parents was provided as both potential sires
and dams, results became erratic between single runs. These runs showed
additional inferred (hypothetical) parents, reversed sexes, and extra
full-sibling families, but no inaccurate assignments. Across three or
more combined runs, parents in an unordered list were assigned
correctly, with two exceptions: the inferred sex of dams and sires as a
group was occasionally reversed, and the effects of genotyping error
rate on full-sibling families and sibship described above still
applied. We therefore recommend combining multiple runs to ensure
accurate results. Moreover, providing parents in any form appeared to
improve identification, increasing sibship completeness at a given
genotyping error rate (e.g. at 0.01 error: 99.3% with parents mixed;
92.8% with no parents). Importantly, however, in none of these analyses
were unrelated individuals ever identified as siblings. Excluding the
clone family, the estimated number of contributing parents (Ns) was
always correct (11). It is also worth noting that these data consist of
many related individuals with a minimum genotype completeness of only
80%. This makes the estimation of global allele frequencies, and thus of
relationship probabilities, more challenging than in datasets with
fewer offspring per family (Colony2 manual). Although we did not explore
it, increasing the prior for sibship size (default of 1), specifying
allele frequencies estimated from a less related set, and achieving
greater genotype completeness may improve the precision of
relationships that this program estimates for polyploid organisms with
pseudodominant data.
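The sketch below is one hypothetical way to summarize several combined runs. It assumes that each run's best configuration has already been parsed into a mapping from offspring ID to full-sib family label. This is a post-processing idea, not a Colony2 option, and the 0.5 majority threshold is an arbitrary choice for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, List, Set, Tuple

Run = Dict[str, Hashable]  # offspring ID -> full-sib family label for one run

def pair_support(runs: List[Run]) -> Dict[Tuple[str, str], float]:
    """Fraction of runs in which each offspring pair was placed in the same family.
    Assumes every run contains the same offspring IDs."""
    counts: Counter = Counter()
    for assignment in runs:
        for a, b in combinations(sorted(assignment), 2):
            if assignment[a] == assignment[b]:
                counts[(a, b)] += 1
    return {pair: n / len(runs) for pair, n in counts.items()}

def consensus_pairs(runs: List[Run], threshold: float = 0.5) -> Set[Tuple[str, str]]:
    """Pairs grouped together in more than `threshold` of the runs."""
    return {pair for pair, f in pair_support(runs).items() if f > threshold}
```

Comparing pairs rather than family labels avoids the problem that family labels, and the inferred sexes of parents, are arbitrary from run to run.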
Parentage, sibship, and relatedness inference are active areas of
research in sturgeon conservation because many conservation management
plans for the most recruitment-limited populations call for
supplementation through hatchery spawning and/or rearing (Hildebrand et
al., 2016). While these plans show great potential, they must be
implemented with care because of the risk of genetic swamping of the wild population by
alleles from just a few breeding individuals (Thorstensen, Bates, Lepla,
& Schreier, 2019). This has been recognized for some time, however, and
most ex situ spawning programs address this where possible by
making factorial crosses of wild parents that are only spawned in a
single brood year (Jager, 2005). Variance in survival across families
can undermine these factorial and normalized supplementation designs,
decreasing the genetic diversity reintroduced by hatchery offspring. For
example, using parentage alongside PIT tag recordings, Schreier,
Stephenson, Rust, and Young (2015) found that several year classes of
offspring surviving after 3 years did not reconstitute the genetic
diversity of the broodstock, let alone that of the adult population at
large.
These observations have reinforced the push to monitor both the
variability in recruitment success and long-term genetic effects of
hatchery supplementation. Both objectives depend on determining the
relationships among, or the number of spawners contributing to,
supplemented and/or wild fish. In most locations, the broodstock alone
will not adequately represent overall population genetic variation.
Programs that collect naturally produced eggs and larvae for hatchery
rearing, followed by repatriation as juveniles, may therefore capture
the offspring of more spawning adults and better represent standing
genetic diversity (Thorstensen et al., 2019). While promising,
repatriation techniques are only effective under three conditions.
First, recruitment limitation must result from poor survival in life
stages that hatchery rearing can improve. Second, spawning sites and
times must be identifiable, and the number of adults spawning there
must exceed the broodstock capacity of nearby hatcheries. Third,
variation in survivorship among families must be stochastic or reflect
natural patterns (e.g. maternal health). For example, using sibship to
estimate the number of spawners, Jay et al. (2014) identified strong
variation in the number of spawners among spawning locations and dates.
This indicates that repatriation programs would be well served to
collect at multiple sites and times. Nonetheless, the numbers of
contributing spawners these authors estimated would be difficult to
match given practical limits on hatchery broodstock numbers (see also
Blankenship, Schumer, Van Eenennaam, & Jackson, 2017). In any event, supplementation and repatriation programs operate
on the assumption that relatedness in stocked offspring does not
diminish genetic diversity or promote inbreeding depression in small
populations, a conjecture that is more easily tested using the markers
and techniques we have demonstrated here.
Supplementation and repatriation programs also presume that the fitness
of stocked offspring (i.e. the fecundity and survival of their progeny)
is similar to that of in situ individuals. It is as yet unclear,
however, how variation in ploidy in white sturgeon, which may be
exacerbated by human intervention, affects this parameter (J. P. Van
Eenennaam et al., 2019). Because it integrates funkyPloid, our pipeline
allows simultaneous ploidy estimation and ploidy-aware genotyping.
However, a minimum of 100k reads per sample is recommended to score
heterozygous genotypes accurately and inform ploidy estimation.
Confidence in ploidy estimates (minimum alternate LLR) is correlated
with sequencing depth (R² = 0.64, p<0.001, Figure 4a).
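This does not reproduce funkyPloid's model. It is a generic sketch of how a ploidy likelihood ratio can be built from allele read ratios at heterozygous sites. The candidate ploidies, the equal prior over heterozygous dosages, the absence of an error term, and all function names are assumptions made for illustration. The sketch shows why confidence rises with depth and with the number of scored heterozygous sites: each site adds its own log-likelihood term, and deeper sites separate the candidate allele ratios more sharply.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple

def log_binom_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), using lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def site_loglik(ref: int, alt: int, ploidy: int) -> float:
    """Log-likelihood of one heterozygous site's read counts under `ploidy`,
    averaging over dosages 1..ploidy-1 with equal prior weight."""
    n = ref + alt
    logs = [log_binom_pmf(alt, n, d / ploidy) for d in range(1, ploidy)]
    m = max(logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in logs)) - math.log(len(logs))

def ploidy_llr(sites: Iterable[Tuple[int, int]],
               candidates: Sequence[int] = (2, 4)) -> Tuple[int, float]:
    """Best-supported ploidy and its log-likelihood ratio over the next-best
    candidate (similar in spirit to a 'minimum alternate LLR')."""
    sites = list(sites)
    totals: Dict[int, float] = {
        k: sum(site_loglik(r, a, k) for r, a in sites) for k in candidates
    }
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked[0], totals[ranked[0]] - totals[ranked[1]]
```

For example, ten sites at a 50:50 read ratio favor the lower candidate ploidy. Ten sites at 75:25 favor the higher one, and the log-likelihood ratio grows when the same ratio is observed at greater depth.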
Minimum alternate LLR appears to be more closely tied to genotype
completeness (R² = 0.73, p<0.001) than to heterozygosity (R² = 0.56,
p<0.001), although both factors are influential (Figure 4a, Supplemental
Figure 4). Similarly, genotype completeness appears to be more closely
correlated with sequence coverage (R² = 0.51, p<0.001) than is
heterozygosity (R² = 0.39, p<0.001). Together, these observations
indicate that multiplexing protocols should be optimized for genotype
completeness, which indirectly yields more accurate estimates of
heterozygosity, in order to bolster confidence in ploidy estimates. For
our samples, genotyping at 90% completeness generally required a minimum
of ~100k on-target reads per sample (Figure 4b), or on average ~300
reads per marker for each individual. The 5th and 10th percentiles of
samples >90% complete were 96.4k and 119.2k reads, respectively. Beyond
improving ploidy estimates, high completeness also limited genotyping
error. Across 142 samples genotyped twice or more, once ≥90%
completeness was achieved, the mean genotyping error (number of
incorrect genotypes / number of typed loci, excluding missing data) was
no more than 1.1%.
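The error metric above can be written out as follows. This sketch assumes that each replicate is a list of genotype calls (any comparable value, with None for missing data) and that a disagreement between replicates counts as one incorrect genotype. The 0.9 completeness cutoff mirrors the text, but the function names are hypothetical.

```python
from typing import Any, List, Optional, Sequence, Tuple

Call = Optional[Any]  # a genotype call such as "AAGG"; None = missing

def completeness(calls: Sequence[Call]) -> float:
    """Fraction of loci with a non-missing genotype call."""
    return sum(c is not None for c in calls) / len(calls)

def replicate_error_rate(rep_a: Sequence[Call], rep_b: Sequence[Call]) -> float:
    """Discordant calls / loci typed in both replicates (missing data excluded)."""
    typed = [(a, b) for a, b in zip(rep_a, rep_b) if a is not None and b is not None]
    if not typed:
        return float("nan")
    return sum(a != b for a, b in typed) / len(typed)

def mean_error(pairs: List[Tuple[Sequence[Call], Sequence[Call]]],
               min_completeness: float = 0.9) -> float:
    """Mean replicate error over pairs in which both replicates meet the cutoff."""
    rates = [replicate_error_rate(a, b) for a, b in pairs
             if completeness(a) >= min_completeness
             and completeness(b) >= min_completeness]
    return sum(rates) / len(rates) if rates else float("nan")
```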
It has been known for several years that individuals with nuclear DNA
content indicative of dodecaploidy (12N), or sometimes 16N, are present
in white sturgeon hatchery populations (Drauch Schreier et al., 2011; A.
D. Schreier et al., 2013). Similar variants have also been reported in
Siberian sturgeon culture (Acipenser baerii; Havelka,
Bytyutskyy, Symonová, Ráb, & Flajšhans, 2016). This process likely
results from retention of the second polar body after fertilization
(Gille, Famula, May, & Schreier, 2015) and may be promoted by handling
for hatchery spawning or rearing (J. P. Van
Eenennaam et al., 2019). Autopolyploid white sturgeon have not been
observed to show diminished survivorship or fertility, and their
backcross offspring, which most often show the expected ploidy (10N),
are often viable (Drauch Schreier et al., 2011; Gille et al., 2015;
Leal, Clark, Van Eenennaam, Schreier, & Todgham, 2018; Leal, Van
Eenennaam, Schreier, & Todgham, 2020). This has raised several
additional important questions regarding the fertility of these
backcross fish and the ploidy of gametes and offspring produced.
Although 12N fish appear to suffer no immediate fitness loss, and some
autopolyploid sturgeon indeed exhibit increased vigor (Beyea, Benfey,
& Kieffer, 2005), it seems likely that 10N backcross fish (which are
likely pentasomic) suffer reduced fertility. Their aneuploid offspring,
if viable, would likely have reduced vigor and fertility (J. P. Van
Eenennaam et al., 2019). If hatchery spawning techniques increase the
incidence of autopolyploids, this would present a problem for
conservation aquaculture programs. There is also no indication that
repatriation programs are immune to this phenomenon. The rate of
natural ploidy variation in white sturgeon, which would serve as the
baseline for comparison, is unclear and an active area of research. One
obstacle to addressing this, however, is that current methods for
ploidy estimation rely on fresh tissue samples (Fiske et al., 2019),
generally precluding the use of archived tissues. The pipeline presented
here, in contrast, provides for simultaneous genotyping and ploidy
estimation using any form of DNA-bearing tissue.
Although a full analysis was beyond the scope of this study, we did
find and exclude several putative autopolyploid and backcross
individuals among the Yakama offspring and in situ samples using this
technique (results not shown).
In addition to their utility in determining relationships and estimating
ploidy, we expect these SNP markers to be useful for identifying
population structure and dispersal between population segments (Ogden et
al., 2013; Roques, Chancerel, Boury, Pierre, & Acolas, 2019). Although
these SNPs were initially selected as those with a minor allele
frequency (MAF) >0.05 in the ascertainment panel, several of them
exhibited a mean MAF below this value in our filtered in situ samples
(Figure 5). However, most of these low-mean-MAF SNPs
exhibited variation in MAF among localities that should make them useful
for discriminating different populations. Notably, many loci exhibited
as much variation in MAF among reaches within the Columbia basin as
between sites in the Columbia and those in the Fraser and Sacramento
River basins.
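The sketch below illustrates the per-locality MAF summary behind this observation. It assumes diploid-coded alternate-allele counts (a simplification for our polyploid data) grouped under hypothetical locality labels. It flags loci whose mean MAF falls below the 0.05 ascertainment threshold and reports how much MAF varies among localities.

```python
from typing import Dict, List, Optional

def maf(genotypes: List[Optional[int]]) -> float:
    """Minor allele frequency from alt-allele counts (0/1/2); None = missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return float("nan")
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)

def summarize_locus(genotypes_by_pop: Dict[str, List[Optional[int]]],
                    threshold: float = 0.05) -> dict:
    """Per-locality MAF, its mean and range, and a low-mean-MAF flag for one locus."""
    per_pop = {pop: maf(gs) for pop, gs in genotypes_by_pop.items()}
    values = list(per_pop.values())
    mean_maf = sum(values) / len(values)
    return {
        "per_pop": per_pop,
        "mean": mean_maf,
        "range": max(values) - min(values),  # among-locality spread in MAF
        "below_threshold": mean_maf < threshold,
    }
```

A locus can therefore be flagged as low-MAF on average and still show a usable difference among localities, which is the pattern noted above.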
Of all the pairwise tests of linkage disequilibrium between loci in
populations with sufficient sample size, only 0.12% and 0.25% were
significant at an FDR of 0.05 and 0.1, respectively. Among
those significant comparisons, there did not appear to be a relationship
between frequency of linkage association and allele ratio score (Figure
6a). The top five loci most involved in significant associations were
Atr_72251-33 (1.19%), Atr_14917-56 (1.09%), Atr_36485-28 (0.99%),
Atr_40343-66 (0.99%), and Atr_65359-46 (0.99%). In contrast, 34% of
loci significantly deviated from HWE at an FDR of 0.05. However, there
was a moderate and significant correlation (R² = 0.47; p<0.001) between
the number of populations in which a locus deviated from HWE and its
allele ratio score. This indicates that genotyping inaccuracy at some
loci likely exacerbated deviation from HWE (Figure 6b). It is also
possible that filtering related individuals from this dataset, either
too strongly or too weakly, affected the incidence of significant HWE
tests.
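The HWE and FDR steps can be sketched as follows. The sketch uses a chi-square goodness-of-fit test on diploid genotype counts and the Benjamini-Hochberg procedure. Both were chosen for brevity: an exact HWE test is often preferred for small samples. The function names are hypothetical and are not the software used for the analyses reported here.

```python
import math
from typing import List

def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a 1-df chi-square goodness-of-fit test of HWE (biallelic, diploid)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic locus: the test is undefined, so report no deviation
    stat = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))

def benjamini_hochberg(pvalues: List[float], fdr: float = 0.05) -> List[bool]:
    """Flag tests significant under the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * fdr:
            cutoff_rank = rank  # largest rank passing its threshold
    significant = set(order[:cutoff_rank])
    return [i in significant for i in range(m)]
```

The same `benjamini_hochberg` step applies unchanged to pairwise linkage disequilibrium p-values. Only the source of the p-values differs.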
All of the reaches with sufficient sample size exhibited median
individual inbreeding estimates above zero, although the medians and
ranges of the estimates varied by reach. This suggests that these SNP
loci will be useful for investigating trends in recruitment, population
viability, and potential for inbreeding
depression (Supplemental Figure 5). As observed previously (A. Drauch
Schreier, Mahardja, & May, 2013), sub-populations in the Columbia basin
appear to exhibit an isolation-by-distance pattern in which fish in the
uppermost reaches (upper Snake, upper Columbia) exhibit the strongest
divergence (Table 2). Notably, the precision of these FST estimates was
more strongly affected by the number of individuals sampled than by
variation introduced by sampling different subsets of loci.
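The sketch below shows the arithmetic behind individual inbreeding and FST estimates, assuming diploid-coded genotypes and known allele frequencies. It uses the method-of-moments individual F, which compares observed and expected homozygosity, and Hudson's FST computed as a ratio of averages across loci. These estimators may differ from those used in this study, and the function names are hypothetical. The locus bootstrap shows one way to gauge variation due to the choice of loci. Resampling individuals would require genotype-level data and is not shown.

```python
import random
from typing import List, Optional, Sequence

def individual_f(genotypes: Sequence[Optional[int]], alt_freqs: Sequence[float]) -> float:
    """Method-of-moments F = (O - E) / (N - E): observed (O) and expected (E)
    homozygous counts over N typed, polymorphic loci."""
    obs_hom = exp_hom = 0.0
    n = 0
    for g, p in zip(genotypes, alt_freqs):
        if g is None or p <= 0.0 or p >= 1.0:
            continue  # skip missing calls and monomorphic loci
        n += 1
        obs_hom += g != 1  # homozygous if not heterozygous
        exp_hom += 1 - 2 * p * (1 - p)
    return (obs_hom - exp_hom) / (n - exp_hom)

def hudson_fst(p1: Sequence[float], p2: Sequence[float],
               n1: Sequence[int], n2: Sequence[int]) -> float:
    """Hudson's FST as a ratio of averages; p = allele frequencies,
    n = number of sampled allele copies in each population at each locus."""
    num = den = 0.0
    for a, b, x, y in zip(p1, p2, n1, n2):
        num += (a - b) ** 2 - a * (1 - a) / (x - 1) - b * (1 - b) / (y - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den

def bootstrap_over_loci(p1, p2, n1, n2, reps: int = 200, seed: int = 1) -> List[float]:
    """Resample loci with replacement to see how FST varies with the locus subset."""
    rng = random.Random(seed)
    idx = range(len(p1))
    estimates = []
    for _ in range(reps):
        pick = [rng.choice(idx) for _ in idx]
        estimates.append(hudson_fst([p1[i] for i in pick], [p2[i] for i in pick],
                                    [n1[i] for i in pick], [n2[i] for i in pick]))
    return estimates
```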
One of the challenges for supplementation programs in the most severely
diminished population segments is obtaining enough unrelated broodstock
from in situ populations so as not to further reduce standing
genetic variation by swamping. In these cases, the use of translocated
adults or hatchery reared young from other breeding stocks has been
suggested but remains controversial because of the uncertainty of
population structure among population segments (Hildebrand et al.,
2016). While adult white sturgeon, before the creation of the hydropower
barriers, likely would have been able to migrate throughout much of the
Columbia Basin, fish in less impounded river systems (e.g. Fraser River)
appear to show moderate site fidelity, an observation reinforced by
population genetic structure (Andrea Drauch Schreier, Mahardja, & May,
2012). Thus, even where movement patterns were not historically
restricted for feeding or overwintering in Columbia River sturgeon,
there may have been cryptic barriers or spawning site fidelity that
reduced gene flow over longer distances (A. Drauch Schreier et al.,
2013). It remains to be seen whether there is detectable population
structure over shorter distances, or whether the detected entrainment of
young fish has been sufficient to reduce population divergence following
dam construction. Moreover, where translocation of adults or young is
implemented, it will need to be closely monitored for outbreeding
depression, changes in standing genetic variation, and effects on
potential local adaptation. The efficient genotyping of the genetic
markers described herein will aid such monitoring.