Marker Performance
We evaluated the accuracy of these markers for predicting parentage
using a dataset of 326 offspring from a partial cross of 5 dams and 6
sires from the Yakama hatchery genotyped at a minimum threshold of 80%
completeness (29 full sibling families, ranging from 3 to 23 offspring).
Ploidy-accurate genotypes were generated by the updated GT-seq pipeline
for polyploids. From all potential sire-dam-offspring trios, we
estimated the percent of Mendelian incompatibilities for trios involving
both vs. one or neither true parent using a custom R script
(Supplemental File 2). For comparisons in which sex is unknown or both
parents may not be included, we used the “Paternity” estimation
routine of Polygene (Huang et al., 2020), which includes
several population genetic routines adapted for polyploids, though some
of these are not applicable across samples of different ploidy. To
evaluate performance of single-parent assignment, we included only 4
dams and 2 sires for 326 of the offspring, and examined the LOD score
when both, one, or neither parent was present in the candidate set. For
circumstances where candidate parents cannot be identified a priori,
or where sibship relationships are of greater interest, we
evaluated the performance of the Huang et al. (2015) maximum likelihood
estimator of relationship against known relationships among the 326
offspring in this Yakama set. Although the presence or degree of meiotic
double-reduction, resulting in gametes that carry both of a pair of
sister chromatids, is not well known in white sturgeon, we applied all
Polygene analyses under the “pure random chromatid
segregation” (PRCS) model, which provides for some amount of
double-reduction. We also estimated sibship and full sibling
families using Colony2 (Jones & Wang, 2010). Colony2
is designed for diploids, but accepts dominant data, so each SNP locus
was recoded as two pseudo-dominant loci (Rodzen et al., 2004; Wang &
Scribner, 2014). We ran analyses (full likelihood, high precision, 3
medium length runs) using different estimates of genotyping error (0.001
to 0.05), with parents absent or with all 11 parents present in
different arrangements. Arrangement of parents included: separated by
sex, together in a single set but ordered, and together but unordered,
in each case with probability of inclusion of 0.9, and with all or score
1+2 loci only. Other parameters were left as default.
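The trio-based incompatibility check described above can be illustrated with a minimal Python sketch. It is not the custom R script in Supplemental File 2. It assumes biallelic SNPs scored as alternate-allele dosage (0 to 4), that all three individuals are tetraploid, and that each parent transmits two of its four allele copies. The function names and the `allow_double_reduction` switch are hypothetical, introduced here only for illustration.

```python
from typing import Optional, Sequence, Set

PLOIDY = 4  # assumption: sire, dam and offspring are all tetraploid (4N)


def gamete_dosages(parent_dosage: int, allow_double_reduction: bool = False) -> Set[int]:
    """Alternate-allele counts a 2-copy gamete can carry, given the parent's dosage."""
    half = PLOIDY // 2
    if allow_double_reduction and 0 < parent_dosage < PLOIDY:
        # With double reduction a gamete may carry two sister chromatids,
        # so any heterozygous parent can transmit 0, 1 or 2 alternate copies.
        return set(range(half + 1))
    low = max(0, parent_dosage - half)
    high = min(half, parent_dosage)
    return set(range(low, high + 1))


def is_incompatible(sire: int, dam: int, offspring: int,
                    allow_double_reduction: bool = False) -> bool:
    """True if the offspring dosage cannot arise from this sire-dam pair."""
    possible = {
        s + d
        for s in gamete_dosages(sire, allow_double_reduction)
        for d in gamete_dosages(dam, allow_double_reduction)
    }
    return offspring not in possible


def percent_incompatible(sire: Sequence[Optional[int]],
                         dam: Sequence[Optional[int]],
                         offspring: Sequence[Optional[int]],
                         allow_double_reduction: bool = False) -> float:
    """Percent of loci, among those genotyped in all three fish, that are incompatible.

    Missing genotypes are coded as None and skipped.
    """
    scored = [(s, d, o) for s, d, o in zip(sire, dam, offspring)
              if s is not None and d is not None and o is not None]
    if not scored:
        return float("nan")
    bad = sum(is_incompatible(s, d, o, allow_double_reduction) for s, d, o in scored)
    return 100.0 * bad / len(scored)


# Toy trio with five loci; the last locus is missing in the sire.
sire = [0, 4, 2, 1, None]
dam = [0, 4, 2, 0, 2]
kid = [1, 4, 2, 2, 2]
print(percent_incompatible(sire, dam, kid))
print(percent_incompatible(sire, dam, kid, allow_double_reduction=True))
```

In the toy trio, the fourth locus (a simplex sire, a nulliplex dam and a duplex offspring) is incompatible only when double reduction is disallowed, which shows how the assumed segregation model changes the incompatibility count.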
We also evaluated the utility of these SNPs in estimation of population
genetic parameters useful in understanding the differences among and
relationships between white sturgeon in different population segments.
Using the 3,514 sequenced sturgeon, we evaluated the relationship
between sequencing depth, genotype completeness, heterozygosity, and
confidence in ploidy estimate represented as minimum alternate LLR. We
also compared the genotypes of 142 of these fish which had been
sequenced more than once to estimate genotyping error. We then filtered
to retain individuals assigned 4N ploidy with a minimum alternate LLR
>10 and genotyped at a minimum of 80% completeness. We
used only 4N individuals because PolyGene is currently
limited to a single ploidy per population. We removed known stocked and
hatchery individuals from the filtered set, and, to filter unknown
stocked individuals, we excluded all but one individual from any set of
individuals within a reach with relatedness estimates greater than 0.2
(Huang et al., 2015), resulting in 1,203 individuals. We reasoned
that unknown stocked fish would exhibit high levels of relatedness due
to the generally limited number of breeders available to hatchery
operations. Based on estimates from known relationships (the Yakama
fish), the applied value should eliminate all full-siblings and most
half-siblings while minimizing the unintentional removal of unrelated
individuals, although the actual results will be population-specific
(Table 1). There has been considerable discussion of whether siblings
should be filtered from population genetic datasets (Wang, 2018; Waples
& Anderson, 2017), and we acknowledge that statistics calculated from
this dataset will have been affected by this filtering; however, the
filtering corrects for potentially large bias that can be introduced by
non-random sampling
(Wang, 2018).
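The relatedness filter described above can be sketched as follows. This is an illustration in Python, not the procedure actually used. It assumes that a "set" of related individuals means a group linked by any chain of pairwise estimates above the threshold (single linkage), that pairs are only compared within the same reach, and that the first-listed individual of each group is the one retained. The function and argument names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def filter_related(individuals: Sequence[str],
                   reach_of: Dict[str, str],
                   relatedness: Dict[Tuple[str, str], float],
                   threshold: float = 0.2) -> List[str]:
    """Keep one individual per cluster of within-reach relatives.

    Clusters are connected components of the graph whose edges are pairs
    from the same reach with relatedness > threshold (an assumed reading
    of "any set of individuals").
    """
    parent = {ind: ind for ind in individuals}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), r in relatedness.items():
        if r > threshold and reach_of[a] == reach_of[b]:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    kept: List[str] = []
    seen_roots = set()
    for ind in individuals:  # input order decides which member is retained
        root = find(ind)
        if root not in seen_roots:
            seen_roots.add(root)
            kept.append(ind)
    return kept


# Toy example: A, B and C form one related cluster in reach R1;
# C and D are highly related but sampled in different reaches.
fish = ["A", "B", "C", "D", "E"]
reach = {"A": "R1", "B": "R1", "C": "R1", "D": "R2", "E": "R2"}
r_hat = {("A", "B"): 0.5, ("B", "C"): 0.25, ("A", "C"): 0.1,
         ("C", "D"): 0.4, ("D", "E"): 0.05}
print(filter_related(fish, reach, r_hat))
```

Single linkage is the more aggressive reading: A and C are grouped through B even though their own estimate is below 0.2. A stricter definition that requires every pair to exceed the threshold would retain more individuals.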
Using a custom R script (Supplemental File X), we calculated the minor
allele frequency (MAF) of each locus for each reach with
N>7 individuals to examine the information content of these
loci for identifying distinct population segments. For each population
with N>50 individuals, we calculated linkage disequilibrium
among loci using Fisher’s G test and conformance to expectations of
Hardy-Weinberg (HW) equilibrium using Raymond & Rousset’s (1995)
estimator from the Markov chain (5k burn-in, 100 batches of 5k
iterations), both applied in PolyGene and with correction for
multiple tests (false discovery rate, FDR; Benjamini & Hochberg, 1995).
Similarly, we examined differences across sub-populations in estimated
inbreeding values using Weir’s (1997) estimator in PolyGene. Finally, we
estimated genetic divergence, as Nei’s (1973) FST analog, among a
selection of sub-populations for which sample size was
sufficient, and estimated the precision of this statistic from 100
jackknife replicates, each resampling 50% of loci, calculating the 95%
confidence interval from these replicates assuming a normal distribution.
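The jackknife procedure for the divergence statistic can be sketched as below. This is a simplified Python illustration, not the PolyGene implementation. It assumes biallelic loci summarized as one allele frequency per sub-population, equal weighting of sub-populations, no sample-size correction, a multilocus estimate formed as a ratio of summed heterozygosities, and a confidence interval taken as the replicate mean plus or minus 1.96 replicate standard deviations. All function names are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def nei_gst(freqs: Sequence[Sequence[float]], loci: Sequence[int]) -> float:
    """G_ST = 1 - H_S / H_T over the chosen loci.

    freqs[k][j] is the alternate-allele frequency of locus j in sub-population k.
    """
    n_pops = len(freqs)
    hs_sum = 0.0  # summed mean within-sub-population gene diversity
    ht_sum = 0.0  # summed total gene diversity
    for j in loci:
        p = [pop[j] for pop in freqs]
        p_bar = sum(p) / n_pops
        hs_sum += sum(2.0 * x * (1.0 - x) for x in p) / n_pops
        ht_sum += 2.0 * p_bar * (1.0 - p_bar)
    return 1.0 - hs_sum / ht_sum


def jackknife_ci(freqs: Sequence[Sequence[float]],
                 n_reps: int = 100,
                 keep_fraction: float = 0.5,
                 seed: int = 1) -> Tuple[float, float, float]:
    """Mean and normal-approximation 95% CI from random half-subsets of loci."""
    rng = random.Random(seed)
    n_loci = len(freqs[0])
    n_keep = max(1, round(n_loci * keep_fraction))
    reps: List[float] = [
        nei_gst(freqs, rng.sample(range(n_loci), n_keep))  # loci drawn without replacement
        for _ in range(n_reps)
    ]
    mean = statistics.fmean(reps)
    sd = statistics.stdev(reps)
    return mean, mean - 1.96 * sd, mean + 1.96 * sd


# Toy data: two sub-populations, ten loci (values are illustrative only).
pop_a = [0.10, 0.25, 0.40, 0.55, 0.30, 0.45, 0.20, 0.35, 0.50, 0.15]
pop_b = [0.20, 0.30, 0.35, 0.70, 0.25, 0.60, 0.10, 0.40, 0.45, 0.30]
estimate, lower, upper = jackknife_ci([pop_a, pop_b])
print(estimate, lower, upper)
```

Resampling loci rather than individuals reflects the variance among markers. The sketch would fail with a division by zero if every sampled locus were monomorphic across all sub-populations, so such loci would need to be excluded beforehand.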