Marker Evaluation
To examine the functional ploidy (‘somy’, or meiotic segregation pattern) of the identified SNPs in white sturgeon, we initially evaluated read ratios in the ascertainment panel individuals, which exhibited patterns reflective of 5 genotype categories (AAAA, AAAB, AABB, ABBB, BBBB), i.e. tetrasomy. To confirm this, we estimated ploidy from a larger group (N=3,514; Table 1) of white sturgeon from the Columbia, Fraser, and Sacramento River Basins, using read counts for the 325 SNPs and the R function funkyPloid (Delomas et al., submitted), implementing the beta-binomial model with uniform noise for candidate ploidies (somies) of 4N, 5N, and 6N. From this group, we retained putative tetrasomic (4N) individuals that had a minimum of 50,000 reads and a minimum log likelihood ratio to the next most likely ploidy (hereafter, minimum alternate LLR) of 25, resulting in 2,378 individuals retained for further analyses. We then used the read counts for each locus across these individuals to assess the ploidy of each locus with funkyPloid, comparing 4N and 8N models. This was achieved by transposing the read count matrices input to funkyPloid. The funkyPloid function assesses the number of amplified copies of SNPs in the genome, and does not directly assess the pattern of segregation. Thus, this test cannot discriminate between a true tetrasomic locus and two co-amplified, disomic loci. However, given the chromosome complement and previous observations (e.g. Drauch Schreier et al., 2011), disomic loci are likely to be rare, and so here we only consider tetrasomic and octosomic segregation for these loci. It should also be noted that because this LLR metric includes no penalty for overfitting, ‘noisy’ 4N loci may exhibit higher likelihood for 8N despite only being present in four copies in the genome (Delomas et al., submitted). Thus, rather than evaluate the raw LLR results, we ranked each locus based on fit to 4N or 8N models.
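The individual-level retention rule described above (called 4N, at least 50,000 reads, minimum alternate LLR of at least 25) can be illustrated with a minimal Python sketch. It is not the funkyPloid interface: the `PloidyFit` record, its field names, and the assumption that a log-likelihood is available for each candidate ploidy are hypothetical, introduced only to show how the two thresholds combine.

```python
from dataclasses import dataclass

MIN_READS = 50_000      # minimum total reads per individual (from the text)
MIN_ALT_LLR = 25.0      # minimum alternate LLR (from the text)


@dataclass
class PloidyFit:
    """Hypothetical per-individual summary; not funkyPloid's output format."""
    sample: str
    total_reads: int
    loglik: dict  # candidate ploidy (4, 5, 6) -> log-likelihood


def min_alternate_llr(fit):
    """Return (best ploidy, LLR of the best ploidy versus the next most likely)."""
    ranked = sorted(fit.loglik.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_ll), (_, next_ll) = ranked[0], ranked[1]
    return best, best_ll - next_ll


def retain_tetrasomic(fits):
    """Keep samples called 4N that pass both the read-depth and LLR thresholds."""
    kept = []
    for fit in fits:
        best, llr = min_alternate_llr(fit)
        if best == 4 and fit.total_reads >= MIN_READS and llr >= MIN_ALT_LLR:
            kept.append(fit.sample)
    return kept
```

Under these assumptions, the locus-level 4N versus 8N comparison would reuse the same logic on the transposed read-count matrix, with loci in place of individuals.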
We compared these rankings to two measures of congruence with tetrasomic inheritance. First, we evaluated the percent of comparisons for each locus reflecting Mendelian incompatibilities in several parent-offspring genotypes of known crosses (4 dams, 2 sires, and 128 offspring genotyped at 90% completeness) of white sturgeon spawned by the Yakama Nation using a custom R script (Supplemental File 2). Mendelian incompatibilities were identified as any offspring genotype that was absent from the set of possible offspring genotypes for a pair of adults, formed from all combinations of all potential diploid gametes and assuming the absence of double reduction, which has not been identified in white sturgeon (Drauch Schreier et al., 2011; Van Eenennaam, Murray, & Medrano, 1998). Genotypes of each individual were obtained using a modification of the GT-seq pipeline that accommodates different ploidies by integrating funkyPloid and assuming a normal distribution of read ratios for each allele ratio-genotype category, with standard deviation starting at 0.05 for 4N and progressively reduced for each higher ploidy following s.d. = 0.05 × (4/ploidy) to avoid overlap of allele ratio categories (https://github.com/stuartwillis/gt-seq-ploidy). Allele ratios that fall outside the 95% confidence bounds for each genotype category are scored as missing. As such, the genotype thresholds are more stringent for increasing ploidies, and higher sequencing effort may be required to precisely estimate read ratios and genotype higher ploidy samples at all loci. Second, we visually inspected the allele ratio plots of the 2,378 4N individuals and rated each locus on a scale of 1-4 for conformance to the expected allele ratios for tetrasomy (95% confidence interval of normal distribution).
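The Mendelian incompatibility test described above can be sketched as follows. This is an illustrative Python sketch, not the custom R script in Supplemental File 2. It assumes genotypes are coded as the number of copies of the B allele (0 to 4 for a tetrasomic locus), and all function names are hypothetical. Diploid gametes are formed by drawing two of the four parental copies without replacement, which excludes double reduction.

```python
from itertools import combinations

PLOIDY = 4  # tetrasomic locus


def gamete_dosages(parent_b_copies, ploidy=PLOIDY):
    """B-allele counts of all gametes a parent can form without double reduction."""
    alleles = [1] * parent_b_copies + [0] * (ploidy - parent_b_copies)
    # Each gamete carries half of the parental copies, drawn without replacement.
    return {sum(gamete) for gamete in combinations(alleles, ploidy // 2)}


def possible_offspring(dam, sire, ploidy=PLOIDY):
    """Offspring dosages obtainable from every pairing of dam and sire gametes."""
    return {d + s
            for d in gamete_dosages(dam, ploidy)
            for s in gamete_dosages(sire, ploidy)}


def percent_incompatible(trios, ploidy=PLOIDY):
    """Percent of scored (dam, sire, offspring) comparisons that are incompatible.

    Comparisons with any missing genotype (None) are skipped.
    """
    scored = incompatible = 0
    for dam, sire, offspring in trios:
        if None in (dam, sire, offspring):
            continue
        scored += 1
        if offspring not in possible_offspring(dam, sire, ploidy):
            incompatible += 1
    return 100.0 * incompatible / scored if scored else float("nan")
```

For example, an AAAB parent (dosage 1) can produce only AA and AB gametes, so an AABB offspring from an AAAB × AAAA cross would be counted as an incompatibility.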
The ratings were 1: almost all ratios fall within the expected 95% bounds of 5 genotype classes (~<5% out of bounds); 2: ratio medians fall within the expected 95% bounds of 5 genotype classes but with minor shifts towards the reference or alternate allele (allele bias), and ~<25% of ratios out of bounds; 3: ~5 genotype classes, but ratio medians often fall outside bounds (medians strongly skewed), and/or many (~>25%) allele ratios fall outside confidence intervals; 4: 6+ genotype classes, homozygote allele ratio medians fall off the x- and y-axes, and/or no distinct genotype classes. Example plots are provided with bounds reflecting 8N genotype categories (Supplemental Figure 2).
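The 95% bounds referred to in these ratings, and in the genotype scoring above, can be illustrated with a short Python sketch. It is not the code of the modified GT-seq pipeline. It assumes the allele ratio is expressed as the proportion of alternate-allele reads, that the expected ratio for each genotype class is copies/ploidy, and that the 95% bounds are the two-sided normal interval (±1.96 s.d.). All function names are hypothetical.

```python
Z95 = 1.96  # two-sided 95% normal quantile (assumed form of the bounds)


def genotype_sd(ploidy):
    """Standard deviation per genotype class: 0.05 at 4N, scaled by 4/ploidy."""
    return 0.05 * (4 / ploidy)


def call_genotype(ref_reads, alt_reads, ploidy=4):
    """Return the alternate-allele copy number, or None if scored as missing."""
    total = ref_reads + alt_reads
    if total == 0:
        return None
    ratio = alt_reads / total
    half_width = Z95 * genotype_sd(ploidy)
    for copies in range(ploidy + 1):
        # Expected ratios are evenly spaced, so at most one class can match.
        if abs(ratio - copies / ploidy) <= half_width:
            return copies
    return None  # ratio falls outside the bounds of every genotype class


def fraction_out_of_bounds(read_pairs, ploidy=4):
    """Fraction of (ref, alt) read pairs with reads that fall outside all bounds."""
    with_reads = [(r, a) for r, a in read_pairs if r + a > 0]
    if not with_reads:
        return float("nan")
    missing = sum(call_genotype(r, a, ploidy) is None for r, a in with_reads)
    return missing / len(with_reads)
```

Under these assumptions the half-width of each class shrinks from about 0.098 at 4N to about 0.049 at 8N, which is why a ratio that is unscorable under one ploidy may fall cleanly inside a class under another.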