Introduction
Whole genome duplication is hypothesized to have played a fundamental role in evolution (Dufresne, Stift, Vergilino, & Mable, 2014; Soltis, Visger, Blaine Marchant, & Soltis, 2016), including in vertebrates (Dehal & Boore, 2005; Holland, Garcia-Fernandez, Williams, & Sidow, 1994) and particularly in fishes (Crow, Stadler, Lynch, Amemiya, & Wagner, 2006; Meyer & Van De Peer, 2005). Despite this, relatively few extant vertebrate species are known to be polysomic (exhibiting multivalent pairing of homologous chromosomes at meiosis) (Comai, 2005), in part because of the diploidization processes that follow most polyploidization events (Lynch & Conery, 2000; Ohno, 1971; Wendel, 2000; Wolfe, 2001). Select lineages, however, including some vertebrates, appear to be prone to episodic polyploidization and prolonged polysomism (Dufresne et al., 2014).
Despite the obvious differences between the two, our understanding of polyploid evolution has come largely from the study of allopolyploids, which arise from the combination of two ancestral genomes, usually through hybridization, rather than from their autopolyploid counterparts, which arise from the doubling of a single ancestral genome, usually through fertilization involving unreduced gametes (Dufresne et al., 2014; Soltis et al., 2016). This bias stems in part from several methodological challenges to developing genetic insights from polyploids, which are often more severe in auto- than in allopolyploids. Developing reliable genetic markers for polyploids has been impeded both by co-amplifying homeologs whose signals cannot be discriminated and by true polysomic segregation of those homeologs, whose actual somy is itself obscured by homeolog co-amplification. For example, although microsatellites have long been the standard marker for population genetics because of their ease of discovery and high allelic diversity, many studies of polyploids have found mixed inheritance patterns that could reflect either true mixed-somy segregation or variable amplification of homeologs from each ancestral genome (Dufresne et al., 2014).
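To make the difference between polysomic and disomic segregation concrete, the following minimal sketch compares the expected gamete dosages of a duplex tetraploid (two copies of allele A, two of a) under the two models. It assumes a biallelic locus, random chromosome segregation into gametes carrying half of the parental homologs, and no double reduction; the allele labels and function names are hypothetical and purely illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def polysomic_gametes(genotype):
    """Gamete dosage of allele 'A' under random chromosome segregation.

    genotype: one character per homolog, e.g. "AAaa". Each gamete receives a
    random half of the homologs; double reduction is ignored.
    """
    k = len(genotype)
    draws = list(combinations(range(k), k // 2))
    tally = Counter(sum(genotype[i] == "A" for i in draw) for draw in draws)
    return {dose: Fraction(n, len(draws)) for dose, n in sorted(tally.items())}


def disomic_gametes(pairs):
    """Gamete dosage of 'A' when each homeolog pair segregates like a diploid locus.

    pairs: e.g. [("A", "A"), ("a", "a")]; each pair passes on one of its two
    alleles with probability 1/2, independently of the other pairs.
    """
    dist = {0: Fraction(1)}
    for pair in pairs:
        new = Counter()
        for dose, prob in dist.items():
            for allele in pair:
                new[dose + (allele == "A")] += prob / 2
        dist = dict(new)
    return dict(sorted(dist.items()))


print(polysomic_gametes("AAaa"))                   # polysomic duplex
print(disomic_gametes([("A", "A"), ("a", "a")]))   # disomic, A alleles on one pair
print(disomic_gametes([("A", "a"), ("A", "a")]))   # disomic, one A on each pair
```

Under polysomic segregation the duplex parent yields gametes with 0, 1, or 2 copies of A in a 1:4:1 ratio, whereas under disomic segregation the same total dosage yields either only single-copy gametes or a 1:2:1 ratio, depending on how the alleles are distributed across homeolog pairs. Offspring ratios from crosses therefore depend on the mode of segregation, which is why unresolved somy complicates marker interpretation.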
Whereas allopolyploids may often exhibit disomic inheritance of the ancestral genomes soon or immediately after polyploidization (Spoelhof, Soltis, & Soltis, 2017), in which case developing diploid markers is a matter of identifying ancestral genome-specific primers or probes (Dufresne et al., 2014), true polysomy in autopolyploids and segmental allopolyploids (those formed from the merger of partially divergent ancestral genomes) presents additional challenges. In polysomic individuals, determining the dosage (count or ratio) of microsatellite alleles in a genotype may be difficult when the genotyping technology is not quantitative, and null alleles can impede this further. Moreover, although the estimators for many population genetic parameters can be extended to polysomic inheritance (Meirmans & Van Tienderen, 2013; Ronfort, Jenczewski, Bataillon, & Rousset, 1998), until recently there was relatively little interest in incorporating these extensions into popular genetic software, most of which accepts only diploid data. Several recent software packages that were either updated to accept polyploid data (e.g., Genodive (Meirmans, Liu, & Van Tienderen, 2018), the R package adegenet (Jombart, 2008), and others reviewed in Dufresne et al. (2014)) or designed specifically for polyploids (EBG: Blischak, Kubatko, & Wolfe, 2018; Polygene: Kang Huang, Dunn, Ritland, & Li, 2020) make progress on this front, but they require that the ploidy and either the allelic phenotype (dosage-blind genotype) or the full ploidy-aware genotype be provided for each individual. For species that vary in ploidy, this generally requires assessing ploidy separately from genotyping or allelic phenotyping, which adds time and expense and in some cases precludes the use of commonly archived tissue types and preservation methods.
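The information lost in a dosage-blind allelic phenotype can be illustrated by enumerating the possible genotypes at a single locus. The sketch below, which assumes unordered genotypes and uses hypothetical allele labels and function names, groups every genotype of a given ploidy by the set of alleles it displays, and computes the expected read fraction of an allele for a fully specified genotype (its dosage divided by the ploidy), which is the quantity a read-count-based method can, in principle, estimate.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations_with_replacement


def genotypes_by_phenotype(alleles, ploidy):
    """Group all unordered genotypes of a given ploidy by their dosage-blind phenotype.

    The phenotype is the set of distinct alleles present; the genotype also
    records how many copies (the dosage) of each allele are carried.
    """
    groups = defaultdict(list)
    for genotype in combinations_with_replacement(sorted(alleles), ploidy):
        groups[frozenset(genotype)].append("".join(genotype))
    return dict(groups)


def expected_fraction(genotype, allele):
    """Expected fraction of reads carrying `allele`: its dosage divided by the ploidy."""
    return Fraction(genotype.count(allele), len(genotype))


tetra = genotypes_by_phenotype("AB", 4)
heterozygotes = tetra[frozenset("AB")]
print(heterozygotes)
print([expected_fraction(g, "B") for g in heterozygotes])
```

For a biallelic locus, the tetraploid phenotype {A, B} corresponds to three distinct genotypes (expected B read fractions 1/4, 1/2, and 3/4) and the octoploid phenotype to seven, so the phenotype alone cannot distinguish, for example, a simplex from a triplex individual.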
Here, we demonstrate a set of methodological and bioinformatic techniques that address many of these challenges in developing genetic resources for a ploidy-variable, polysomic species, the white sturgeon (Acipenser transmontanus). The sturgeons (Acipenseriformes) are a classic example of polysomic polyploidy in vertebrates. All extant sturgeons, which have between ~120 and ~360 chromosomes, are hypothesized to be polyploid relative to an extinct diploid ancestor with 60 chromosomes (Rajkov, Shao, & Berrebi, 2014). By this ratio, the sterlet (Acipenser ruthenus), a Eurasian sturgeon with ~120 chromosomes, should be tetraploid (4N); however, in exploring the gene content and homology of a draft genome, Du et al. (2020) discovered extensive, though incomplete, diploidization resulting from a “segmental deduplication” process, while others have inferred both disomic and tetrasomic inheritance of microsatellite markers in this species (Rajkov et al., 2014). By similar logic, the white sturgeon (A. transmontanus; ~240 chromosomes) should be ancestrally octoploid (8N), though microsatellite inheritance patterns have suggested both tetrasomic and octosomic segregation (Drauch Schreier, Gille, Mahardja, & May, 2011). Intriguingly, white sturgeon occasionally exhibit spontaneous autopolyploidy, generally resulting in a ~1.5-fold increase in chromatin content (12N, dodecaploid) (Drauch Schreier et al., 2011; A. D. Schreier, May, & Gille, 2013). Moreover, backcrossed offspring (10N, decaploid) are often viable, though of unknown fertility (J. P. Van Eenennaam et al., 2019), creating a wide range of ploidies within a single species.
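The ploidy levels above follow from simple arithmetic on chromosome counts. As a worked example, assume a base (monoploid) number of x = 30, i.e., half the 60 chromosomes of the hypothesized diploid ancestor, and assume that the decaploid backcross combines a gamete from a 12N parent with one from an 8N parent, each gamete carrying half of its parent's complement:

```latex
\begin{align*}
\text{ploidy} &= \frac{2n}{x}, \qquad x = \frac{60}{2} = 30 \\
\text{sterlet: } \frac{120}{30} &= 4 \;(4N), \qquad
\text{white sturgeon: } \frac{240}{30} = 8 \;(8N), \qquad
\text{upper range: } \frac{360}{30} = 12 \;(12N) \\
\frac{12N}{8N} &= 1.5 \quad \text{(fold increase in chromatin content)} \\
\frac{12N}{2} + \frac{8N}{2} &= 6N + 4N = 10N \quad \text{(backcross offspring)}
\end{align*}
```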
White sturgeon are the largest freshwater fish in North America, reaching lengths of up to 6.1 m, though lengths of 2 m are more common (Scott & Crossman, 1973). As euryhaline fish, white sturgeon may be found along the Pacific coast as far north as the Aleutian Islands and as far south as northern Baja California, though their current strongholds are the Sacramento-San Joaquin, Columbia, and Fraser River basins (Hildebrand et al., 2016). Although the Columbia basin hosts the largest total aggregation of white sturgeon, their distribution in this system is broken into a number of de facto population segments by dams and other river modifications that prevent almost all demographic exchange (Hildebrand et al., 2016). Several of these river sections contain populations classified by the US or Canada as threatened or endangered, and even more population segments are in decline due to recruitment limitation resulting from habitat degradation (Hildebrand et al., 2016). While conservation management plans have been developed for most white sturgeon population segments, a lack of robust information about historical and contemporary movement, population structure, and recruitment patterns has hindered effective solutions for these fish, which can take a decade or more to mature (Hildebrand et al., 2016). Obtaining and using genetic data, in particular, has proven challenging, as it has for many polyploid species (Anders et al., 2011). Although microsatellite markers for white sturgeon have been available for some time (Rodzen, Famula, & May, 2004), the unclear or mixed segregation patterns of these markers have made robust genetic inference difficult (Clark & Schreier, 2017; Drauch Schreier et al., 2011). In addition, although some researchers have achieved moderate success by coding the polysomic data as pseudo-dominant disomic markers or by using ploidy-agnostic analysis methods (Rodzen et al., 2004; A. Drauch Schreier, Rodzen, Ireland, & May, 2012), these workarounds have nonetheless limited the types of analyses available.
To remedy these limitations, we developed a set of single-nucleotide polymorphism (SNP) markers using reduced-representation genomic libraries and tested the reliability of polysomic segregation patterns by examining inheritance in known cross families and allele ratios in a large sample of individuals. The SNP markers were developed for the ‘genotyping-in-thousands by sequencing’ or ‘GT-seq’ method (Campbell, Harmon, & Narum, 2015), a multiplex amplicon-based method that uses massively parallel sequencing to cost-effectively survey hundreds of individuals simultaneously and yields read counts approximately proportional to allelic dosage. We provide updated scripts to efficiently genotype polysomic individuals in a ploidy-aware manner by incorporating the funkyPloid function from the R package tripsAndDipR v0.2.0 (Delomas et al., submitted), which fits beta-binomial mixture models to the sets of allele read counts and compares the likelihoods of candidate ploidies. This allows each individual to be genotyped according to its inferred ploidy. We demonstrate the utility of these SNPs for inferring parentage and relatedness and for estimating population- and individual-level genetic parameters using Polygene (Kang Huang et al., 2020), a software package designed specifically for polyploids.
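The general idea of comparing candidate ploidies with beta-binomial mixtures can be sketched as follows. This is not the funkyPloid implementation; it is a simplified illustration under stated assumptions: only heterozygous loci are used, each candidate ploidy k contributes mixture components with mean allele fraction j/k (j = 1, ..., k - 1), the overdispersion parameter rho is fixed rather than estimated, mixture weights are fitted by expectation-maximization, and candidates are compared by BIC. BIC is our illustrative choice, used here because higher ploidies nest the components of lower ones and would otherwise never fit worse. All function names and default values are hypothetical.

```python
import math
import random


def log_betabinom(x, n, p, rho):
    """Log-pmf of a beta-binomial with mean p and overdispersion rho (0 < rho < 1)."""
    a = p * (1 - rho) / rho
    b = (1 - p) * (1 - rho) / rho
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    log_beta_ratio = (math.lgamma(x + a) + math.lgamma(n - x + b) - math.lgamma(n + a + b)
                      - math.lgamma(a) - math.lgamma(b) + math.lgamma(a + b))
    return log_choose + log_beta_ratio


def logsumexp(values):
    """Numerically stable log(sum(exp(v))); assumes at least one finite value."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_terms(row, weights):
    """Per-component log(weight) + log-density at one locus; empty components get -inf."""
    return [math.log(w) + lp if w > 0 else -math.inf for w, lp in zip(weights, row)]


def fit_ploidy(counts, ploidy, rho=0.01, n_iter=100):
    """Fit mixture weights by EM for dosage classes j/ploidy, j = 1..ploidy-1.

    counts: list of (alt_reads, total_reads) at heterozygous loci.
    Returns (log_likelihood, weights).
    """
    means = [j / ploidy for j in range(1, ploidy)]
    comp = [[log_betabinom(x, n, p, rho) for p in means] for x, n in counts]
    weights = [1 / len(means)] * len(means)
    for _ in range(n_iter):
        sums = [0.0] * len(means)
        for row in comp:
            terms = mixture_terms(row, weights)
            total = logsumexp(terms)
            for j, t in enumerate(terms):
                sums[j] += math.exp(t - total)  # E-step: responsibility of component j
        weights = [s / len(comp) for s in sums]  # M-step: update mixture weights
    loglik = sum(logsumexp(mixture_terms(row, weights)) for row in comp)
    return loglik, weights


def choose_ploidy(counts, candidates=(2, 3, 4), rho=0.01):
    """Return the candidate ploidy with the lowest BIC, plus all BIC scores."""
    scores = {}
    for k in candidates:
        loglik, _ = fit_ploidy(counts, k, rho)
        n_free = k - 2  # k - 1 weights constrained to sum to 1
        scores[k] = -2 * loglik + n_free * math.log(len(counts))
    return min(scores, key=scores.get), scores


def simulate(fractions, n_loci=200, depth=100, seed=1):
    """Simulate binomial alt-read counts at loci drawn from the given allele fractions."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_loci):
        p = rng.choice(fractions)
        alt = sum(rng.random() < p for _ in range(depth))
        counts.append((alt, depth))
    return counts


best, scores = choose_ploidy(simulate([0.25, 0.5, 0.75]))
print(best, scores)
```

The toy candidate set (2, 3, 4) keeps the dosage classes well separated. For white sturgeon the comparison would instead run over the relevant ploidies (e.g., 8N, 10N, and 12N), whose expected allele fractions lie much closer together, so read depth, the number of loci, and the handling of overdispersion would matter considerably more than in this sketch.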