RAD Library Sequencing and SNP Ascertainment
Single nucleotide polymorphism (SNP) loci were identified from a restriction site-associated DNA (RAD) library prepared with a single restriction enzyme, SbfI (Miller, Dunham, Amores, Cresko, & Johnson, 2007; Baird et al., 2008; Puritz et al., 2014). Twenty-eight barcoded white sturgeon samples from a broad geographic range were prepared, pooled in equimolar amounts, and sequenced on an Illumina HiSeq (100-bp paired-end reads, quality-trimmed to 80 bp). The ascertainment panel comprised 12 samples from the lower/middle Columbia River, 14 from the lower/middle Snake River, and 2 from the Sacramento River. Forward reads were processed with the Stacks pipeline (Catchen, Amores, Hohenlohe, Cresko, & Postlethwait, 2011) using the following assembly parameters: M = 2 (maximum mismatches between stacks; ustacks), N = 4 (maximum mismatches for secondary reads; ustacks), n = 2 (maximum mismatches between samples; cstacks), and m = 4 to 16 (minimum stack depth; ustacks). The value of m was set for each individual according to its depth of coverage: twice the number of raw reads in millions, rounded to the nearest integer (e.g., m = 4 for an individual with 2 million raw reads). These relatively low mismatch thresholds (compared with the default of 3) ensured that only highly similar reads were assembled together and helped prevent homeologs (homologs derived from polyploidization) from being clustered together (Dufresne et al., 2014; but see Ilut, Nydam, & Hare, 2014; O’Leary, Puritz, Willis, Hollenbeck, & Portnoy, 2018; Willis, Hollenbeck, Puritz, Gold, & Portnoy, 2017). Stacks were then filtered to retain variants with a minor allele frequency above 5%, genotyped in at least 80% of individuals, and with a combined depth of 1,050 to 1,600 sequence reads (the first mode of a multimodal depth distribution; Supplemental Figure 1). Because ploidy was unknown, read ratios were used as a proxy for genotypes in a principal components analysis (PCA) of the candidate SNPs, and the SNPs with the highest loadings on the first 10 eigenvectors were selected for further development.
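The per-individual minimum stack depth (m) described above is simple arithmetic. The following Python sketch illustrates it and is not the authors' script: the sample names and read counts are invented, and rounding halves upward is an assumption, since the text specifies only "nearest integer".

```python
import math


def min_stack_depth(raw_reads: int) -> int:
    """ustacks -m for one individual: 2 x (raw reads in millions), nearest integer.

    Halves are rounded up via math.floor(x + 0.5) (an assumption); Python's
    built-in round() would instead round halves to the nearest even integer.
    """
    millions = raw_reads / 1_000_000
    return math.floor(2 * millions + 0.5)


# Hypothetical raw-read counts, for illustration only
raw_read_counts = {"sample_A": 2_000_000, "sample_B": 3_300_000, "sample_C": 8_000_000}
m_values = {name: min_stack_depth(n) for name, n in raw_read_counts.items()}
print(m_values)  # {'sample_A': 4, 'sample_B': 7, 'sample_C': 16}
```

Under this rule, an individual with 2 million reads gets m = 4 and one with 8 million gets m = 16, which matches the 4 to 16 range reported above.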
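The three locus filters can be sketched in Python as follows. This assumes each candidate SNP has already been summarized into the hypothetical `CandidateSnp` record shown (its field names are illustrative, not Stacks output columns). Treating the depth bounds as inclusive is also an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CandidateSnp:
    """Hypothetical per-SNP summary; fields are illustrative, not Stacks columns."""
    snp_id: str
    minor_allele_freq: float   # frequency of the less common allele (0 to 0.5)
    genotyped_fraction: float  # fraction of individuals with a genotype call
    combined_depth: int        # total reads at this SNP across all individuals


def passes_filters(snp: CandidateSnp,
                   min_maf: float = 0.05,
                   min_genotyped: float = 0.80,
                   depth_range: Tuple[int, int] = (1_050, 1_600)) -> bool:
    """MAF above 5%, genotyped in >= 80% of individuals, depth in the first mode."""
    lo, hi = depth_range  # inclusive bounds (assumption)
    return (snp.minor_allele_freq > min_maf
            and snp.genotyped_fraction >= min_genotyped
            and lo <= snp.combined_depth <= hi)


snps = [
    CandidateSnp("snp1", 0.20, 0.90, 1_200),  # passes all filters
    CandidateSnp("snp2", 0.03, 0.95, 1_300),  # minor allele frequency too low
    CandidateSnp("snp3", 0.25, 0.70, 1_100),  # genotyped in too few individuals
    CandidateSnp("snp4", 0.30, 0.85, 2_400),  # depth outside the first mode
]
kept = [s.snp_id for s in snps if passes_filters(s)]
print(kept)  # ['snp1']
```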
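The read-ratio PCA and loading-based SNP selection can be sketched in pure Python. This is not the authors' code, and it rests on several assumptions. The read ratio is taken to be the alternate-allele fraction of reads per individual. Missing ratios are imputed with the SNP mean. Principal components are found by power iteration on the individuals-by-individuals Gram matrix. Finally, a hypothetical `per_pc` number of SNPs with the largest absolute loading is kept for each of the first 10 components, because the text does not state how many SNPs were taken per component.

```python
import math
import random
from typing import List, Optional, Set, Tuple

Matrix = List[List[float]]


def read_ratio(ref_reads: int, alt_reads: int) -> Optional[float]:
    """Alternate-allele read fraction, or None when the sample has no reads."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else None


def center_columns(ratios: List[List[Optional[float]]]) -> Matrix:
    """Mean-impute missing ratios (an assumption) and center each SNP column."""
    n, p = len(ratios), len(ratios[0])
    centered = [[0.0] * p for _ in range(n)]
    for j in range(p):
        observed = [row[j] for row in ratios if row[j] is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        for i in range(n):
            value = ratios[i][j] if ratios[i][j] is not None else mean
            centered[i][j] = value - mean
    return centered


def leading_eigenpairs(gram: Matrix, k: int, iters: int = 500,
                       seed: int = 1) -> List[Tuple[float, List[float]]]:
    """Top-k eigenpairs of a symmetric PSD matrix by power iteration with deflation."""
    n = len(gram)
    g = [row[:] for row in gram]
    rng = random.Random(seed)
    pairs = []
    for _ in range(min(k, n)):
        u = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(g[i][j] * u[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            u, lam = [x / norm for x in w], norm
        if lam <= 1e-12:  # remaining variance is numerically zero
            break
        pairs.append((lam, u))
        for i in range(n):  # deflate so the next pass finds the next component
            for j in range(n):
                g[i][j] -= lam * u[i] * u[j]
    return pairs


def select_snps(ratios: List[List[Optional[float]]], n_pcs: int = 10,
                per_pc: int = 5) -> Set[int]:
    """Column indices of SNPs with the largest |loading| on each leading PC."""
    x = center_columns(ratios)  # rows = individuals, columns = SNPs
    n, p = len(x), len(x[0])
    # n x n Gram matrix: small when individuals (28 here) << SNPs
    gram = [[sum(x[a][j] * x[b][j] for j in range(p)) for b in range(n)]
            for a in range(n)]
    selected: Set[int] = set()
    for lam, u in leading_eigenpairs(gram, n_pcs):
        s = math.sqrt(lam)  # singular value of the centered matrix
        loadings = [sum(x[i][j] * u[i] for i in range(n)) / s for j in range(p)]
        ranked = sorted(range(p), key=lambda j: abs(loadings[j]), reverse=True)
        selected.update(ranked[:per_pc])
    return selected


# Toy panel: 6 individuals x 4 SNPs; only SNP 0 separates two groups
toy = [
    [0.0, 0.5, 0.5, 0.5],
    [0.0, 0.5, None, 0.5],
    [0.0, 0.5, 0.5, 0.5],
    [1.0, 0.5, 0.5, 0.5],
    [1.0, 0.5, 0.5, 0.5],
    [1.0, 0.5, 0.5, 0.5],
]
print(select_snps(toy, n_pcs=10, per_pc=1))  # {0}
```

Working from the 28 × 28 Gram matrix rather than the SNP-by-SNP covariance matrix keeps the eigenproblem small when thousands of candidate SNPs are screened. In practice, a linear-algebra library's SVD would replace the hand-written power iteration.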
Forward reads containing candidate markers were concatenated with their reverse-complemented paired reads to increase target length, and primers were designed with Primer3. Primers were tested individually under standard PCR conditions and in combination under standard GT-seq multiplex conditions (Supplemental File 1). Pooling thresholds (the number of individuals that can be genotyped simultaneously in a single run) were tailored to achieve >90% genotyping success across individuals, as discussed later.
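The target-building step can be sketched in Python, assuming reads are plain A/C/G/T/N strings. The optional `spacer` argument and the `primer3_record` helper are hypothetical. The helper writes a Primer3 Boulder-IO input record with the standard `SEQUENCE_ID`, `SEQUENCE_TEMPLATE`, and `SEQUENCE_TARGET` tags. The target (0-based start, length under Primer3's default indexing) asks Primer3 to return only primer pairs that flank the SNP. The text does not say whether a target region was specified.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_target(forward: str, paired: str, spacer: str = "") -> str:
    """Forward read followed by the reverse-complemented paired read.

    `spacer` (e.g. a run of N marking the unknown gap between reads) is a
    hypothetical option; the default is direct concatenation.
    """
    return forward + spacer + reverse_complement(paired)


def primer3_record(seq_id: str, template: str, snp_pos: int, snp_len: int = 1) -> str:
    """Hypothetical helper: Boulder-IO record for primer3_core (0-based SNP position)."""
    return (f"SEQUENCE_ID={seq_id}\n"
            f"SEQUENCE_TEMPLATE={template}\n"
            f"SEQUENCE_TARGET={snp_pos},{snp_len}\n"
            "=\n")


# Toy sequences, for illustration only
target = build_target("TGCAGGATTC", "AACG")
print(target)  # TGCAGGATTCCGTT
print(primer3_record("snp1", target, snp_pos=7), end="")
```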