Genetic diversity and population structure
We next asked whether the population genetic data supported the three independent origins identified by genomic-based epidemiology. We developed 11 SSR markers (Table 2) and genotyped kochia populations collected from the three geographic regions to measure population-level genetic diversity and the genetic similarity of populations among localities. Based on Fisher’s combined probability test across all populations, all SSR loci were in linkage equilibrium (P > 0.05) but not in Hardy-Weinberg equilibrium (HWE; P < 0.05).
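As an illustration of how per-population P values can be pooled into a single test per locus, the following is a minimal sketch of Fisher's combined probability test. It is not the software used in this study; the function name `fisher_combined` and the example P values are hypothetical, and the P values are assumed to be independent.

```python
import math


def fisher_combined(p_values):
    """Combine independent P values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the joint null hypothesis, where k
    is the number of P values combined.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P values must be non-empty and lie in (0, 1]")
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For an even number of degrees of freedom (2k), the chi-square
    # survival function has the closed form
    #   exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical per-population HWE P values for one locus.
stat, p_combined = fisher_combined([0.04, 0.20, 0.01, 0.35])
print(f"X = {stat:.3f}, combined P = {p_combined:.4f}")
```

The closed-form survival function keeps the sketch free of external dependencies; a statistics library would normally be used instead.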
Across all loci and populations, 3.98% of the data were missing (Table S2). Among loci, marker “SSR162” had the most missing data across populations (13.8%); among populations, KS2S and KS8S had the most missing data across loci (26.3% and 17.2%, respectively; Table S3). For the descriptive summaries and the neighbor-joining tree, we removed marker “SSR162” because it had more than 10% missing data, and we removed five individuals (KS2S_4, KS2S_5, KS2S_7, KS2S_8, and KS8S_2) because each had more than 20% missing data after “SSR162” was removed.
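The two-step filter described above (loci first, then individuals scored on the remaining loci) can be sketched as follows. This is a simplified illustration, not the pipeline used in this study; the function name `filter_missing`, the data layout (a dictionary of individuals mapping locus names to genotypes, with `None` for missing calls), and the toy data are assumptions.

```python
def filter_missing(genotypes, locus_max=0.10, individual_max=0.20):
    """Drop loci, then individuals, that exceed missing-data thresholds.

    genotypes: dict mapping individual ID -> dict of locus -> genotype,
               where a missing call is None (hypothetical layout).
    Returns (filtered genotypes, list of retained loci).
    """
    individuals = list(genotypes)
    if not individuals:
        return {}, []
    loci = sorted({locus for calls in genotypes.values() for locus in calls})

    # Step 1: remove loci with more than `locus_max` missing data.
    def locus_missing(locus):
        missing = sum(genotypes[ind].get(locus) is None for ind in individuals)
        return missing / len(individuals)

    kept_loci = [locus for locus in loci if locus_missing(locus) <= locus_max]
    if not kept_loci:
        return {}, []

    # Step 2: remove individuals with more than `individual_max` missing
    # data, counted only over the loci that survived step 1.
    def individual_missing(ind):
        missing = sum(genotypes[ind].get(locus) is None for locus in kept_loci)
        return missing / len(kept_loci)

    kept = [ind for ind in individuals if individual_missing(ind) <= individual_max]
    filtered = {ind: {locus: genotypes[ind].get(locus) for locus in kept_loci}
                for ind in kept}
    return filtered, kept_loci


# Toy data: 10 individuals, 5 loci; locus "A" is missing in 2 individuals
# and individual "ind9" is missing 2 of the 4 other loci.
toy = {f"ind{i}": {locus: (1, 2) for locus in "ABCDE"} for i in range(10)}
toy["ind0"]["A"] = None
toy["ind1"]["A"] = None
toy["ind9"]["B"] = None
toy["ind9"]["C"] = None
filtered, kept_loci = filter_missing(toy)
print(kept_loci, sorted(filtered))
```

The order of the two steps matters: an individual's missing-data fraction is recomputed after the locus filter, matching the description in the text.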
Allele counts, expected heterozygosity, and evenness for all loci across all populations, both before and after the removal of individuals with missing data, are reported in Table S2. Descriptive summaries of 44 populations at ten SSR loci are presented in Table 3 (data for all 11 SSR loci are presented in Table S3). Populations ranged in their percentage of total alleles observed and allelic richness from 57.7% and 1.42 (KS13R) to 24.7% and 2.61 (CO7R). F_IS ranged from -0.04 (95% CI = -0.28 to 0.13; KS13R) to 0.58 (95% CI = 0.33 to 0.79; MT3R), although most values fell between 0.2 and 0.4. A positive F_IS indicates a deficiency of heterozygotes in the population relative to the proportion expected under HWE, and a negative F_IS indicates an excess of heterozygotes. The F_IS results should be interpreted with caution because the loci did not meet the assumptions of Hardy-Weinberg equilibrium (Waples, 2015) and because many confidence intervals were very wide and spanned both negative and positive values.
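To make the sign convention of F_IS concrete, the sketch below computes it for a single locus in a single population as F_IS = 1 - H_O / H_E, where H_O is the observed proportion of heterozygotes and H_E is the expected heterozygosity from allele frequencies. This is a simplified estimator for illustration only, assuming diploid genotypes; it is not necessarily the estimator used for Table 3, and the function name `f_is` and the genotypes are hypothetical.

```python
from collections import Counter


def f_is(genotypes):
    """Inbreeding coefficient F_IS = 1 - H_O / H_E at one locus.

    genotypes: list of (allele_a, allele_b) tuples for diploid
               individuals from one population (hypothetical layout).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Allele frequencies from the 2n allele copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    # Expected heterozygosity under HWE: 1 - sum of squared frequencies.
    h_exp = 1.0 - sum((c / total) ** 2 for c in counts.values())
    if h_exp == 0.0:
        raise ValueError("locus is monomorphic; F_IS is undefined")
    # Observed heterozygosity: fraction of individuals with two
    # different alleles.
    h_obs = sum(a != b for a, b in genotypes) / n
    return 1.0 - h_obs / h_exp


# Heterozygote deficiency gives a positive value, excess a negative one.
deficit = [(150, 150), (150, 150), (154, 154), (154, 154)]
excess = [(150, 154)] * 4
print(f_is(deficit), f_is(excess))
```

With only homozygotes at two equally frequent alleles, H_O is 0 and F_IS is 1; with only heterozygotes, H_O is 1, H_E is 0.5, and F_IS is -1.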
Because loci and populations did not meet the assumptions of HWE, we used a neighbor-joining tree to assess genetic similarity among populations. The tree recovered some expected regional groups: 12 Central Great Plains populations formed a large clade with 100% bootstrap support (Figure 2). This clade also contained OR4R (Pacific Northwest) and MT2R (Northern Plains). The STRUCTURE analysis identified K=3 as the best-supported number of clusters, or gene pools (Figure S1), and likewise grouped OR4R and MT2R with Central Great Plains populations, including CO1R, KS10R, and KS11R (Figure 3). Populations from the Pacific Northwest largely clustered together (OR2R, OR3R, OR6R, OR7R, OR9S, ID1R, and ID2R), with the clade comprising the pairs OR9S and ID1R and OR7R and ID2R supported at 61.5% (Figure 2). Populations KS13R, MT3R, and CO6R clustered with this Pacific Northwest group (Figure 2) and were also similar to it in the STRUCTURE analysis (Figure 3). Some groupings were unexpected, such as the grouping of the TX2R, TX3R, TX4R, and TX5R populations (Central Great Plains) with populations from Alberta, Canada (Northern Plains) and with OR1R (Figure 2).