Subgenome comparisons
Heterozygous genotype calls were also used to infer subgenome identity using a custom Python script (https://github.com/david-goad/paspalum-hybridization/het_venn.py). For this analysis, we used the dataset comprising all unique genotypes after removal of sites with <0.05 heterozygosity. We calculated the number of loci that were heterozygous in each of our three morphological groups (the two P. vaginatum ecotypes and P. distichum, excluding three putatively admixed accessions). We then compared the number of loci with heterozygous calls unique to each group to the number shared between groups. Following the logic that ‘heterozygous’ loci reflect subgenome differences within individuals, we expected that subgenomes shared between groups would be reflected as shared ‘heterozygous’ loci. In the case of allopolyploid and diploid hybrid groups (P. distichum and coarse-textured P. vaginatum; see Results), high numbers of shared loci would indicate sharing of a subgenome that is not present in the diploid fine-textured P. vaginatum samples.
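The comparison of unique and shared heterozygous loci can be illustrated with a minimal Python sketch. It is not the het_venn.py script itself: the function names, the input layout (a dictionary of genotype strings per accession) and the rule that a locus counts as heterozygous in a group when at least one accession in that group is heterozygous are all assumptions made for illustration.

```python
def het_loci_by_group(calls, groups):
    """Collect heterozygous loci per group.

    calls:  {accession: {locus: genotype}}, genotype as a string such as "0/1"
            (hypothetical input layout).
    groups: {group_name: [accession, ...]}.
    Assumption: a locus is counted for a group if at least one of its
    accessions carries a heterozygous call there.
    """
    result = {}
    for name, accessions in groups.items():
        loci = set()
        for acc in accessions:
            for locus, gt in calls.get(acc, {}).items():
                alleles = gt.replace("|", "/").split("/")
                # Skip missing calls; keep loci whose two alleles differ.
                if len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]:
                    loci.add(locus)
        result[name] = loci
    return result


def venn_counts(group_loci):
    """Count loci in each exclusive Venn region.

    Keys are frozensets naming exactly the groups in which a locus is
    heterozygous, so a single-group key holds loci unique to that group.
    """
    counts = {}
    all_loci = set().union(*group_loci.values())
    for locus in all_loci:
        members = frozenset(g for g, loci in group_loci.items() if locus in loci)
        counts[members] = counts.get(members, 0) + 1
    return counts
```

Under these assumptions, a large count for the region shared only by P. distichum and coarse-textured P. vaginatum, relative to the regions that include fine-textured P. vaginatum, would correspond to the shared-subgenome pattern described above.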
To investigate the number of subgenome copies received from each parent in triploid coarse-textured genotypes, we used a custom Python script (https://github.com/david-goad/paspalum-hybridization/triploid_comp.py) that counts reads at heterozygous loci in a SAM file (Fig. S2). The number of subgenome copies can be inferred from the distribution of the relative number of reference vs. non-reference reads at each locus (Delomas, 2019; Gompert & Mock, 2017). We chose the highest-coverage individual from each of the three putative triploid genotypes and the diploid genotype with the greatest number of genotypically identical accessions (to serve as a 1:1 control). Where a locus was called as heterozygous in all four genotypes, we counted the number of each read type covering the position in the same file based on their CIGAR string (the column of the SAM file that identifies the position of variants in the read). To remove the effect of low-quality reads (often manifested as singletons), we compared only the two most common read types. Additionally, sites were excluded if neither of the two most common read types was identical to the reference genome (to ensure unambiguous assignment to a subgenome) or if their combined count was less than 20 (to ensure adequate depth). The numbers of reference and alternate reads at each locus were then visualized as a scatter plot of the raw counts and a histogram of the percentage of reads matching the reference genome. The diploid genome should return a distribution of loci centered on a 1:1 read count ratio (50% reference genome reads, 50% alternate reads). Triploids should return distributions centered on either 1:2 (33.3% reference genome, 66.7% alternate) or 2:1 (66.7% reference genome, 33.3% alternate), depending on the subgenome composition (e.g. ABB or AAB, respectively, where the reference genome is A).
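The per-locus filtering and the expected read ratios can be sketched in Python as follows. This is an illustrative sketch and not the triploid_comp.py script: the function names, the "REF" label for reads identical to the reference, and the use of the median to summarize a distribution are assumptions, and read types are taken as already extracted from the SAM file.

```python
from collections import Counter
from statistics import median

MIN_DEPTH = 20  # combined count threshold for the two most common read types

# Expected fraction of reference reads under each ratio (reference genome = A).
EXPECTED = {"1:1 (diploid)": 0.5, "1:2 (ABB)": 1 / 3, "2:1 (AAB)": 2 / 3}


def ref_fraction(read_types, ref_type="REF", min_depth=MIN_DEPTH):
    """Fraction of reference reads among the two most common read types.

    read_types: one label per read covering the locus (hypothetical input).
    Returns None when the locus is excluded: fewer than two read types,
    neither of the top two matches the reference, or insufficient depth.
    """
    top_two = Counter(read_types).most_common(2)
    if len(top_two) < 2:
        return None
    (type_a, n_a), (type_b, n_b) = top_two
    if ref_type not in (type_a, type_b):
        return None
    if n_a + n_b < min_depth:
        return None
    n_ref = n_a if type_a == ref_type else n_b
    return n_ref / (n_a + n_b)


def closest_expected(fractions):
    """Name the expected ratio nearest to the median reference fraction.

    Using the median as the centre of the distribution is an assumption
    made for this sketch; the analysis above relies on visual inspection
    of a scatter plot and a histogram.
    """
    centre = median(fractions)
    return min(EXPECTED, key=lambda name: abs(EXPECTED[name] - centre))
```

For example, a locus covered by 10 reference reads, 20 reads of one alternate type and a single read of a third type would pass the filters with a reference fraction of one third, which is the value expected for an ABB triploid.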