Subgenome comparisons
Heterozygous genotype calls were also used to infer subgenome identity
using a custom Python script
(https://github.com/david-goad/paspalum-hybridization/het_venn.py). For
this analysis we used the dataset comprising all unique genotypes after
removal of sites with <0.05 heterozygosity. We calculated the
number of loci that were heterozygous in each of our three morphological
groups (the two P. vaginatum ecotypes and P. distichum, excluding three
putatively admixed accessions). We then compared the
number of loci with heterozygous calls unique to each group to those
that were shared between groups. Following the logic that ‘heterozygous’
loci reflect subgenome differences within individuals, we expected that
subgenomes shared between groups would be reflected as shared
‘heterozygous’ loci. In the case of allopolyploid and diploid hybrid
groups (P. distichum and coarse-textured P. vaginatum; see
Results), high numbers of shared loci would indicate sharing of a
subgenome that is not present in the diploid fine-textured P.
vaginatum samples.
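
The unique-versus-shared comparison described above can be sketched as a set partition. This is a minimal illustration, not the het_venn.py script itself: the function name, the group labels, and the locus identifiers are hypothetical, and it assumes that the set of heterozygous loci has already been extracted for each group.

```python
def het_locus_partitions(het_by_group):
    """Assign each heterozygous locus to the exact combination of groups
    in which it is heterozygous (the regions of a Venn diagram).

    het_by_group: dict mapping a group label to a set of locus IDs.
    Returns a dict mapping a tuple of group labels to a set of locus IDs.
    """
    groups = sorted(het_by_group)
    all_loci = set().union(*het_by_group.values())
    partitions = {}
    for locus in all_loci:
        # Key is the sorted tuple of groups that carry a het call here.
        key = tuple(g for g in groups if locus in het_by_group[g])
        partitions.setdefault(key, set()).add(locus)
    return partitions


# Hypothetical toy input: locus IDs are placeholders, not real data.
example = {
    "distichum": {"loc1", "loc2", "loc3", "loc5"},
    "coarse": {"loc2", "loc3", "loc4", "loc5"},
    "fine": {"loc5", "loc6"},
}
for combo, loci in sorted(het_locus_partitions(example).items()):
    print(combo, len(loci))
```

Under the logic of this section, a large count in the region shared only by P. distichum and coarse-textured P. vaginatum, relative to the regions unique to each, would be the signal of a shared subgenome.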
To investigate the number of subgenome copies received from each parent
in triploid coarse-textured genotypes, we used a custom Python script
(https://github.com/david-goad/paspalum-hybridization/triploid_comp.py)
that counts reads at heterozygous loci in a SAM file (Fig. S2). Based on
the distribution of the relative number of reference vs non-reference
reads at each locus, the number of subgenome copies can be inferred
(Delomas, 2019; Gompert & Mock, 2017). We chose the highest coverage
individual from each of the three putative triploid genotypes and the
diploid genotype with the greatest number of genotypically identical
accessions (to serve as a 1:1 control). Where a locus was called
heterozygous in all four genotypes, we counted the reads of each type
covering that position in the SAM file, distinguishing read types by
their CIGAR strings (the column of the SAM file that identifies the
position of variants in the read). To reduce the effect of low-quality
reads (often manifested as singletons), we compared only the two most
common read types. Additionally, sites were excluded if neither of the
two most common reads was identical to the reference genome (to ensure
unambiguous assignment to a subgenome) or if their combined count was
less than 20 (to ensure adequate depth). The numbers of reference and
alternate reads at each locus were then visualized as a scatter plot of
the raw counts and a histogram of the percentage of reads matching the
reference genome. The diploid genome should return a distribution of
loci centered on a 1:1 read count ratio (50% reference genome reads,
50% alternate reads). Triploids should return distributions that are
centered on either 1:2 (33.3% reference genome, 66.7% alternate) or
2:1 (66.7% reference genome, 33.3% alternate) depending on the
subgenome composition (e.g. ABB or AAB, respectively, where the reference
genome is A).
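
The per-locus filtering and the expected ratios can be sketched as follows. This is a minimal illustration under stated assumptions, not the triploid_comp.py script: the function names and the read-type labels are hypothetical, the read types covering a locus are assumed to have been extracted already, and summarizing the loci by their median is a simplification introduced here (the analysis above inspects the scatter plot and histogram rather than applying a classifier).

```python
from collections import Counter
from statistics import median

# Expected fraction of reference reads for each composition (A = reference).
EXPECTED = {"ABB (1:2)": 1 / 3, "AB (1:1)": 1 / 2, "AAB (2:1)": 2 / 3}


def reference_fraction(read_types, ref_type, min_depth=20):
    """Fraction of reference reads among the two most common read types
    at one locus, or None if the locus fails the filters.

    read_types: one hashable label per read covering the locus
                (hypothetical labels; could be CIGAR strings).
    ref_type:   the label of reads identical to the reference.
    """
    top_two = Counter(read_types).most_common(2)
    if len(top_two) < 2:
        return None  # only one read type, nothing to compare
    counts = dict(top_two)
    if ref_type not in counts:
        return None  # neither common read type matches the reference
    total = sum(counts.values())
    if total < min_depth:
        return None  # combined count below the depth threshold
    return counts[ref_type] / total


def nearest_composition(fractions):
    """Assumed summary step: compare the median reference fraction
    across loci with the three expected values."""
    mid = median(fractions)
    return min(EXPECTED, key=lambda name: abs(EXPECTED[name] - mid))


# Hypothetical locus: 14 reference reads, 7 alternate reads, 1 singleton.
reads = ["ref"] * 14 + ["alt"] * 7 + ["err"]
print(reference_fraction(reads, "ref"))
```

The singleton is discarded because only the two most common read types are compared, so this locus has 14 of 21 reads matching the reference, which is the 2:1 pattern expected for an AAB triploid.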