Chromosome structural variation and GO analysis
To investigate the differences between subgenome A and subgenome D, we performed synteny analysis between paralogs in the P. tomentosa genome. This revealed collinear in-paralogous gene pairs and suggested general collinearity at the subgenome level, with dispersed collinear blocks among homologous and nonhomologous chromosomes (Fig. 5, center). We found 65,864 paralogous gene pairs, 1,434 collinear blocks, and 65,444 collinear gene pairs between the two subgenomes (Table S14). We infer that these may have arisen from duplication events that occurred in Populus prior to its divergence as a section of Populus.
To study genome-wide structural variation (SV), including copy number variation (CNV), deletions (DEL), insertions (INS), inversions (INV), and translocations (TRANS) among chromosome pairs (Fig. 5, rings 1-5; rings are referred to by circled numbers such as “①” hereafter), we aligned the chromosomes using MUMmer and then called SVs using SVMU (Structural Variants from MUMmer) 0.3 (https://github.com/mahulchak/svmu). The results indicated that chromosome structural variations are abundant in the P. tomentosa genome. Across the whole genome we detected 15,480 SVs in total, of which INS (6,654) and DEL (6,231) accounted for the majority (83%). INV, TRANS and CNV numbered 1,602, 694 and 299, respectively, and together accounted for 17% of the SVs observed (Table S15). The vast majority of INS, DEL, and CNV variations occurred between homologous chromosome pairs, whereas TRANS were generally seen between non-homologous pairs (Table S15, Fig. S9).
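The proportions of each SV class follow directly from the counts reported in this paragraph. The short Python sketch below recomputes them as a consistency check; the variable and function names are illustrative only and are not part of the MUMmer or SVMU workflow.

```python
# SV counts by type, as reported in the text (Table S15).
sv_counts = {"INS": 6654, "DEL": 6231, "INV": 1602, "TRANS": 694, "CNV": 299}

# Total number of SVs across the whole genome.
total = sum(sv_counts.values())


def share(types):
    """Return the percentage of all SVs contributed by the given SV types."""
    return 100.0 * sum(sv_counts[t] for t in types) / total


ins_del_share = share(["INS", "DEL"])
other_share = share(["INV", "TRANS", "CNV"])

print(f"total SVs: {total}")
print(f"INS + DEL: {ins_del_share:.1f}%")
print(f"INV + TRANS + CNV: {other_share:.1f}%")
```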
By plotting the distribution of the five SV types along the 38 P. tomentosa chromosomes, we observed that the 299 CNVs had an irregular and sporadic distribution across the whole genome (Fig. 5). Relatively high CNV densities were seen on Chr17A and Chr17D (0.54/Mb) and on Chr09A and Chr09D (0.47/Mb), whereas comparatively low densities were seen on Chr06A and Chr06D (0.13/Mb), Chr13A and Chr13D (0.15/Mb), and Chr07A and Chr07D (0.18/Mb) (Fig. 5②). Most DELs were almost evenly distributed throughout the genome, with a slight preference for the telomere regions of Chr12A, Chr12D, Chr17A, Chr17D, Chr18A and Chr18D (Fig. 5③). Similarly, INSs were present at high density and showed a slight preference for the telomere regions of Chr07A, Chr07D, Chr15A, Chr15D, Chr18A and Chr18D (Fig. 5④). In contrast, INVs had a more uneven distribution across the genome (Fig. 5⑤): they were more abundant on Chr01A and Chr01D and limited on other chromosomes. TRANS were very sparsely distributed, with only a few detected on Chr02D, Chr07D, Chr08D, Chr13D and Chr14D (Fig. 5⑥).
We performed GO enrichment analysis for the genes located in the 15,480 SV regions using the Plant GOSlim database, and detected 23 GO categories that were significantly over-represented relative to the whole gene set (Fig. 6). Ten of them (“motor activity,” “transporter activity,” “DNA binding,” “transport,” “metabolic process,” “lysosome,” “nuclear envelope,” “peroxisome,” “cell wall” and “extracellular region”) were over-represented in genes affected by INS; three (“chromatin binding,” “translation” and “ribosome”) in genes affected by CNV; three (“hydrolase activity,” “response to biotic stimulus” and “lipid metabolic process”) in genes affected by both INS and TRANS; two (“cell differentiation” and “growth”) in genes affected by INV; two (“vacuole” and “circadian rhythm”) in genes affected by TRANS; one (“endosome”) in genes affected by both DEL and CNV; one (“carbohydrate binding”) in genes affected by DEL, CNV and TRANS; and one (“plasma membrane”) in genes affected by both CNV and TRANS. Overall, functional annotation showed enrichments associated with all of the major GO categories (Fig. 6a).
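Over-representation of a GO category in a gene subset is commonly assessed with a one-sided hypergeometric test. The text does not state which test was used here, so the Python sketch below is only an illustration of the general idea under that assumption. The function name and all counts in the example are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for a hypergeometric variable X.

    N: genes in the background set
    K: background genes annotated with the GO category
    n: genes in the subset of interest (e.g. genes overlapping SVs)
    k: subset genes annotated with the GO category
    """
    upper = min(K, n)
    denom = comb(N, n)
    # Sum the probability mass from k up to the largest possible overlap.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / denom


# Hypothetical counts, chosen only to illustrate the calculation.
background_genes = 1000
category_genes = 50
sv_genes = 100
sv_category_genes = 12

p_value = hypergeom_sf(sv_category_genes, background_genes, category_genes, sv_genes)
print(f"one-sided p-value: {p_value:.3g}")
```

With these hypothetical counts, about 5 annotated genes would be expected in the subset by chance, so observing 12 gives a small p-value. In a real analysis the p-values for all tested categories would also need correction for multiple testing.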
To explore the biological importance of the SVs, we further annotated the genes that were highly enriched in the above GO categories. We found that many genes in CNV, INS and DEL regions are involved in disease-resistance and sugar metabolism pathways (Fig. 6b). For example, Potom05G0191000 and Potom05G0207500 (with CNV) and Potom06G0303900 and Potom01G0355800 (with DEL) all encode LRR receptor-like serine/threonine-protein kinase FLS2, which may be important for disease resistance. The disease-resistance genes in INS regions are mainly annotated as nitric oxide synthase, enhanced disease susceptibility 1 protein and pathogenesis-related protein 1, which are involved in plant hormone signal transduction and plant-pathogen interaction. More interestingly, we found 3 copies of both Potom05G0191000 and Potom05G0207500 in subgenome P. adenopoda, and 11 copies of both Potom05G0191000 and Potom05G0207500 in subgenome P. alba var. pyramidalis. Previous studies in Glycine max (McHale et al., 2012) also indicated that structural variations such as CNV are common in genes related to disease resistance and biological stress. The higher copy numbers of Potom05G0191000 and Potom05G0207500 may help explain why the elite individual LM50 shows strong disease resistance, a trait for which it is known among forest growers. This hypothesis needs functional validation.
We also found that many genes involved in carbohydrate metabolism had structural variations, including CNV, DEL and INS. These genes were annotated as, for example, UDP-glucuronate 4-epimerase, alpha-1,4-galacturonosyltransferase, and beta-galactosidase (Fig. 6b). In addition, Potom03G0262900 and Potom01G0217800, which showed INS variation, were annotated as ADP sugar diphosphatase and pectinesterase and are involved in ribose phosphorylation and in pentose and glucuronate interconversions, respectively; they may be important for energy and growth. Finally, it is well known that centromeres and telomeres play an important role in maintaining chromosome stability. Interestingly, we also found that three genes, Potom01G0282700, Potom12G0168500 and Potom12G0040500, showed INS variation and are involved in meiotic DNA break processing and repair, chromatin silencing at rDNA, and histone methylation. These genes may play a role in maintaining chromosome structure or in reducing the rate of meiotic recombination that we observed.