FIGURE 3 Analysis of amino acids and codon bias among six medicinal plants of Polygonatum. (A) Frequency of amino acids in the chloroplast genomes of six Polygonatum species. (B) RSCU percentage analysis of codons in chloroplast genomes. (C) Heat map of the RSCU values among six Polygonatum species.

3.3 | Statistics of codon usage

The total number of codons in the protein-coding sequences of the chloroplast genomes of the six medicinal plants of Polygonatum ranged from 23,381 (P. kingianum) to 26,036 (P. odoratum), comprising 61 codon types encoding 20 amino acids (termination codons were excluded from the statistics). Apart from Met and Trp, each amino acid was encoded by 2–6 synonymous codons. Leu was the most frequent amino acid, accounting for 10.3% of codons, whereas Cys was the least frequent, accounting for 1.2% (1.1% in P. kingianum). The RSCU value can be used to detect synonymous codon usage bias. Except for Met and Trp (RSCU = 1), which show no codon usage bias, most amino acids showed biased codon usage. Thirty codons had RSCU > 1 in the six medicinal plants of Polygonatum, of which 28 were A/T-ending. Only the TTG codon encoding Leu and the TCC codon encoding Ser ended with G/C, indicating that A/T-ending codons were preferred over G/C-ending codons. A combined analysis of the histogram and heat map of codon usage showed that codon usage was consistent across the six species. The RSCU values provide data for studying the evolution and gene expression of Polygonatum (Figure 3, Table S14).
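As an illustration of how an RSCU value is obtained, the sketch below divides the observed count of each codon by the count expected if all synonymous codons for that amino acid were used equally. It is a minimal sketch, not the software used in this study: the function names `count_codons` and `rscu` and the toy sequence are hypothetical, and the standard codon assignments are assumed for the chloroplast genes.

```python
from collections import Counter
from itertools import product

# Standard codon assignments in TCAG order; '*' marks termination codons.
# Assumption: these assignments apply to the chloroplast protein-coding genes.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_codons(cds_list):
    """Count sense codons in a list of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            # Skip termination codons and codons with ambiguous bases.
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1
    return counts


def rscu(counts):
    """Return RSCU per codon: observed count / mean count of its synonymous family."""
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from the data; RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts.get(c, 0) / expected
    return values


# Toy example (hypothetical sequence, not data from this study).
toy_counts = count_codons(["ATGTTATTATTGTGGTAA"])
toy_rscu = rscu(toy_counts)
```

With this definition, single-codon amino acids (Met, Trp) always give RSCU = 1, a value above 1 indicates a codon used more often than expected under equal usage, and a value below 1 indicates one used less often.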

3.4 | IR borders comparison

During plant chloroplast genome evolution, the IR regions undergo contraction and expansion, so that some genes move into the IR or SC regions. The IR/SC boundaries and their adjacent genes in the six medicinal plants of Polygonatum were compared using CPJSdraw. As shown in Figure 4, the total sequence length and the IR region length of the chloroplast genomes were relatively conserved among species, and the types of genes at the IR/SC borders were essentially the same. The genes rpl22, rps19, trnN, ndhF, ycf1 and psbA were present at the IR boundaries. The front ends of the rps19 genes of P. zanlanscianense, P. kingianum, P. cyrtonema, P. filipes and P. odoratum were 13 or 17 bp away from the IRb boundary, whereas in P. sibiricum the front end of the rps19 gene coincided with the LSC/IRb boundary. At the LSC/IRa boundary, the end of the rps19 gene of P. sibiricum likewise coincided with the boundary, again differing from the other species. rpl22 lay entirely within the LSC region, 27–34 bp from the LSC/IRb boundary (47 bp in P. sibiricum). In all six medicinal plants of Polygonatum, the ndhF gene extended into the IR by 22–34 bp. The ycf1 gene spanned the junction between the SSC and IRa. The psbA gene was located downstream of the LSC/IRa junction, 87–91 bp from the boundary.

3.5 | Sequence divergence and high variation regions analyses

The mVISTA online tool was used to globally align the chloroplast genomes of these Polygonatum species, with P. zanlanscianense as the reference, and the sequence differences between the genomes were compared (Figure 5). Overall, the chloroplast genome sequences of the six Polygonatum species were conserved. In terms of the position of sequence differences, the rRNA gene regions (blue) were highly conserved, and the non-coding regions (red) were more variable than the conserved protein-coding regions (purple). Variation in the LSC and SSC regions was greater than in the IR regions, being greatest in the LSC region, followed by the SSC region. In addition, DnaSP software was used to determine the nucleotide diversity of the chloroplast genomes of the six medicinal Polygonatum plants and to identify mutation hotspots (Figure 6). The Pi values of Polygonatum ranged from 0 to 0.02633, and the high-variation regions were mainly concentrated in the LSC and SSC regions. Twenty-one genic regions with high Pi values (Pi ≥ 0.01) were considered hotspots. Of these, 11 were located in the LSC region, including psbA, trnK-UUU, psbI-trnS-GCU, trnS-GCU, rpoB, trnL-UAA, trnF-GAA and psbJ, and 10 were located in the SSC region, including rpl32, trnL-UAG, ccsA, ccsA-ndhD and ycf1. These hotspots provide a reference for the subsequent molecular identification of Polygonatum medicinal plants and for identifying potential chloroplast DNA barcodes.
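For readers unfamiliar with the Pi statistic, the sketch below shows one common way to compute nucleotide diversity, as the mean number of pairwise differences per site, over sliding windows of an alignment. It is a simplified illustration rather than a reproduction of DnaSP: the function names, the toy alignment, the window and step sizes, and the rule of skipping any column containing a gap or ambiguous base are all assumptions made here, and DnaSP's own handling of such sites may differ.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Mean pairwise differences per site over columns with only A/C/G/T."""
    seqs = [s.upper() for s in aligned]
    n = len(seqs)
    # Assumption: columns with gaps or ambiguous bases are excluded entirely.
    cols = [i for i in range(len(seqs[0])) if all(s[i] in "ACGT" for s in seqs)]
    if n < 2 or not cols:
        return 0.0
    diffs = sum(
        sum(a[i] != b[i] for i in cols) for a, b in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    return diffs / (n_pairs * len(cols))


def sliding_pi(aligned, window, step):
    """Return (window midpoint, Pi) for each full window along the alignment."""
    length = len(aligned[0])
    results = []
    for start in range(0, length - window + 1, step):
        block = [s[start:start + window] for s in aligned]
        results.append((start + window // 2, nucleotide_diversity(block)))
    return results


# Toy alignment (hypothetical, not data from this study).
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
toy_pi = nucleotide_diversity(toy_alignment)
toy_windows = sliding_pi(toy_alignment, window=4, step=4)

# Windows at or above a chosen threshold can then be flagged as candidate hotspots.
toy_hotspots = [mid for mid, pi in toy_windows if pi >= 0.01]
```

In the toy alignment the three sequence pairs differ at 1, 1 and 2 of 8 sites, so Pi is 4 / (3 × 8) ≈ 0.167. Applied to a real alignment, windows whose Pi reaches the threshold used in the text (Pi ≥ 0.01) would correspond to the candidate hotspot regions.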