FIGURE 3 Analysis of amino acids and codon bias among six medicinal
plants of Polygonatum. (A) Frequency of amino acids in the
chloroplast genomes of six Polygonatum species. (B) RSCU percentage
analysis of codons in chloroplast genomes. (C) Heat map of the RSCU
values among six Polygonatum species.
3.3 | Statistics of codon usage
The total number of codons in the protein-coding sequences of the
chloroplast genomes of the six medicinal Polygonatum plants ranged
from 23,381 (P. kingianum) to 26,036 (P. odoratum),
containing 61 codons encoding 20 amino acids (termination codons were
not incorporated in the statistics). Apart from Met and Trp, each amino
acid was encoded by 2–6 synonymous codons. Codons encoding Leu were the
most abundant, accounting for 10.3% of the total, whereas codons
encoding Cys were the least abundant, accounting for 1.2% (1.1% in
P. kingianum). The relative synonymous codon usage (RSCU) value can be
used to detect synonymous codon usage bias. Except for Met and Trp (RSCU = 1),
which do not show codon usage bias, most amino acid codons have usage
bias. Thirty codons had RSCU > 1 in
the six medicinal plants of Polygonatum, of which 28 were
A/T-ending codons. Only the TTG codon encoding Leu and the TCC codon
encoding Ser ended with G/C, indicating that A/T bases were preferred
over G/C bases at the third codon position. A comprehensive analysis of the
histogram and heat map of codon usage showed that codon usage of the six
species was consistent. The analysis of RSCU values provided data for
studying the evolution and gene expression of Polygonatum (Figure
3, Table S14).
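
The RSCU calculation described above can be illustrated with a minimal
Python sketch. It assumes the usual definition (the observed count of a
codon divided by the mean count of all synonymous codons for the same
amino acid) and the standard codon assignments. The function names
`count_codons` and `rscu` and the toy sequence are hypothetical and are
not taken from the software used in this study.

```python
from collections import Counter
from itertools import product

# Standard codon assignments, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def count_codons(cds_list):
    """Count sense codons in a list of in-frame coding sequences.

    Termination codons and codons with ambiguous bases are skipped.
    """
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1
    return counts


def rscu(counts):
    """Return {codon: RSCU} for every amino acid observed in `counts`."""
    synonymous = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            synonymous.setdefault(aa, []).append(codon)

    values = {}
    for aa, codons in synonymous.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        for c in codons:
            # observed count / (total / number of synonymous codons)
            values[c] = counts[c] * len(codons) / total
    return values


# Toy example (hypothetical sequence): ATG TTA TTA TTG CTT TGG
toy_counts = count_codons(["ATGTTATTATTGCTTTGG"])
toy_rscu = rscu(toy_counts)
preferred = sorted(c for c, v in toy_rscu.items() if v > 1)
```

In this sketch a codon with RSCU > 1 is used more often than expected
under uniform synonymous usage, and Met and Trp always yield RSCU = 1
because each has a single codon.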
3.4 | IR borders comparison
During plant chloroplast genome evolution, the IR regions contract and
expand, so that some genes move into the IR or SC regions.
The IR/SC boundaries and their adjacent genes in the six
medicinal plants of Polygonatum were compared using CPJSdraw. As
shown (Figure 4), the total sequence length and IR region length of the
chloroplast genome between species were relatively conserved and the
types of genes at the IR/SC borders were essentially the same. The genes
rpl22, rps19, trnN, ndhF, ycf1 and psbA were present at the IR
boundaries. The front ends of the rps19 genes of P. zanlanscianense,
P. kingianum, P. cyrtonema, P. filipes and P. odoratum were 13 or
17 bp away from the IRb boundary, whereas in P. sibiricum the front end
of the rps19 gene coincided with the LSC/IRb boundary. At the
LSC/IRa boundary, the end of the rps19 gene of P. sibiricum also
coincided with the boundary, which again differed from the other
species. rpl22 was completely situated in the LSC region and was
27–34 bp away from the LSC/IRb boundary (47 bp in P. sibiricum).
In all six medicinal plants of Polygonatum, the ndhF gene extended
into the IR by 22–34 bp. The ycf1 gene spanned the junction between the
SSC and IRa. The psbA gene was
located downstream of the junction of LSC and IRa, 87–91 bp from the
boundary.
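
The boundary relationships reported above (a gene lying a few bp from a
junction, coinciding with it, or spanning it) reduce to simple
coordinate arithmetic. The following Python sketch shows one way to
express this. The function `relate_to_junction`, its coordinate
convention (1-based inclusive gene coordinates, with the junction lying
between positions `junction` and `junction + 1`) and all example
coordinates are hypothetical and do not come from the annotated
genomes or from CPJSdraw.

```python
def relate_to_junction(gene_start, gene_end, junction):
    """Describe a gene's position relative to a region junction.

    Coordinates are 1-based and inclusive. The junction lies between
    positions `junction` and `junction + 1`.

    Returns one of:
      ("left", gap)               gene ends `gap` bp before the junction
      ("right", gap)              gene starts `gap` bp after the junction
      ("spans", left_bp, right_bp) gene crosses the junction
    A gap of 0 means the gene end coincides with the junction.
    """
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    if gene_end <= junction:
        return ("left", junction - gene_end)
    if gene_start > junction:
        return ("right", gene_start - junction - 1)
    return ("spans", junction - gene_start + 1, gene_end - junction)


# Hypothetical coordinates, for illustration only.
JUNCTION = 86000
gene_before = relate_to_junction(85600, 85973, JUNCTION)
gene_touching = relate_to_junction(86001, 86279, JUNCTION)
gene_spanning = relate_to_junction(85900, 86100, JUNCTION)
```

With these made-up coordinates, the first gene ends 27 bp before the
junction, the second starts exactly at it, and the third has 101 bp on
one side and 100 bp on the other.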
3.5 | Sequence divergence and high variation regions analyses
The mVISTA online tool was used to globally align the chloroplast
genomes of these Polygonatum species, with P. zanlanscianense as the
reference, and the sequence differences between the genomes were
compared (Figure 5). Overall, the chloroplast
genome sequences of six Polygonatum species were generally
conserved. In terms of the position of sequence differences, the rRNA
gene region (blue part) was highly conserved, and the non-coding region
(red part) was more variable than the protein-coding region (purple
part). Variation in the LSC and SSC regions was greater than in the IR
region, with the LSC region being the most variable, followed by the
SSC region. In addition, DnaSP software was used to determine the
nucleotide diversity (Pi) of the chloroplast genomes of the six
medicinal Polygonatum plants and to identify mutation hotspots
(Figure 6). The results showed that the Pi values of Polygonatum ranged
from 0 to 0.02633 and that the high-variation regions were mainly
concentrated in the LSC and SSC regions. Twenty-one genic regions with
high Pi values (Pi ≥ 0.01) were considered hotspots. Among them, 11
genic regions were located in the LSC region, including psbA, trnK-UUU,
psbI-trnS-GCU, trnS-GCU, rpoB, trnL-UAA, trnF-GAA and psbJ; the other
10 were located in the SSC region, including rpl32, trnL-UAG, ccsA,
ccsA-ndhD and ycf1. These hotspots provide a reference for identifying
potential chloroplast DNA barcodes for the subsequent molecular
identification of Polygonatum medicinal plants.
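
As an illustration of how Pi values and hotspots of this kind can be
derived from an alignment, the following Python sketch computes
nucleotide diversity as the mean number of pairwise differences per
site and scans the alignment in sliding windows. It is a simplified
stand-in, not DnaSP itself. The window and step sizes, the choice to
drop every alignment column containing a gap or ambiguous base, the
function names and the toy alignment are all assumptions made for this
example. Only the Pi ≥ 0.01 threshold is taken from the text.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Mean pairwise differences per site for equal-length aligned sequences.

    Columns in which any sequence has a gap or ambiguous base are
    excluded (an assumption of this sketch).
    """
    n = len(aligned)
    if n < 2:
        raise ValueError("need at least two sequences")
    seqs = [s.upper() for s in aligned]
    columns = [col for col in zip(*seqs) if all(b in "ACGT" for b in col)]
    if not columns:
        return 0.0
    n_pairs = n * (n - 1) // 2
    diffs = sum(
        sum(a != b for a, b in combinations(col, 2)) for col in columns
    )
    return diffs / (n_pairs * len(columns))


def sliding_pi(aligned, window, step):
    """Return [(window_start, Pi)] with 0-based window start positions."""
    length = len(aligned[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window] for s in aligned]
        results.append((start, nucleotide_diversity(chunk)))
    return results


def hotspots(windows, threshold=0.01):
    """Keep windows whose Pi meets the threshold (Pi >= 0.01 in the text)."""
    return [(start, pi) for start, pi in windows if pi >= threshold]


# Toy alignment (hypothetical), three sequences of eight sites.
toy = ["ACGTACGT", "ACGTACGA", "ACGTTCGA"]
toy_pi = nucleotide_diversity(toy)
toy_windows = sliding_pi(toy, window=4, step=4)
toy_hot = hotspots(toy_windows)
```

In the toy alignment the first window is invariant and the second
contains both variable sites, so only the second window passes the
threshold.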