2 | MATERIALS AND METHODS

2.1 | Plant materials collection and DNA extraction

Six medicinal plants of Polygonatum were collected from different regions (Figure S1, Table S1). Species identities were confirmed, and all voucher specimens were deposited at the Chinese Materia Medica Resource Center, Anhui University of Chinese Medicine (Hefei, China). Total genomic DNA was extracted from healthy, fresh leaves using a plant DNA mini kit (Plant DNA Kit D3485, Omega Bio-Tek, Guangzhou, China). The purity, integrity, and concentration of the DNA were checked using a NanoDrop 2000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA) and 1.0% (w/v) agarose gel electrophoresis (Wu et al., 2021). Samples that met the requirements for chloroplast genome sequencing (concentration ≥ 20 ng/μL, total amount ≥ 100 ng, and OD260/280 = 1.8–2.2) were considered high quality and were used to construct the libraries (Zhu et al., 2018).

2.2 | Chloroplast DNA sequencing, assembly and annotation

Genesky Biotechnologies Inc. (Shanghai, China) was commissioned to sequence the chloroplast genome of each Polygonatum DNA sample on an Illumina HiSeq 4000 platform. After quality control, genomic DNA was fragmented and adaptors were ligated to construct the libraries. To obtain high-quality sequencing data and improve the accuracy of subsequent bioinformatic analyses, the raw sequencing data were quality-checked and filtered. The filtering steps included excluding reads containing more than three N bases, discarding reads in which fewer than 60% of the bases were high quality (Phred score ≥ 20), trimming low-quality bases at the 3′ end, and removing reads shorter than 60 bp. The clean reads were then assembled at the contig level. Using a closely related species as the reference, genomes were assembled with metaSPAdes (Nurk et al., 2017), and the assemblies were inspected and corrected to confirm circularization, fix contig orientation, and set the starting base position. The chloroplast genomes were annotated using CPGAVAS2 (Shi et al., 2019). Circular gene maps were drawn from the GenBank files using GeSeq (https://chlorobox.mpimp-golm.mpg.de/geseq.html) (Tillich et al., 2017). The sequence data and gene annotation information were uploaded to the National Center for Biotechnology Information (NCBI) database.
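The read-filtering criteria listed above can be illustrated with a minimal Python sketch. It is not the pipeline used by the sequencing provider. The function names, the Phred+33 quality encoding, and the order of operations (3′ trimming first, then the N-count, quality-fraction, and length checks on the trimmed read) are assumptions made for illustration only.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string (Phred+33 encoding is assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, quality, min_q=20):
    """Remove trailing bases whose Phred score is below min_q."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def passes_filters(seq, quality, max_n=3, min_q=20,
                   min_hq_fraction=0.60, min_len=60):
    """Apply the length, N-count, and high-quality-fraction thresholds."""
    if len(seq) < min_len:
        return False
    if seq.upper().count("N") > max_n:
        return False
    scores = phred_scores(quality)
    high_quality = sum(1 for s in scores if s >= min_q)
    return high_quality / len(scores) >= min_hq_fraction


def clean_read(seq, quality):
    """Trim the 3' end, then keep the read only if it passes all filters.

    Returns the trimmed (seq, quality) pair, or None if the read is discarded.
    The trim-then-filter order is an assumption, not taken from the text.
    """
    seq, quality = trim_3prime(seq, quality)
    if passes_filters(seq, quality):
        return seq, quality
    return None
```

For paired-end data, a read pair would normally be dropped when either mate fails; that bookkeeping is omitted here.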

2.3 | Structure analysis of the chloroplast genomes

Simple sequence repeats (SSRs), also called microsatellites or short tandem repeats (STRs), are tandem repeats of 1–6 bp DNA motifs that are widely used as molecular markers in genetic analysis. The SSR loci of each genome were detected using the online tool MISA (Beier et al., 2017) (https://webblast.ipk-gatersleben.de/misa/), with minimum repeat thresholds of ten repeat units for mononucleotides, five for dinucleotides, four for trinucleotides, and three for tetranucleotides, pentanucleotides, and hexanucleotides (Wang et al., 2022). Forward, palindromic, reverse, and complementary repeats were predicted using REPuter (https://bibiserv.cebitec.uni-bielefeld.de/reputer) with the following parameters: Hamming distance = 3, maximum computed repeats = 5,000, and minimal repeat size = 30 bp (Kurtz et al., 2001). Codon usage in the chloroplast genomes of the six medicinal plants of Polygonatum was investigated using the relative synonymous codon usage (RSCU) module in the Python CAI package. During translation, the synonymous codons for a given amino acid are not used at equal frequencies; that is, some synonymous codons are used more often than others (Parvathy et al., 2022). The RSCU value quantifies this bias: RSCU = 1 indicates no preference, RSCU > 1 indicates that a codon is used more frequently than expected, and RSCU < 1 indicates that it is used less frequently than expected (Sharp et al., 1986). Microsoft Office Excel and TBtools (Chen et al., 2020) were used to convert the statistical data into graphs.
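As an illustration of the RSCU definition, the observed count of a codon is divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below computes this from scratch. It does not reproduce the API of the Python CAI package named above; the function name `rscu`, the use of the standard codon assignments, and the treatment of stop codons as one synonymous family are assumptions.

```python
from collections import Counter
from itertools import product

# Standard codon assignments in TCAG order (assumed here; the sense-codon
# assignments of the plastid table are the same).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds_sequences):
    """Return {codon: RSCU} pooled over in-frame coding sequences.

    RSCU = observed codon count / (family total / number of synonymous codons).
    Families with no observed codons are left out of the result.
    """
    counts = Counter()
    for cds in cds_sequences:
        cds = cds.upper().replace("U", "T")
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values
```

With this definition, the single-codon amino acids Met (ATG) and Trp (TGG) always have RSCU = 1 whenever they occur, so they carry no information about codon preference.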

2.4 | Comparison of the chloroplast genomes

The expansion and contraction of the inverted repeat (IR) regions in the chloroplast genome may lead to changes in genome length. Using CPJSdraw (Xu et al., 2024), we examined the IR boundary regions by comparing the locations of the coding genes. Whole chloroplast genome sequences were aligned using the online tool mVISTA (Fernández-Jiménez et al., 2021) (http://genome.lbl.gov/vista/index.shtml) in Shuffle-LAGAN mode. DnaSP (Rozas et al., 2017) was used to calculate nucleotide diversity by sliding-window analysis, with a window length of 600 bp and a step size of 200 bp. To investigate selective pressure on the chloroplast protein-coding genes among Polygonatum species, we used P. zanlanscianense as the reference and calculated Ka and Ks values from the coding sequences using KaKs_Calculator2 (Wang et al., 2010).
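The sliding-window calculation of nucleotide diversity (window 600 bp, step 200 bp) can be sketched as follows. This is a simplified illustration, not DnaSP's implementation: nucleotide diversity is taken as the mean proportion of pairwise differences per site, and alignment columns containing gaps or ambiguous bases are dropped within each window. Both choices, and the function names, are assumptions.

```python
from itertools import combinations


def nucleotide_diversity(window_seqs):
    """Mean pairwise differences per site over gap-free, unambiguous columns."""
    n = len(window_seqs)
    columns = [col for col in zip(*window_seqs)
               if all(base in "ACGT" for base in col)]
    if n < 2 or not columns:
        return 0.0
    n_pairs = n * (n - 1) // 2
    differences = sum(1 for col in columns
                      for a, b in combinations(col, 2) if a != b)
    return differences / (n_pairs * len(columns))


def sliding_pi(aligned_seqs, window=600, step=200):
    """Return (window midpoint, pi) pairs along an alignment.

    aligned_seqs: equal-length strings taken from a multiple alignment.
    Only full-length windows are evaluated (an assumption).
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    seqs = [s.upper() for s in aligned_seqs]
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results
```

Plotting the returned values against the window midpoints highlights the regions with elevated diversity.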

2.5 | Phylogenetic tree construction of Polygonatum medicinal plants

The six medicinal plants of Polygonatum sequenced here, together with other medicinal plants of Polygonatum downloaded from NCBI, were used for phylogenetic analysis, with Dioscorea aspersa and Dioscorea alata set as outgroups. A total of 59 complete chloroplast genome sequences were aligned using MAFFT (Katoh et al., 2013) and trimmed using trimAl (Capella-Gutiérrez et al., 2009). The best-fit model according to the Bayesian information criterion, as determined by ModelFinder (Kalyaanamoorthy et al., 2017), was K3Pu+F+I+I+R4. A phylogenetic tree based on the whole chloroplast sequences was constructed with IQ-TREE (Nguyen et al., 2015) on the PhyloSuite platform (Zhang et al., 2020). The tree was visualized using iTOL (Letunic et al., 2021) (https://itol.embl.de/).