Features of the assembled genome
The genome size of PFM was estimated at 338.52-352.59 Mb by k-mer analysis, depending on the k-mer size used (k = 17, 21, 25, 35). The k-mer distributions showed double peaks, indicating that the genome has high rates of duplication and heterozygosity. The estimated heterozygosity ranged from 1.06% to 1.15% and the duplication rate from 1.95% to 2.06% (Fig. 2a).
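To illustrate the general principle behind k-mer-based size estimates, the sketch below divides the total number of counted k-mers by the depth of the main coverage peak. This is not necessarily the procedure used for the estimates above, and dedicated model-fitting tools handle heterozygosity and repeats far more carefully. The function name `estimate_genome_size`, the `min_depth` error cutoff, and the toy histogram are all hypothetical. For a genome with double peaks, the homozygous peak depth should be passed explicitly through `peak_depth`, because the tallest peak may be the heterozygous one.

```python
def estimate_genome_size(histogram, peak_depth=None, min_depth=5):
    """Rough genome size estimate from a k-mer depth histogram.

    histogram  -- dict mapping k-mer depth to the number of distinct
                  k-mers observed at that depth
    peak_depth -- depth of the homozygous coverage peak; if None, the
                  depth with the most distinct k-mers is used (an
                  assumption that can fail for heterozygous genomes)
    min_depth  -- depths below this are treated as sequencing errors
                  and ignored (hypothetical default)
    """
    usable = {d: c for d, c in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no histogram entries at or above min_depth")
    if peak_depth is None:
        peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences = sum over depths of depth * distinct k-mers.
    total_kmers = sum(d * c for d, c in usable.items())
    return total_kmers / peak_depth


# Toy histogram (made-up numbers, not data from this study).
toy_histogram = {1: 1000, 2: 200, 10: 50, 20: 100, 21: 90, 40: 5}
toy_estimate = estimate_genome_size(toy_histogram)
```

Because the estimate scales inversely with the chosen peak depth, selecting the heterozygous peak instead of the homozygous one would roughly double the result, which is one reason estimates can vary with k.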
At the contig level, we assembled the PFM genome into 275 contigs totaling 404.83 Mb, with a contig N50 of 2.62 Mb. Based on contig interaction frequencies calculated from the read pairs aligned to the contigs, the 275 contigs were clustered into 32 linkage groups (Fig. 2b). The longest linkage group was 19.1 Mb and the shortest was 2.63 Mb, with an N50 of 14.39 Mb. BUSCO analysis showed that 98.2% of 1,658 genes were complete (single-copy: 97.2%; duplicated: 1.0%), 0.4% were fragmented, and 1.4% were missing from the assembled genome. The GC content of the genome was 36.96%.
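For readers unfamiliar with the N50 statistic reported above, the following minimal sketch shows the standard definition: the length of the sequence at which the cumulative sum, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the example lengths are illustrative only and are not taken from the PFM assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the cumulative length to at
        # least half of the total defines the N50.
        if running >= half_total:
            return length


# Illustrative lengths (arbitrary units, not real contig sizes).
example_lengths = [10, 20, 30, 40]
example_n50 = n50(example_lengths)
```

The same function applies to contigs and to linkage groups; only the input list of lengths changes.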
Synteny analysis showed that the PFM, S. litura and C. pomonella genomes have a highly conserved gene order (Fig. 2c). PFM has a chromosome complement similar to that of S. litura, comprising 30 autosomes, a Z chromosome (Chr01) and a female-specific W chromosome, whereas C. pomonella has undergone three fusion events, resulting in 27 autosomes, a W chromosome, and a neo-Z chromosome arising from a Z-autosome fusion (Wan et al., 2019). The chromosome-level assembly of the PFM genome provides a resource for understanding chromosome evolution in the Lepidoptera (Ahola et al., 2014).