Features of the assembled genome
The genome size of PFM is
estimated to be 338.52-352.59 Mb by k-mer analysis, depending on the
k-mer size used (k = 17, 21, 25, 35). The k-mer distributions showed double
peaks, indicating that this genome has high rates of duplication and
heterozygosity. The estimated heterozygosity ranges from 1.06% to
1.15% and the duplication rate ranges from 1.95% to 2.06% (Fig. 2a).
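As a rough illustration of how a genome size can be derived from a k-mer depth histogram, the sketch below applies the simple textbook estimator (total k-mer occurrences divided by the depth of the homozygous peak). This is not necessarily the procedure used for the estimates above, since the text does not name the tool. The function name, the `min_depth` error cutoff and the toy histogram are all hypothetical.

```python
def estimate_genome_size(histogram, homozygous_peak_depth, min_depth=1):
    """Estimate genome size from a k-mer depth histogram.

    histogram: dict mapping depth -> number of distinct k-mers seen at that depth.
    homozygous_peak_depth: depth of the main (homozygous) peak, chosen by the user.
    min_depth: depths below this are treated as sequencing errors and ignored
               (an illustrative cutoff, not a value from the text).
    """
    if homozygous_peak_depth <= 0:
        raise ValueError("homozygous_peak_depth must be positive")
    # Total k-mer occurrences = sum over depths of (depth * distinct k-mers).
    total_kmers = sum(
        depth * count for depth, count in histogram.items() if depth >= min_depth
    )
    return total_kmers / homozygous_peak_depth


# Toy histogram (hypothetical): an error peak at depth 1, a heterozygous
# peak at depth 20 and a homozygous peak at depth 40.
toy_histogram = {1: 1000, 20: 50, 40: 200}
size = estimate_genome_size(toy_histogram, homozygous_peak_depth=40, min_depth=5)
print(size)
```

In a histogram with double peaks, the homozygous peak is usually the one at roughly twice the depth of the heterozygous peak, which is why the peak depth is passed in explicitly here rather than detected automatically.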
At the contig level, we assembled the PFM genome into 404.83 Mb
of sequence comprising 275 contigs, with a contig N50 length of 2.62 Mb.
Based on contig interaction frequencies calculated from the read pairs aligned
to the contigs, the 275 contigs were clustered into 32 linkage groups
(Fig. 2b). The longest linkage group was 19.1 Mb and the shortest
was 2.63 Mb, with an N50 of 14.39 Mb. BUSCO analysis showed that 98.2%
of 1,658 genes were identified as complete (single-copy: 97.2%,
duplicated: 1.0%), 0.4% were fragmented, and 1.4% were
missing in the assembled genome. The GC content of the genome was
36.96%.
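For readers unfamiliar with the N50 statistic reported above, the following minimal sketch shows the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name and the toy lengths are illustrative only and are not taken from the PFM assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length until
    # at least half of the total assembly length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths): the total is 20, and the two longest
# sequences (8 + 5 = 13) are the first to cover at least half of it.
print(n50([8, 5, 4, 2, 1]))  # prints 5
```

The same function applies to contig lengths and to linkage-group lengths; only the input list differs.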
Synteny analysis showed that the PFM, S. litura and C.
pomonella genomes have a highly conserved gene order (Fig. 2c). PFM has
a chromosome complement similar to that of S. litura, comprising 30 autosomes, a Z
chromosome (Chr01) and a female-specific W chromosome, whereas C.
pomonella has undergone three fusion events, resulting in 27 autosomes,
a W chromosome, and a neo-Z chromosome arising from a Z-autosomal fusion
(Wan et al., 2019). The chromosome-level assembly of the PFM genome
provides resources for understanding chromosome evolution in the
Lepidoptera (Ahola et al., 2014).