Genome annotation
We identified 29,228 protein-coding genes in the first round of MAKER annotation, and BUSCO analysis identified 91.9% of the evaluated single-copy genes as complete. After three rounds of MAKER annotation, the number of genes increased to 52,667 and the proportion of complete single-copy genes rose to 95.2%. After filtering based on gene expression analysis, functional domains and AED values, 23,218 genes remained. In this annotated gene set, BUSCO analysis identified 95.0% (single-copy: 94.1%; duplicated: 1.1%) of the evaluated single-copy genes as complete, 1.6% as fragmented and 3.2% as missing. In total, 19,206 genes (82.72%) were functionally annotated, of which 5,970 (25.71%) and 3,134 (13.50%) were annotated with GO terms and KEGG KOs, respectively. We predicted 53 rRNAs, 11,076 tRNAs, 20 small nuclear RNAs and 48 microRNAs in the PFM genome based on the Rfam database.
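The functional annotation percentages above are all expressed relative to the 23,218 genes retained after filtering. A minimal Python sketch of that arithmetic follows; the variable and function names are illustrative and not part of the annotation pipeline.

```python
# Counts taken from the text; names are illustrative only.
filtered_genes = 23_218
counts = {
    "functionally annotated": 19_206,
    "GO terms": 5_970,
    "KEGG KOs": 3_134,
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Every share uses the filtered gene set as the denominator.
shares = {label: percent(n, filtered_genes) for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.2f}%")
```

Using the filtered gene set as the common denominator reproduces the reported 82.72%, 25.71% and 13.50%.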
In total, 45.5 Mb (11.33%) of the genome was identified as repetitive DNA. Overall, 259,729 transposable elements (TEs) were identified, including 125,601 retroelements (17,962 short interspersed nuclear elements (SINEs), 95,657 long interspersed nuclear elements (LINEs) and 11,982 long terminal repeats (LTRs)) and 34,478 DNA transposons.
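As a consistency check on the TE tallies, the short Python sketch below sums the three retroelement classes and compares the named categories with the overall TE count. The dictionary layout and variable names are illustrative; the text does not break down the elements outside these categories, so the remainder is only computed, not interpreted.

```python
# Counts taken from the text; the data structure is illustrative only.
retroelement_classes = {"SINE": 17_962, "LINE": 95_657, "LTR": 11_982}
dna_transposons = 34_478
total_tes = 259_729

# The three classes should add up to the reported retroelement total.
retroelements = sum(retroelement_classes.values())

# TEs not covered by the two named categories (not itemized in the text).
remainder = total_tes - retroelements - dna_transposons

print(f"retroelements: {retroelements:,}")
print(f"TEs outside the named categories: {remainder:,}")
```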
Orthology and phylogenetic relationships of lepidopterans
OrthoFinder assigned 320,821 genes (93.41% of total) to 15,076 orthogroups for the 16 species compared. Fifty percent of the assigned genes were in orthogroups with 28 or more genes (G50 was 28) and were contained in the largest 3,174 orthogroups (O50 was 3,174).
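G50 and O50 are analogous to the N50 and L50 statistics used for assemblies: orthogroups are ranked from largest to smallest and gene counts are accumulated until half of all assigned genes are covered. The Python sketch below illustrates the calculation on a toy list of orthogroup sizes; the function name and toy values are assumptions for illustration, not OrthoFinder code or data from this study.

```python
from typing import Iterable, Tuple


def g50_o50(orthogroup_sizes: Iterable[int]) -> Tuple[int, int]:
    """Return (G50, O50) for a collection of orthogroup sizes.

    G50: size of the orthogroup at which the cumulative gene count,
         taken from the largest orthogroup downwards, first reaches
         50% of all assigned genes.
    O50: number of orthogroups needed to reach that point.
    """
    ordered = sorted(orthogroup_sizes, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for rank, size in enumerate(ordered, start=1):
        running += size
        if running >= half:
            return size, rank
    raise ValueError("no orthogroups given")


# Toy example: 30 genes in five orthogroups.
toy_sizes = [10, 8, 6, 4, 2]
g50, o50 = g50_o50(toy_sizes)
print(f"G50 = {g50}, O50 = {o50}")
```

In the toy example the two largest orthogroups hold 18 of 30 genes, so half of the genes lie in orthogroups of size 8 or more and two orthogroups suffice to contain them.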
A total of 947 single-copy genes with 364,262 reliable sites were retained for phylogenetic inference. The resulting topology is congruent with previously inferred phylogenetic relationships of Lepidoptera, which included no representative of the Copromorphoidea (Wan et al., 2019). Current molecular phylogenetic studies have not resolved the relationship between Copromorphoidea and Papilionoidea (Mitter, Davis, & Cummings, 2017). Our result supports the notion that PFM (Copromorphoidea) is sister to the butterfly D. plexippus (Papilionoidea), rather than a sister-group relationship between Copromorphoidea or Papilionoidea and Pyraloidea + (Noctuoidea + Bombycoidea) (Fig. 3a).
We investigated the orthogroups shared by PFM and four other lepidopteran species representing different clades of the phylogenetic tree of Lepidoptera (Fig. 3b). There were 7,827 orthogroups (60.5% of 12,938 orthogroups) shared by all five species and 1,549 orthogroups shared by the four species other than C. pomonella. We identified 357 orthogroups specific to PFM, fewer than in B. mori (406) but more than in the other three lepidopteran species (Fig. 3b).
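Counts of this kind can be obtained by recording, for each orthogroup, the set of species that contribute at least one gene and then tallying identical sets. The Python sketch below shows the idea on toy data; the orthogroup identifiers and the placeholder labels "species 4" and "species 5" are hypothetical and do not reflect the actual input.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set


def sharing_patterns(membership: Dict[str, Set[str]]) -> Counter:
    """Count orthogroups per exact combination of species present."""
    return Counter(frozenset(species) for species in membership.values())


# Toy presence data: orthogroup id -> species with at least one gene.
toy_membership = {
    "OG1": {"PFM", "B. mori", "C. pomonella", "species 4", "species 5"},
    "OG2": {"PFM", "B. mori", "species 4", "species 5"},
    "OG3": {"PFM"},
    "OG4": {"PFM", "B. mori", "C. pomonella", "species 4", "species 5"},
}

patterns = sharing_patterns(toy_membership)

all_five: FrozenSet[str] = frozenset(
    {"PFM", "B. mori", "C. pomonella", "species 4", "species 5"}
)
without_cpom = all_five - {"C. pomonella"}
pfm_only = frozenset({"PFM"})

print("shared by all five:", patterns[all_five])
print("all except C. pomonella:", patterns[without_cpom])
print("PFM-specific:", patterns[pfm_only])
```

Each orthogroup falls into exactly one combination, which matches how the regions of a five-way Venn diagram are counted.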