Genome sequencing
We extracted genomic DNA from 12 pupae using the MagAttract HMW DNA Kit (Qiagen, Hilden, Germany) for construction of both the Illumina and PacBio libraries. A paired-end Illumina library with an insert size of about 500 bp was constructed using the VAHTS™ Universal DNA Library Prep Kit for Illumina® V2 (Vazyme, Nanjing, China) and sequenced on an Illumina NovaSeq platform to obtain 150-bp paired-end reads. Raw reads were filtered with Trimmomatic v0.38 (Bolger, Lohse, & Usadel, 2014), yielding 31.02 Gb of clean short reads (coverage: 77.24X). These data were used to survey genome features and to polish the de novo assemblies.
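Fold coverage is the total number of sequenced bases divided by the genome size. The genome size used as the denominator is not stated in this section. The minimal sketch below back-calculates it from the reported yields as a consistency check. The function names are illustrative only and are not part of any pipeline used in this study.

```python
def implied_genome_size(total_bases: float, coverage: float) -> float:
    """Genome size (bp) implied by a sequencing yield and its fold coverage."""
    return total_bases / coverage


def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Fold coverage = total sequenced bases / genome size (bp)."""
    return total_bases / genome_size


# Yields and coverages as reported for the Illumina and PacBio data sets
datasets = {
    "Illumina clean reads": (31.02e9, 77.24),
    "PacBio subreads": (55.52e9, 138.2),
}

for name, (bases, cov) in datasets.items():
    size_mb = implied_genome_size(bases, cov) / 1e6
    print(f"{name}: implied genome size {size_mb:.1f} Mb")
```

Both data sets imply a genome size of roughly 402 Mb (401.6 Mb and 401.7 Mb), so the two coverage figures appear to share the same estimate.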
For long-read sequencing, SMRTbell libraries were constructed with the Sequel® Sequencing Kit 3.0 (Pacific Biosciences, Menlo Park, CA, USA). Long DNA fragments of approximately 20 kb were sequenced on a PacBio Sequel sequencer. Four SMRT cells were run, yielding 55.52 Gb of subreads (mean subread length: 18.13 kb; subread N50: 32.84 kb; coverage: 138.2X) for contig-level genome assembly.
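The subread N50 is the length L at which reads of length ≥ L together contain at least half of all sequenced bases. For long-read data it is usually larger than the mean length, which matches the reported 32.84 kb versus 18.13 kb. The following is a minimal, generic sketch of the standard definition. It is not necessarily the exact tool used to produce the reported statistics.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that reads (or contigs) of
    length >= L together contain at least half of all bases."""
    sorted_lengths = sorted(lengths, reverse=True)  # longest first
    half_total = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one length")


# Toy example: total = 30 bases, mean = 5; 10 + 6 = 16 >= 15, so N50 = 6
toy = [2, 3, 4, 5, 6, 10]
print(n50(toy), sum(toy) / len(toy))
```

Because N50 weights each read by its length, a few very long subreads raise it well above the mean.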
To support the chromosome-level assembly, we used Hi-C (high-throughput chromosome conformation capture) to capture genome-wide chromatin interactions (Belaghzal, Dekker, & Gibcus, 2017). Twenty fifth-instar larvae were ground in 2% formaldehyde to cross-link proteins to DNA. Chromatin was digested overnight with the restriction enzyme MboI. The DNA ends were then blunted, labeled with biotin-14-dCTP, and ligated with a bridge linker. The samples were digested with proteinase K and purified by phenol-chloroform extraction. Biotin was removed from the ends of unligated DNA fragments with T4 DNA polymerase. The DNA was sheared into 200–600 bp fragments using an S220 Focused-ultrasonicator (Covaris, USA), and biotin-labeled fragments were enriched with streptavidin C1 magnetic beads. An Illumina library was constructed from the enriched fragments using the VAHTS™ Universal DNA Library Prep Kit for Illumina® V2 (Vazyme, Nanjing, China) and sequenced on an Illumina NovaSeq platform to obtain 150-bp paired-end reads. After removal of low-quality reads, 1,509 million clean reads were retained (coverage: 559.3X).
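Hi-C analysis pipelines commonly need a map of the restriction fragments produced by the enzyme used, here MboI. MboI recognizes GATC and cuts immediately 5′ of the G (^GATC). Because GATC is palindromic, scanning one strand finds every site. The minimal sketch below performs an in-silico digestion of a single sequence. The function name and output format (0-based, half-open intervals) are illustrative assumptions, not the format of any specific tool used in this study.

```python
import re
from typing import List, Tuple

MBOI_SITE = "GATC"  # MboI recognition site; cut falls before the G (^GATC)


def mboi_fragments(seq: str) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) restriction fragments of
    `seq` after an in-silico MboI digestion."""
    seq = seq.upper()
    # GATC is palindromic, so one strand yields all sites; it cannot
    # overlap itself, so a plain finditer scan is sufficient.
    cuts = [m.start() for m in re.finditer(MBOI_SITE, seq)]
    # A site at position 0 would create an empty fragment, so skip it.
    boundaries = [0] + [c for c in cuts if c > 0] + [len(seq)]
    return list(zip(boundaries[:-1], boundaries[1:]))


# Two sites (at 2 and 8) split this 14-bp sequence into three fragments
print(mboi_fragments("AAGATCTTGATCCC"))
```

Running this over each scaffold gives the fragment coordinates against which Hi-C read pairs can be assigned.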