Genome sequencing
We extracted genomic DNA from 12 pupae using the MagAttract HMW DNA Kit
(Qiagen, Hilden, Germany) for Illumina and PacBio library construction.
A paired-end Illumina library with an insert size of about 500 bp was
constructed using the VAHTS™ Universal DNA Library Prep Kit for
Illumina® V2 (Vazyme, Nanjing, China) and sequenced on an Illumina
NovaSeq platform to obtain 150-bp paired-end reads. The raw reads were
filtered with Trimmomatic v0.38 (Bolger,
Lohse, & Usadel, 2014). After filtering, we obtained 31.02 Gb of clean
short reads (coverage: 77.24X). These data were used to survey genome
features and to polish the de novo assemblies.
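
The relationship between sequencing yield and fold coverage can be
sketched in a few lines of Python. This is an illustrative calculation
only: the genome size is not stated in this section, so the sketch
back-calculates the size implied by the reported yield and coverage. The
function names are hypothetical and not part of any pipeline used here.

```python
def implied_genome_size_mb(yield_gb: float, coverage_x: float) -> float:
    """Genome size (Mb) implied by a yield in Gb and a fold coverage."""
    if coverage_x <= 0:
        raise ValueError("coverage must be positive")
    return yield_gb * 1000.0 / coverage_x


def fold_coverage(yield_gb: float, genome_size_mb: float) -> float:
    """Fold coverage given a yield in Gb and a genome size in Mb."""
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    return yield_gb * 1000.0 / genome_size_mb


# Values reported in the text: 31.02 Gb of clean short reads at 77.24X.
size_mb = implied_genome_size_mb(31.02, 77.24)
print(f"implied genome size: {size_mb:.1f} Mb")  # about 401.6 Mb
```

The same function applied to any other yield and coverage pair gives a
quick consistency check on the genome size assumed for each data set.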
For long-read sequencing, SMRTbell libraries were constructed with the
Sequel® Sequencing Kit 3.0 (Pacific Biosciences, Menlo Park, CA, USA).
Long DNA fragments of approximately 20 kb were sequenced on a PacBio
Sequel sequencer (Pacific Biosciences, Menlo Park, CA, USA). Four SMRT
cells were processed, yielding 55.52 Gb of subreads (mean subread
length: 18.13 kb; subread N50 length: 32.84 kb; coverage: 138.2X) for
contig-level genome assembly.
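
For readers unfamiliar with the N50 statistic reported above, the
following is a minimal sketch of how a read-length N50 is commonly
computed: the length L such that reads of length L or longer contain at
least half of all sequenced bases. The function name and the toy lengths
are invented for illustration and are not the data from this study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of read (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total bases is the N50.
        if running >= half_total:
            return length
    return ordered[-1]  # unreachable for positive lengths; kept for safety


# Toy subread lengths (hypothetical values).
toy_lengths = [2, 3, 4, 5, 6]
print(n50(toy_lengths))                      # N50 of the toy set
print(sum(toy_lengths) / len(toy_lengths))   # mean length, for comparison
```

Because long reads contribute more bases than short ones, the N50 is
typically larger than the mean length, as with the 32.84 kb N50 and
18.13 kb mean reported here.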
To assist chromosome-level assembly, we used the Hi-C (high-throughput
chromosome conformation capture) technique to capture genome-wide
chromatin interactions (Belaghzal, Dekker, & Gibcus, 2017).
Twenty fifth-instar larvae were ground in 2% formaldehyde to cross-link
cellular proteins. Chromatin was digested overnight with the restriction
enzyme MboI. The DNA ends were then blunted, labeled with biotin-14-dCTP,
and ligated with a bridge linker. The samples were digested with
proteinase K and purified by phenol-chloroform extraction. Biotin on
unligated DNA fragment ends was removed with T4 DNA polymerase.
Fragments were sheared to 200-600 bp using an S220 Focused-ultrasonicator
(Covaris, USA). Biotin-labeled DNA fragments were enriched using
streptavidin C1 magnetic beads.
An Illumina library was constructed from the enriched fragments using
the VAHTS™ Universal DNA Library Prep Kit for Illumina® V2 (Vazyme,
Nanjing, China) and sequenced on an Illumina NovaSeq platform to obtain
150-bp paired-end reads. After removing low-quality reads, 1,509 million
clean reads were retained (coverage: 559.3X).
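
The MboI digestion step in the Hi-C protocol above can be illustrated
with a small in silico digest. MboI recognizes GATC and cuts immediately
5' of the G. The sketch below splits a sequence at every such site. The
function name and the toy sequence are hypothetical, and this is not
software used in the study.

```python
import re
from typing import List

MBOI_SITE = "GATC"  # MboI cuts immediately before the G of GATC


def mboi_fragments(seq: str) -> List[str]:
    """Split a DNA sequence at every MboI site (cut 5' of GATC)."""
    seq = seq.upper()
    # A lookahead finds each site start without consuming characters.
    cuts = [m.start() for m in re.finditer(f"(?={MBOI_SITE})", seq)]
    bounds = [0] + cuts + [len(seq)]
    # Drop empty pieces, e.g. when the sequence starts with a site.
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


toy_seq = "AAGATCTTTGATCGG"  # invented example sequence
fragments = mboi_fragments(toy_seq)
print(fragments)
print([len(f) for f in fragments])
```

Joining the fragments in order reproduces the input sequence, which is a
convenient sanity check for any in silico digest.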