3.2 Genome assembly and completeness of the assembled genome
The PacBio Sequel II platform generated 95.24 Gb of high-quality data from
the long-read library, corresponding to 173.16-fold coverage of the
estimated genome size (Table 1, Supplementary Table S4). These data were
assembled using NextDenovo and polished with Racon and Pilon, yielding a
575.83 Mb genome assembly with a contig N50 of 2.15 Mb (Table 2). The
length of this assembly was consistent with the genome size estimated by
k-mer analysis.
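The depth values in this section follow the usual definition of sequencing depth: total sequenced bases divided by genome size. The Python sketch below (function names are hypothetical) inverts that formula to recover the genome size implied by each reported depth. Both the PacBio and Hi-C figures point to a denominator of about 550 Mb, which we assume, but cannot confirm from this section alone, is the k-mer estimate. Dividing by the 575.83 Mb assembly length instead gives roughly 165-fold.

```python
# Sequencing depth (fold coverage) = total bases sequenced / genome size.
# Sketch only: the ~550 Mb denominator is back-calculated from the reported
# depths; it is an assumption, not a value stated in this section.

def fold_coverage(data_gb: float, genome_mb: float) -> float:
    """Return sequencing depth given data volume (Gb) and genome size (Mb)."""
    return data_gb * 1000.0 / genome_mb


def implied_genome_size_mb(data_gb: float, depth: float) -> float:
    """Invert fold_coverage: genome size (Mb) implied by a reported depth."""
    return data_gb * 1000.0 / depth


pacbio_gb, pacbio_depth = 95.24, 173.16   # PacBio Sequel II long reads
hic_gb, hic_depth = 69.51, 126.38         # Hi-C clean data
assembly_mb = 575.83                      # NextDenovo assembly length

print(round(implied_genome_size_mb(pacbio_gb, pacbio_depth), 1))
print(round(implied_genome_size_mb(hic_gb, hic_depth), 1))
# Depth relative to the assembly length rather than the estimated size.
print(round(fold_coverage(pacbio_gb, assembly_mb), 2))
```

Because both datasets resolve to the same denominator, the reported depths appear to be computed against a single genome-size estimate rather than the final assembly length.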
The Illumina short reads and PacBio long reads were aligned to the
burbot assembly to evaluate the quality of the initial genome assembly.
In total, 99.23% of the Illumina reads and 97.55% of the PacBio long
reads were successfully mapped to the assembled genome
(Supplementary Tables S5 and S6). BUSCO analysis identified 94.76%
(4344/4584) of the BUSCO genes as complete in the genome assembly
(Table 3), comprising 91.93% complete single-copy and 2.84% complete
duplicated genes.
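BUSCO expresses completeness as C = S + D, with each category given as a percentage of the n BUSCO groups searched. As a minimal arithmetic check in Python, the sketch below uses single-copy and duplicated counts (4214 and 130) that are back-calculated from the reported 91.93% and 2.84% of 4584 genes; these counts are assumptions, not values quoted from Table 3. Their sum gives the 4344 complete genes stated in the text, i.e. 94.76% of 4584.

```python
# BUSCO summary arithmetic: complete (C) = single-copy (S) + duplicated (D),
# each reported as a percentage of the total number of BUSCO groups searched.
# The S and D counts used below are back-calculated (assumed), not from Table 3.

def busco_summary(single: int, duplicated: int, total: int) -> dict:
    """Return C, S and D percentages (2 d.p.) plus the complete-gene count."""
    complete = single + duplicated
    return {
        "C": round(100 * complete / total, 2),
        "S": round(100 * single / total, 2),
        "D": round(100 * duplicated / total, 2),
        "complete_count": complete,
    }


print(busco_summary(single=4214, duplicated=130, total=4584))
```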
The contigs in the draft assembly were then anchored and oriented into a
chromosome-scale assembly using Hi-C scaffolding. The Hi-C library
generated 69.51 Gb of clean data (126.38× coverage; Table 1,
Supplementary Table S7). Using LACHESIS, 88.66% of the
assembled sequences were anchored to 22 pseudochromosomes ranging in
length from 15.18 Mb to 51.8 Mb (Table 4). In the Hi-C contact heatmap
(Figure 2), the 22 pseudochromosomes were clearly distinguishable, and
interaction signals were strongest near the diagonal, as expected when
adjacent sequences are correctly ordered and oriented, indicating a
high-quality assembly. The final assembled genome after Hi-C correction
was 575.92 Mb, with
a contig N50 of 2.01 Mb and a scaffold N50 of 22.10 Mb (Table 5).
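Contig and scaffold N50 values summarize assembly contiguity: N50 is the length L such that sequences of length L or longer together contain at least half of the total assembled bases. A minimal Python sketch of the standard calculation follows; the function name and the toy lengths are hypothetical and are not burbot data.

```python
# N50: sort sequence lengths in descending order and accumulate them until
# the running total reaches at least half of the total assembly length.

def n50(lengths):
    """Return the N50 of a non-empty collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy lengths in bp (not from this assembly).
toy_contigs = [5_000, 4_000, 3_000, 2_000, 1_000]
print(n50(toy_contigs))
```

Applied to contig lengths this gives the contig N50, and applied to scaffold lengths the scaffold N50. The scaffold value is much larger here because Hi-C scaffolding joins many contigs into each pseudochromosome.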