3.3 Repeat annotation, gene prediction and gene annotation
A total of 384.29 Mb of repeat sequences were detected, accounting for
66.74% of the assembled genome (Table 6). This repeat content was
substantially higher than the estimate (36.60%) obtained from the k-mer
analysis. The repetitive sequences consisted mainly of DNA transposable
elements (289.32 Mb; 50.24% of the assembly), long terminal repeats
(66.95 Mb; 11.63%), and long interspersed elements (30.96 Mb; 5.38%)
(Table 7).
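As a quick consistency check on the figures above, each reported pair of size and percentage implies an assembly size, and these implied sizes should agree. The following Python sketch uses only the values quoted in this section; the function and variable names are illustrative and are not part of the original analysis.

```python
# Repeat classes as reported in the text: (size in Mb, percent of assembly).
repeats = {
    "total repeats": (384.29, 66.74),
    "DNA transposons": (289.32, 50.24),
    "LTR": (66.95, 11.63),
    "LINE": (30.96, 5.38),
}


def implied_assembly_mb(size_mb, percent):
    """Assembly size (Mb) implied by one (size, percent) pair."""
    return size_mb / (percent / 100.0)


implied = {name: implied_assembly_mb(mb, pct) for name, (mb, pct) in repeats.items()}
for name, mb in implied.items():
    print(f"{name:16s} implies an assembly of about {mb:.1f} Mb")

# Spread between the largest and smallest implied assembly size.
spread_mb = max(implied.values()) - min(implied.values())
print(f"spread across classes: {spread_mb:.2f} Mb")

# Sum of the three major classes, to compare with the reported total.
class_sum_mb = sum(mb for name, (mb, _) in repeats.items() if name != "total repeats")
print(f"sum of the three classes: {class_sum_mb:.2f} Mb")
```

All four pairs imply an assembly of roughly 576 Mb, so the sizes and percentages are mutually consistent. The three major classes sum to slightly more than the reported total (387.23 Mb versus 384.29 Mb), which may indicate that some annotated elements overlap or are nested; Tables 6 and 7 would be the place to confirm this.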
A total of 21,664 protein-coding genes were predicted by combining
ab initio, homology-based, and RNA-seq-based strategies.
The average gene, exon, and intron lengths were 14,606, 292.38, and
1,223 bp, respectively (Table 8). The statistics of the predicted gene
models were compared with those of ten other teleost species
(Acanthochromis polyacanthus, Oryzias latipes, Amphiprion ocellaris,
Anabas testudineus, Astatotilapia calliptera, Astyanax mexicanus,
Austrofundulus limnaeus, Gadus morhua, Lepisosteus oculatus, and
Notothenia coriiceps) and showed similar distribution patterns in mRNA
length, CDS length, exon length, intron length, and exon number
(Supplementary Figure S2). A summary of the genome characteristics of
burbot is shown in Figure 3. A total of 20,658 predicted genes (95.36%)
were successfully annotated by alignment to the nucleotide, protein,
and annotation databases InterPro, NR, Swiss-Prot, TrEMBL, KOG, GO, and
KEGG (Table 9). Noncoding RNA prediction identified 6,390 tRNAs, 300
rRNAs, and 519 microRNAs (Supplementary Table S8).
3.4 Comparative genomics and the mechanism of adaptation to freshwater
A total of 19,998 gene families and 2,650 single-copy orthologous genes
were identified using the genomes and genes of 13 selected teleosts. In
addition, the 21,664 genes of burbot were clustered into 14,504 gene
families, including 132 gene families unique to burbot (Supplementary
Table S9). Based on the single-copy orthologous genes, a
maximum-likelihood (ML) phylogenetic tree was constructed; it clustered
burbot and Atlantic cod together, with the two cod species diverging
approximately 44.4 Mya (Figure 4). This divergence time is consistent
with the estimate of Hughes et al. (2018). The burbot genome displayed
639 expanded and 1,564 contracted gene families compared with the
common ancestor of burbot and Atlantic cod (Figure 4). The expanded
gene families of burbot were significantly enriched in 73 GO terms and
34 KEGG pathways, mainly including DNA integration (GO:0015074,
corrected P value = 0.00E+00), DNA metabolic process (GO:0006259,
corrected P value = 2.05E−06), apoptotic process (GO:0006915,
corrected P value = 5.22E−05), zinc ion binding (GO:0008270,
corrected P value = 2.02E−96), transition metal ion binding
(GO:0046914, corrected P value = 1.19E−91), natural killer
cell-mediated cytotoxicity (ko04650, corrected P value = 4.20E−20),
and hematopoietic cell lineage (ko04640, corrected P value =
3.04E−18), which are associated with cell damage repair, ion binding,
and the immune system (Supplementary Tables S10 and S11).
Conversely, the contracted gene families of burbot were enriched in the
GO terms homophilic cell adhesion via plasma membrane adhesion
molecules (GO:0007156, corrected P value = 2.69E−29), cell-cell
adhesion via plasma-membrane adhesion molecules (GO:0098742, corrected
P value = 2.69E−29), and membrane (GO:0016020, corrected P value =
1.18E−10), and in the KEGG pathways amino sugar and nucleotide sugar
metabolism (ko00520, corrected P value = 1.52E−04) and NOD-like
receptor signaling (ko04621, corrected P value = 2.41E−02)
(Supplementary Tables S12 and S13).
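This section reports corrected P values for the enriched terms but does not describe how they were computed. One common approach, shown here purely as an assumption, is a one-sided hypergeometric test per term followed by Benjamini-Hochberg correction. The sketch below uses only the Python standard library; all gene counts and term labels are hypothetical and do not come from the burbot analysis.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n carry the term."""
    upper = min(n, N)
    total = comb(M, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 20,000 background genes, 500 genes in expanded families.
background_genes = 20000
study_genes = 500
# (term label, genes with the term in background, genes with the term in study set)
terms = [("term_A", 100, 20), ("term_B", 200, 5), ("term_C", 50, 1)]

raw_p = [hypergeom_sf(k, background_genes, n, study_genes) for _, n, k in terms]
adj_p = benjamini_hochberg(raw_p)
for (label, _, _), p, q in zip(terms, raw_p, adj_p):
    print(f"{label}: raw P = {p:.2E}, corrected P = {q:.2E}")
```

Under this kind of test, a reported value of 0.00E+00 most likely reflects a probability too small to be represented or printed at the chosen precision rather than an exact zero.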
Notably, the three freshwater species shared no expanded gene families
but shared two contracted gene families, associated with cell adhesion
(GO:0007155, corrected P value = 0.00E+00) and membrane (GO:0016020,
corrected P value = 0.00E+00) (Supplementary Table S14); no enriched
KEGG pathway was found for these contracted gene families. These
functions are critical for adjusting the ion concentrations inside and
outside the cell. The contraction of these gene families may reflect
reduced functional demands on cell membrane permeability in the stable
ionic environment of freshwater. These findings are consistent with the
differences in omega-3 fatty acid content between marine and freshwater
fish (Taşbozan & Gökçe, 2017): marine fish have higher levels of
omega-3 fatty acids than freshwater species. Compared with omega-6
fatty acids, omega-3 fatty acids help improve cell membrane fluidity
and provide osmoregulatory capabilities.
To identify genes evolving under positive selection for freshwater
adaptation, two different likelihood ratio tests (branch-site model)
were performed. A total of 377 genes were identified as positively
selected genes (PSGs) in the burbot genome (Supplementary Table S15).
The burbot PSGs were functionally enriched in organic cyclic compound
metabolic process (GO:1901360, corrected P value = 1.83E−02), cellular
nitrogen compound metabolic process (GO:0034641, corrected P value =
4.11E−03), RNA metabolic process (GO:0016070, corrected P value =
4.13E−03), and nucleic acid metabolic process (GO:0090304, corrected
P value = 6.16E−03) (Supplementary Table S16). Additionally, 38 PSGs
were detected with the three freshwater lineages (burbot, M. albus, and
G. affinis) as foreground branches (Supplementary Table S17). Four PSGs
(stk33, ino80e, nabp1a, and znf385a) are related to DNA damage repair.
The genes stk33 and nabp1 participate in the mitotic DNA damage
checkpoint. znf385a acts upstream in the p53 activation pathway: it
interacts with p53/TP53 and promotes DNA damage-induced cell cycle
arrest (Das et al., 2007). The ino80e protein is a component of the
chromatin remodeling INO80 complex and contributes to DNA double-strand
break repair (Yao et al., 2008).
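The likelihood ratio tests mentioned above compare a branch-site model that allows positive selection on the foreground branch with a null model that does not. A minimal sketch of the test statistic follows, assuming the common convention of comparing twice the log-likelihood difference with a chi-square distribution with one degree of freedom; this section does not state the software or degrees of freedom used, and the log-likelihood values below are hypothetical.

```python
from math import erfc, sqrt


def lrt_pvalue_df1(lnl_null, lnl_alt):
    """Likelihood ratio test p-value against a chi-square with 1 df.

    For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    # Optimisation noise can make the statistic slightly negative; clamp at zero.
    statistic = max(statistic, 0.0)
    return statistic, erfc(sqrt(statistic / 2.0))


# Hypothetical log-likelihoods for one gene (not taken from the burbot analysis).
lnl_null = -12345.67  # null model: no positive selection on the foreground branch
lnl_alt = -12340.12   # alternative model: positive selection allowed

stat, p_value = lrt_pvalue_df1(lnl_null, lnl_alt)
print(f"2*delta lnL = {stat:.2f}, P = {p_value:.2E}")
```

In a genome-wide scan, the per-gene P values from such a test would normally be corrected for multiple testing before genes are called PSGs.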
Exposure of freshwater fish to UV radiation may cause DNA damage. The
presence of a group of positively selected genes involved in DNA repair
is consistent with the higher levels of UV exposure in freshwater
environments than in the ocean. This finding suggests that these genes
have undergone functional convergence in the three freshwater lineages.
The PSGs of the freshwater lineages were enriched in the GO term folic
acid transport (GO:0015884, slc19a1, corrected P value = 8.10E−05), as
well as in amino acid metabolism, replication, and repair pathways
(Supplementary Tables S18 and S19). slc19a1 has an important role in
folate transmembrane transport. Low osmotic pressure has previously
been shown to affect the efficiency of folic acid absorption in the
intestine (Zhao et al., 2011). Positive selection on slc19a1 may
improve folic acid absorption in freshwater species.
These data will serve as valuable resources for future evolutionary
studies of burbot.
4. Conclusion
A chromosome-scale genome assembly of the burbot was produced by
integrating Hi-C and PacBio long-read sequencing data. The burbot is
the only freshwater member of the cod family and has the widest
longitudinal range of any freshwater fish in the world. The genome
assembly and annotation provide the second high-quality genome of the
order Gadiformes and important genomic data for whole-genome analyses
to further investigate the evolution of burbot relative to other cod
species. A series of candidate genes involved in freshwater adaptation
were identified in the comparative genomic analyses. These results help
elucidate the evolutionary process in the order Gadiformes under
environmental change. These data are also useful for diverse
conservation applications, including identifying conservation units,
assessing gene flow, detecting local adaptation of populations, and
elucidating the evolutionary history of burbot.