3.3 Gene prediction and annotation
Next, we identified TEs and tandem repeats in the M. japonicus genome assembly. Repeats made up 56.07% of the assembly.
The most abundant repeat class was DNA transposons (44.76%
of the genome), followed by simple repeats (16.88%), long terminal
repeats (LTR, 13.61%), long interspersed elements (LINE, 7.07%), and
short interspersed nuclear elements (SINE, 0.01%) (Table 3 and Figure
S1). A total of 24,317 protein-coding genes were predicted in the genome, with an
average of 5.5 exons per gene and an average CDS length of 1,237.45 bp (Figure 3 and
Table 4). Pathway assignment was successful for 23,986 (98.6%) of the
predicted protein-coding genes in at least one of six databases (Table 5).
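
As a quick arithmetic check, the annotated fraction can be recomputed from the two gene counts reported in this section. The Python sketch below uses only those two numbers. The function name `annotated_percentage` is hypothetical and chosen for illustration; it is not part of any annotation pipeline used in this study.

```python
def annotated_percentage(annotated: int, total: int) -> float:
    """Return the percentage of predicted genes that received an assignment."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * annotated / total


# Gene counts as reported in the text of this section.
predicted_genes = 24_317
annotated_genes = 23_986

pct = annotated_percentage(annotated_genes, predicted_genes)
print(f"{pct:.1f}%")  # prints "98.6%"
```

The same helper could be applied to per-database counts such as those in Table 5, assuming they are available as integers.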