3.3 Gene prediction and annotation
Next, we identified TEs and tandem repeats in the M. japonicus genome assembly. Repeats accounted for 56.07% of the assembly; the most abundant class was DNA transposons (44.76% of the genome), followed by simple repeats (16.88%), long terminal repeats (LTR, 13.61%), long interspersed nuclear elements (LINE, 7.07%), and short interspersed nuclear elements (SINE, 0.01%) (Table 3 and Figure S1). In total, 24,317 protein-coding genes were predicted, with an average of 5.5 exons per gene and an average CDS length of 1,237.45 bp (Figure 3 and Table 4). Pathway assignment was successful for 23,986 (98.6%) of the predicted protein-coding genes in at least one of six databases (Table 5).
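As a quick arithmetic check, the reported proportion of assigned genes can be recomputed from the two gene counts given above. The following is a minimal sketch only; the function name `annotation_rate` and the variable names are hypothetical and are not part of the annotation pipeline used in this study.

```python
def annotation_rate(annotated: int, total: int) -> float:
    """Return the percentage of genes that received an assignment."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * annotated / total


# Gene counts as reported in the text (hypothetical variable names).
total_genes = 24_317
annotated_genes = 23_986

rate = annotation_rate(annotated_genes, total_genes)
unannotated_genes = total_genes - annotated_genes

print(f"annotated: {rate:.1f}%")
print(f"unannotated genes: {unannotated_genes}")
```

Under these counts the rate rounds to 98.6%, which agrees with the reported value, and 331 predicted genes remain without an assignment in any database.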