Repeat analysis
Genome sequences contain several types of repeats, so repeat
analysis was performed with different methods, each targeting a
different repeat type. First, simple sequence repeats (SSRs) were
identified using the MIcroSAtellite Identification Tool (MISA)
(Beier, et al. 2017). MISA can distinguish
and locate both simple and compound microsatellites. Next, a combination
of de novo and homology-based strategies was used to search for
other repeat sequences. RepeatModeler (v1.0.8) was used for de novo
repeat detection, and the repeat sequences it identified were
classified with TEclass (Abrusan, et al. 2009). These
classified sequences were merged with Repbase sequences
(Jurka, et al. 2005) to construct a custom TE library.
Finally, repeat sequences in the G. przewalskii genome were annotated
with RepeatMasker (http://repeatmasker.org) using the custom TE
library.
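
To illustrate what the SSR step detects, the following is a minimal Python sketch of perfect microsatellite detection and of grouping nearby SSRs into compound microsatellites. It is not MISA's implementation. The function names (find_ssrs, group_compound), the minimum repeat counts per motif length, and the maximum gap of 100 bp are assumptions chosen for illustration; the thresholds actually used in this analysis are not stated here.

```python
import re

# Minimum number of repeats required per motif length.
# Assumed, illustrative values; not taken from the analysis described above.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, count) tuples for perfect tandem repeats.

    Coordinates are 0-based, with an exclusive end.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_count in sorted(min_repeats.items()):
        # One motif of unit_len bases followed by at least min_count - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_count - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT").
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            count = (m.end() - m.start()) // unit_len
            hits.append((m.start(), m.end(), motif, count))
    return sorted(hits)


def group_compound(ssrs, max_gap=100):
    """Group SSRs separated by at most max_gap bases (assumed threshold).

    A group with more than one member corresponds to a compound microsatellite.
    """
    groups = []
    for hit in ssrs:
        if groups and hit[0] - groups[-1][-1][1] <= max_gap:
            groups[-1].append(hit)
        else:
            groups.append([hit])
    return groups
```

Applied to a sequence containing an (AT)8 tract followed a few bases later by a (CAG)6 tract, find_ssrs reports two simple SSRs, and group_compound places them in one group when the gap between them does not exceed max_gap. Imperfect repeats and overlapping motifs are not handled in this sketch.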