Genome assembly and assessment
For genome assembly, all subreads were corrected with Falcon
v1.8.7 (https://github.com/falconry/falcon/releases) using the
parameters length_cutoff = 18,000 and length_cutoff_pr = 19,000 to
generate the preads. The initial genome was then assembled with
smartdenovo (wtpre -J 3000, wtzmo -k 21 -z 10 -Z 19 -U -1 -m 0.1 -A
1000, https://github.com/ruanjue/smartdenovo) by using the corrected
preads. To produce a more accurate genome sequence, the initial assembly
was polished with Arrow using all subreads and default parameters. All
high-quality NGS data were then used to polish the Arrow-corrected
genome with NextPolish (task=12121212) to obtain the polished genome
(Walker, et al. 2014; Hu, et al. 2020). Finally, to obtain a
non-redundant haploid genome, short and redundant sequences were
removed from the polished genome using redundans
(Pryszcz and Gabaldon 2016) with the parameters identity=0.824 and
coverage=0.8.
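The redundancy-reduction step can be illustrated with a minimal Python sketch. It is a simplified illustration of threshold-based filtering, not the actual redundans implementation: the `Hit` record, the `redundant_contigs` function, and the example contig names and lengths are all hypothetical, and only the two thresholds (identity=0.824, coverage=0.8) come from the text. A contig is flagged as redundant here when it aligns to a contig of equal or greater length that is still retained, with identity and coverage at or above the thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise alignment between two contigs (hypothetical record)."""
    query: str       # contig being tested for redundancy
    target: str      # contig it aligns to
    identity: float  # fraction of identical bases in the alignment (0-1)
    coverage: float  # fraction of the query covered by the alignment (0-1)


def redundant_contigs(lengths, hits, min_identity=0.824, min_coverage=0.8):
    """Return the set of contig names flagged as redundant.

    Assumption: a query is redundant if it aligns to a retained contig of
    equal or greater length with identity and coverage at or above the
    thresholds. Shorter queries are examined first.
    """
    redundant = set()
    for hit in sorted(hits, key=lambda h: lengths[h.query]):
        if hit.query == hit.target:
            continue  # ignore self-alignments
        if hit.target in redundant:
            continue  # a removed contig cannot justify removing another
        if lengths[hit.query] > lengths[hit.target]:
            continue  # only the shorter contig of a pair is a candidate
        if hit.identity >= min_identity and hit.coverage >= min_coverage:
            redundant.add(hit.query)
    return redundant


# Hypothetical example data, not taken from the study.
example_lengths = {"ctg1": 100_000, "ctg2": 20_000, "ctg3": 15_000}
example_hits = [
    Hit("ctg2", "ctg1", identity=0.95, coverage=0.90),
    Hit("ctg3", "ctg1", identity=0.80, coverage=0.95),
    Hit("ctg1", "ctg2", identity=0.95, coverage=0.18),
]
flagged = redundant_contigs(example_lengths, example_hits)
```

In this made-up example only `ctg2` is flagged: `ctg3` falls below the identity threshold, and `ctg1` is the longer sequence of its pair.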
To assess the accuracy and non-redundancy of the genome, we used four
methods: (1) RNA-seq data were mapped to the G. przewalskii genome
with hisat2 (Pertea, et al. 2016) using default parameters to evaluate
the accuracy of gene regions; (2) the
subreads and NGS data were mapped to the genome with minimap2
(-x pb) and bwa, respectively, using default parameters, to evaluate
the accuracy of the assembled sequences (Li and Durbin 2009; Li 2018);
(3) the NGS mapping file was used to estimate the single-base accuracy
of the genome by calling SNPs and indels; and (4) the BUSCO database
(https://busco.ezlab.org/) was used with default parameters to assess
the completeness of the genome.
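The text does not give the formula used for single-base accuracy in method (3). One common convention, assumed here, treats homozygous SNPs and indels called from the NGS alignments as assembly errors and divides their count by the assembly length. The Python sketch below follows that assumption; the function name `single_base_accuracy` and the example numbers are hypothetical, and each indel is counted as one error event regardless of its length.

```python
import math


def single_base_accuracy(n_hom_snps, n_hom_indels, genome_length):
    """Estimate single-base accuracy and a Phred-scaled quality value (QV).

    Assumption: every homozygous SNP or indel is one assembly error, so
    error_rate = (SNPs + indels) / genome_length.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    errors = n_hom_snps + n_hom_indels
    error_rate = errors / genome_length
    accuracy = 1.0 - error_rate
    # QV = -10 * log10(error rate); infinite when no errors are observed.
    qv = math.inf if errors == 0 else -10.0 * math.log10(error_rate)
    return accuracy, qv


# Hypothetical counts, not results from the study.
accuracy, qv = single_base_accuracy(
    n_hom_snps=50, n_hom_indels=50, genome_length=1_000_000
)
```

With these made-up counts, 100 errors in 1 Mb correspond to an error rate of 1e-4, that is, an accuracy of 99.99% and a QV of about 40.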