Gene family and phylogenetic analysis
To identify gene families in the G. przewalskii genome, we selected the genomes of 14 other fish species, which were downloaded from the open-source database (Table S2). We used the OrthoMCL (v2.0.9) pipeline to identify gene families among these species (Li, et al. 2003). All-to-all BLASTP with an E-value threshold of 1e-5 was applied to determine the similarities between the protein sequences of the longest transcript of each gene in these species, and genes were classified into orthologues, paralogues and single-copy orthologues (exactly one gene in each species).
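The single-copy criterion above (exactly one gene in each species) can be illustrated with a minimal Python sketch. It assumes the gene families are available as text lines of the form "family_id: species|gene species|gene ...", which is the layout commonly produced by OrthoMCL; the family IDs, species codes and function names below are hypothetical and are not taken from the study.

```python
from collections import Counter


def parse_groups(lines):
    """Parse lines like 'FAM1: spA|g1 spB|g7' into {family_id: [(species, gene), ...]}."""
    families = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        family_id, members = line.split(":", 1)
        families[family_id.strip()] = [
            tuple(member.split("|", 1)) for member in members.split()
        ]
    return families


def single_copy_families(families, species):
    """Return the IDs of families that have exactly one gene in every listed species."""
    wanted = set(species)
    selected = []
    for family_id, members in families.items():
        counts = Counter(sp for sp, _gene in members)
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            selected.append(family_id)
    return selected


# Hypothetical toy input with three species (the study used 15).
example = [
    "FAM1: spA|g1 spB|g7 spC|g3",        # one gene per species: single copy
    "FAM2: spA|g2 spA|g5 spB|g8 spC|g4",  # spA has a paralogue
    "FAM3: spA|g6 spB|g9",                # spC is missing
]
families = parse_groups(example)
single_copy = single_copy_families(families, ["spA", "spB", "spC"])
print(single_copy)
```

In this toy input only FAM1 passes the filter, since FAM2 contains two spA genes and FAM3 lacks spC.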
Molecular phylogenetic analysis was performed using the single-copy orthologous genes. For each gene family, multiple sequence alignment was performed with MAFFT, and the alignments were curated with Gblocks v0.91b (Castresana 2000; Katoh and Standley 2013). We constructed the phylogenetic tree with RAxML (v8.2.11) under the GTRGAMMA model with 100 bootstrap replicates (Stamatakis 2006). MCMCTREE in PAML v4.9e was used to estimate the divergence times (Yang 1997). Three fossil calibration times were obtained from the TimeTree database (http://www.timetree.org/).
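As a rough illustration of the tree-building settings, the following Python sketch assembles (but does not run) a RAxML 8 command line with the GTRGAMMA model and 100 bootstrap replicates. The exact invocation used in the study is not given in the text, so the rapid-bootstrap mode ("-f a"), the random seeds, the binary name and the file names are assumptions made only for this example.

```python
import shlex


def raxml_command(alignment, run_name, model="GTRGAMMA", bootstraps=100,
                  seed=12345, binary="raxmlHPC"):
    """Build a RAxML 8 argument list; nothing is executed here."""
    return [
        binary,
        "-f", "a",              # assumption: rapid bootstrap plus best-scoring ML tree search
        "-m", model,            # substitution model named in the text
        "-p", str(seed),        # parsimony random seed (arbitrary value)
        "-x", str(seed),        # rapid-bootstrap random seed (arbitrary value)
        "-N", str(bootstraps),  # number of bootstrap replicates
        "-s", alignment,        # input alignment (hypothetical file name)
        "-n", run_name,         # suffix for output files (hypothetical)
    ]


cmd = raxml_command("alignment.phy", "fish_tree")
print(shlex.join(cmd))
```

The list returned by raxml_command could be passed to subprocess.run on a system where RAxML is installed; building it separately keeps the settings easy to inspect and record.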