Gene family and phylogenetic analysis
To identify gene families in the G. przewalskii genome, we
downloaded the genomes of 14 other fish species from open-source
databases (Table S2). We used the OrthoMCL (v2.0.9) pipeline to
identify gene families among the genomes of these
species (Li et al. 2003). All-to-all
BLASTP with an E-value threshold of 1e-5 was used to determine the
similarities between the protein sequences of the longest transcript of
each gene in these species, and genes were classified into orthologues,
paralogues and single-copy orthologues (exactly one gene in each
species).
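
As an illustration of the single-copy criterion described above, the following minimal Python sketch selects groups in which every species contributes exactly one gene. It assumes the OrthoMCL groups file layout ("group_id: taxon|gene taxon|gene ..."). The function names parse_groups and single_copy_groups and the species codes used in the usage note are hypothetical and are not taken from the original analysis.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def parse_groups(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse OrthoMCL-style group lines into {group_id: [taxon|gene, ...]}."""
    groups: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        group_id, members = line.split(":", 1)
        groups[group_id.strip()] = members.split()
    return groups


def single_copy_groups(groups: Dict[str, List[str]], species: Set[str]) -> List[str]:
    """Return IDs of groups with exactly one gene from each listed species."""
    selected: List[str] = []
    for group_id, members in groups.items():
        # Count genes per species using the taxon prefix before the '|'.
        counts = Counter(member.split("|", 1)[0] for member in members)
        all_present = set(counts) == species
        one_each = all(n == 1 for n in counts.values())
        if all_present and one_each:
            selected.append(group_id)
    return selected
```

For example, with hypothetical species codes spA, spB and spC, a group "OG1: spA|g1 spB|g7 spC|g3" would be retained, whereas a group with two spA genes, or one lacking spC, would be excluded.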
Molecular phylogenetic analysis was performed using the single-copy
orthologous genes. For each gene family, multiple sequence alignment
was performed with MAFFT, and the alignments were curated with
Gblocks v0.91b (Castresana 2000;
Katoh and Standley 2013). We constructed
the phylogenetic tree with RAxML (v8.2.11) under the GTRGAMMA model
with 100 bootstrap replicates (Stamatakis 2006).
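
The alignment, curation and tree-building steps above could be scripted roughly as follows. This is only a sketch under stated assumptions. The text specifies the GTRGAMMA model and 100 bootstraps, but not the remaining options. The executable names (mafft, Gblocks, raxmlHPC), the MAFFT --auto mode, the Gblocks sequence type -t=d, the rapid-bootstrap mode -f a, the random seeds and all file names are assumptions made for illustration. The functions only build argument lists and do not run the programs.

```python
from typing import List


def mafft_command(fasta: str) -> List[str]:
    """MAFFT call; the alignment is written to stdout and must be redirected."""
    # '--auto' (automatic strategy selection) is an assumption, not from the text.
    return ["mafft", "--auto", fasta]


def gblocks_command(alignment: str, seq_type: str = "d") -> List[str]:
    """Gblocks call; seq_type is 'p' (protein), 'd' (DNA) or 'c' (codons)."""
    # The sequence type used in the original analysis is not stated.
    return ["Gblocks", alignment, f"-t={seq_type}"]


def raxml_command(alignment: str, run_name: str,
                  bootstraps: int = 100, seed: int = 12345) -> List[str]:
    """RAxML call using GTRGAMMA and the stated number of bootstraps."""
    return [
        "raxmlHPC",
        "-f", "a",               # rapid bootstrap plus ML search (assumed mode)
        "-m", "GTRGAMMA",        # substitution model named in the text
        "-p", str(seed),         # parsimony random seed (arbitrary value)
        "-x", str(seed),         # rapid-bootstrap random seed (arbitrary value)
        "-#", str(bootstraps),   # number of bootstrap replicates
        "-s", alignment,         # input alignment file
        "-n", run_name,          # suffix for RAxML output files
    ]
```

Each list can be passed to subprocess.run. Whether the per-family alignments were concatenated into a single matrix before tree inference is not stated in the text, so the input alignment here is left as a generic file argument.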
MCMCTREE in PAML v4.9e was used to estimate the divergence times
(Yang 1997). Three fossil calibration
times were obtained from the TimeTree database
(http://www.timetree.org/).