A pipeline for analysis of allele-specific expression from RNA-seq data
reveals salinity-dependent response in Nile tilapia
Abstract
Species living in a changing environment can adapt to changes in
various environmental factors. Physiological acclimatization may be
significantly influenced by heterozygosity, especially with regard to
allelic variation and allele-specific expression (ASE) under different
conditions. Data from RNA-seq experiments can be used to identify and
quantify the alleles expressed, in order to detect and characterize ASE
and regulation of gene expression. However, reads carrying the allele
that matches the reference genome map preferentially, creating a mapping
bias that prevents reliable estimation of allele depth unless the
haplotypes of the experimental individuals are provided. We developed a
pipeline that allows the
identification of the alleles corresponding to an RNA-seq dataset and
their unbiased quantification. The pipeline requires neither DNA
sequencing nor prior knowledge of the haplotype. The identified SNPs are
then substituted into the reference genome, creating two pseudogenomes
with the alternative alleles on two independent samples of the
experiment. SNPs are then called against each pseudogenome, providing
two SNP datasets that are averaged to calculate the allele depth. The
final SNP calling file
contains the coordinates of the SNPs, the IDs of the genes containing
them, the expressed genotypes, the unbiased allele depths, the results
of statistical tests for identifying ASE according to the experimental
design, and their correlation with differentially expressed genes. The
pipeline presented here can therefore quantify ASE in non-model
organisms and can be applied to existing RNA-seq datasets to extend
studies of gene expression regulation.
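
The core steps summarized above (allele substitution into the reference,
averaging of the two SNP call sets, and a test for allelic imbalance) can
be illustrated with a minimal Python sketch. This is not the pipeline's
implementation. The function names, the toy sequence and read depths,
the choice to build one pseudogenome per allele, and the exact two-sided
binomial test against a 0.5 allelic ratio are all assumptions made here
for illustration; the abstract does not specify which statistical test
is used.

```python
from math import comb


def substitute_alleles(reference, alleles):
    """Return a copy of `reference` with the given alleles written in place.

    `alleles` maps a 0-based position to a single-base allele (hypothetical format).
    """
    seq = list(reference)
    for pos, allele in alleles.items():
        seq[pos] = allele
    return "".join(seq)


def average_allele_depth(calls_a, calls_b):
    """Average (ref_depth, alt_depth) per SNP over two call sets.

    Only positions present in both call sets are kept.
    """
    shared = calls_a.keys() & calls_b.keys()
    return {
        pos: (
            (calls_a[pos][0] + calls_b[pos][0]) / 2,
            (calls_a[pos][1] + calls_b[pos][1]) / 2,
        )
        for pos in shared
    }


def binomial_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value (assumed test, not from the source).

    Sums the probabilities of all outcomes no more likely than the observed one.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


# Toy data: position -> (reference allele, alternative allele).
reference = "ACGTACGT"
snps = {1: ("C", "T"), 5: ("C", "G")}

# Assumption: one pseudogenome per allele at every SNP position.
pseudo_ref = substitute_alleles(reference, {p: ref for p, (ref, alt) in snps.items()})
pseudo_alt = substitute_alleles(reference, {p: alt for p, (ref, alt) in snps.items()})

# Invented (ref_depth, alt_depth) counts from calling against each pseudogenome.
calls_vs_ref = {1: (30, 10), 5: (22, 18)}
calls_vs_alt = {1: (26, 14), 5: (18, 22)}
averaged = average_allele_depth(calls_vs_ref, calls_vs_alt)

# Test each SNP for departure from balanced (50:50) allelic expression.
p_values = {
    pos: binomial_two_sided(round(alt), round(ref + alt))
    for pos, (ref, alt) in averaged.items()
}
```

In this toy example the depth of each allele is inflated when reads are
mapped against the pseudogenome carrying it, and averaging the two call
sets cancels that inflation before the allelic ratio is tested.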