2.1 Transcriptomics
RNA sequencing (RNA-seq) is a common tool in systems biology and is used to study the transcriptome of a particular cell or organism. The core workflow of an RNA sequencing experiment begins with extraction of the RNA, followed by enrichment of the RNA subtype to be analyzed and depletion of the subtypes that are not of experimental interest. An adapter-ligated cDNA library is then prepared and amplified, and the library is sequenced on a high-throughput instrument, typically to a depth of 10 to 30 million reads per sample. The next step is the computational analysis of the sequenced library. This involves alignment to a reference transcriptome, quantification of the reads overlapping each annotated feature, pre-processing and normalization of the data between samples, and statistical modeling, which is typically done with a wide variety of programming languages and software packages. The majority of RNA sequencing experiments are performed on short read sequencing instruments, but recent advances in long read sequencing and direct read sequencing offer new approaches to questions that cannot be answered with short read sequencing alone. Each of these approaches comes with its own limitations. For short read sequencing, a major limitation is the bias introduced during sample preparation and downstream computational analysis. These biases can affect the quantification of gene isoforms (especially for longer transcripts) and the handling of multi-mapped reads. Long read sequencing and direct read sequencing aim to overcome these limitations, yet they have limitations of their own.
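The between-sample normalization step in the workflow above can be illustrated with counts per million (CPM), one of the simplest library-size scalings. The following is a minimal sketch in plain Python, not the normalization performed by any particular package; the gene names and read counts are hypothetical and chosen only to show that two samples sequenced to different depths become comparable after scaling.

```python
def counts_per_million(counts):
    """Scale the raw read counts of one sample to counts per million (CPM)."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("sample has no mapped reads")
    return {gene: c * 1_000_000 / library_size for gene, c in counts.items()}


# Hypothetical raw counts: sample_b was sequenced twice as deeply as sample_a,
# but the relative abundance of each gene is identical.
sample_a = {"geneA": 500, "geneB": 1500, "geneC": 8000}
sample_b = {"geneA": 1000, "geneB": 3000, "geneC": 16000}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)

for gene in sample_a:
    print(gene, cpm_a[gene], cpm_b[gene])
```

CPM corrects only for sequencing depth. It does not account for transcript length or for differences in library composition, which is one reason dedicated packages use more elaborate normalization schemes.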
Regardless of the approach used, the primary application of RNA sequencing remains the assessment of differential gene expression (DGE).7 DGE analysis is an effective tool for exploring transcriptomic changes in response to a stimulus and can further serve as a baseline for comparing expression changes across treatment conditions. A well-known example of how DGE can be used to analyze genetic reprogramming is the stimulation of cells with LPS, which activates both MyD88-dependent and TRIF-dependent TLR-4 signaling to initiate the inflammatory cascade. This involves differential expression of key inflammatory genes such as the MAPKs, the IRFs, and the master regulator nuclear factor κB (NF-κB). These proteins then induce the production of both pro-inflammatory cytokines and mediators of inflammation, leading to a tightly controlled and targeted inflammatory response.8,9 These changes are evident in an RNA-seq dataset after analysis with bioinformatics tools that measure differential gene expression. Conceptually, differential expression in this case refers to the change in the expression of a gene from the reference group (unstimulated) to the treated group (LPS stimulated). Various software packages exist in both R and Python to examine differential expression. Two notable examples, both R packages, are limma and DESeq2.10,11 These packages perform statistical tests on the data and output log fold changes and p values, which together indicate how strongly and how reliably a particular gene is differentially expressed. Using transcriptomic data analysis to detect differential expression is a key first step for a systems biology experiment, as it defines the baseline of comparison for further analysis with other omics methods and provides a comprehensive readout of gene expression. DGE also gives an indication of which proteins are likely to be translated and involved in the cellular signaling dynamics of the pathway of interest.
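To make the two outputs described above concrete, the following sketch computes a log2 fold change and a p value for a single gene, comparing an unstimulated group with an LPS-stimulated group. It is an illustration under stated assumptions and not the method of limma or DESeq2, which fit moderated linear models and negative binomial models respectively. Here the p value comes from an exact permutation test on the difference in group means, the pseudocount of 1 is an arbitrary choice to avoid division by zero, and the expression values are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 ratio of mean treated expression to mean control expression."""
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


def permutation_p_value(control, treated):
    """Exact two-sided permutation p value for the difference in group means."""
    pooled = list(control) + list(treated)
    observed = abs(mean(treated) - mean(control))
    extreme = 0
    total = 0
    # Enumerate every way of assigning the "treated" label to len(treated) samples.
    for idx in combinations(range(len(pooled)), len(treated)):
        chosen = set(idx)
        group_t = [pooled[i] for i in range(len(pooled)) if i in chosen]
        group_c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point ties.
        if abs(mean(group_t) - mean(group_c)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical normalized expression of one gene, four replicates per group.
unstimulated = [10.0, 12.0, 11.0, 9.0]
lps_stimulated = [80.0, 95.0, 88.0, 90.0]

lfc = log2_fold_change(unstimulated, lps_stimulated)
p = permutation_p_value(unstimulated, lps_stimulated)
print(f"log2 fold change = {lfc:.2f}, permutation p value = {p:.4f}")
```

With four replicates per group there are only 70 possible label assignments, so the smallest attainable two-sided p value is 2/70. This limited resolution at small sample sizes is one motivation for the model-based tests used by dedicated packages, which share information across genes.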