2.1 Transcriptomics
RNA sequencing is a common tool in most systems biology analyses, where
it is used to study the transcriptome of a particular cell or organism.
The core workflow of any RNA sequencing experiment is the
extraction of the RNA, followed by enrichment of the subtype of RNA to
be analyzed and depletion of the RNA subtypes that are not of
experimental interest. This is followed by the preparation of an
adapter-ligated cDNA library, amplification of the constructed library,
and high-throughput sequencing of the library to a depth of 10 to 30
million reads per sample. The next step is the computational analysis of
the sequenced
library. This involves alignment to a reference transcriptome,
quantification of the reads overlapping each transcript, pre-processing
and normalization of the data between samples, and statistical modeling,
all of which are typically carried out with a wide range of programming
languages and software
packages. The majority of RNA sequencing experiments are done with
short-read sequencing instruments, but recent advances in long-read
sequencing and direct read sequencing offer new approaches and
methodologies to tackle questions that cannot be answered with
short-read sequencing alone. Each of these approaches comes with its own
limitations. For short-read sequencing, a major limitation involves
biases that are introduced during sample preparation and downstream
computational analysis. These biases can affect the quantification of
gene isoforms (especially for longer transcripts) and the handling of
multi-mapped reads. Long-read sequencing and direct read sequencing aim
to overcome these limitations, yet they have limitations of their own.
However,
regardless of the type of approach used, the primary application of RNA
sequencing remains the assessment of differential gene expression
(DGE).7 DGE is an effective tool for exploring the
genomic and thus transcriptomic changes in response to a stimulus and
can further serve as a baseline for comparing the expression changes in
various treatment conditions. A well-known example of how DGE can be
used to analyze genetic reprogramming is the stimulation of cells with
LPS, which activates both MyD88- and TRIF-dependent TLR-4 signaling to
initiate the inflammatory cascade. This involves differential expression
of key inflammatory genes, such as those encoding the MAPKs, IRFs, and
the master transcriptional regulator nuclear factor κB (NF-κB). These
proteins then induce the
production of both pro-inflammatory cytokines and mediators of
inflammation, leading to a tightly controlled and targeted inflammatory
response.8,9 These changes are evident in an RNA-seq dataset after
analysis with bioinformatics tools and measurement of differential gene
expression. Conceptually, differential expression in
this case refers to the changes in the expression of a gene from the
reference group (unstimulated) to the treated group (LPS stimulated).
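
The comparison between a reference group and a treated group can be sketched as a log2 fold change computed on library-size-normalized counts. The following is a minimal illustration only: the gene names and counts are invented placeholders, counts-per-million scaling is assumed as the normalization, and the pseudocount of 1 is an arbitrary choice to avoid taking the logarithm of zero.

```python
import math

def counts_per_million(counts):
    """Scale the raw read counts of one sample so they sum to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}

def log2_fold_change(reference, treated, pseudocount=1.0):
    """Per-gene log2(treated / reference); the pseudocount avoids log(0)."""
    return {
        gene: math.log2(
            (treated[gene] + pseudocount) / (reference[gene] + pseudocount)
        )
        for gene in reference
    }

# Hypothetical raw counts, invented for illustration (not real data).
unstimulated = {"GENE_A": 50, "GENE_B": 400, "GENE_C": 550}
lps_stimulated = {"GENE_A": 800, "GENE_B": 800, "GENE_C": 400}

# Normalize each sample for sequencing depth before comparing them.
ref_cpm = counts_per_million(unstimulated)
trt_cpm = counts_per_million(lps_stimulated)

# Positive values: higher in the treated group; negative: lower.
lfc = log2_fold_change(ref_cpm, trt_cpm)
for gene, value in lfc.items():
    print(f"{gene}\t{value:+.2f}")
```

In this toy example GENE_B has twice the raw count after stimulation, but the treated library is also twice as deep, so its normalized fold change is zero. This is why normalization between samples precedes the comparison.
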
Various software packages exist in both R and Python to examine
differential expression. Notable examples are limma and
DESeq2.10,11 These packages perform statistical tests
on the data and output log fold changes and p-values, which together
indicate how strongly and how reliably a particular gene is
differentially expressed. Using
transcriptomics data analysis methodologies to detect differential
expression is a key first step for any systems biology experiment, as it
defines the baseline of comparison for further analysis with other omics
methods and provides a comprehensive readout of gene expression. DGE
also gives an indication of which proteins are likely to be translated
and involved in the cellular signaling dynamics of the pathway of
interest.
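
To make the per-gene outputs mentioned above more concrete (a log fold change and a p-value for each gene), the sketch below runs a simple test on replicate measurements and corrects for multiple testing. It is not the method used by limma or DESeq2, which rely on moderated linear models and negative binomial models respectively. An exact permutation test and a Benjamini-Hochberg adjustment are swapped in here because they need only the Python standard library. All gene names and expression values are hypothetical.

```python
import itertools
import statistics

def permutation_p_value(reference, treated):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(reference) + list(treated)
    n_ref = len(reference)
    observed = abs(statistics.mean(treated) - statistics.mean(reference))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Enumerate every way of relabeling the samples as reference/treated.
    for ref_idx in itertools.combinations(indices, n_ref):
        ref_set = set(ref_idx)
        ref_vals = [pooled[i] for i in ref_idx]
        trt_vals = [pooled[i] for i in indices if i not in ref_set]
        diff = abs(statistics.mean(trt_vals) - statistics.mean(ref_vals))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical log2 normalized expression: (unstimulated, LPS stimulated),
# four replicates per group. Values are invented for illustration.
log_expr = {
    "GENE_A": ([5.0, 5.2, 4.9, 5.1], [8.0, 8.3, 7.9, 8.1]),
    "GENE_B": ([7.0, 7.4, 6.9, 7.2], [7.1, 7.3, 7.0, 7.2]),
}

genes = list(log_expr)
# On the log2 scale, the fold change is a difference of group means.
lfc = {g: statistics.mean(log_expr[g][1]) - statistics.mean(log_expr[g][0])
       for g in genes}
p_raw = [permutation_p_value(*log_expr[g]) for g in genes]
p_adj = dict(zip(genes, benjamini_hochberg(p_raw)))

for g, p in zip(genes, p_raw):
    print(f"{g}\tlog2FC={lfc[g]:+.2f}\tp={p:.3f}\tadjusted p={p_adj[g]:.3f}")
```

With four replicates per group there are only 70 possible relabelings, so the smallest attainable two-sided p-value is 2/70. Dedicated packages address this limited resolution by borrowing information across genes, which is one reason they are preferred for real experiments.
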