Statistical analyses
Separate analyses were conducted
to compare gene expression between hybrids and parental purebreds, and
ambient versus elevated conditions within an offspring group. In
addition, a separate analysis was carried out for mitochondrial genes.
Transcript abundance of the samples and the BLAST results were analyzed
in R and differential expression analysis was performed using the
package limma (Ritchie et al., 2015). Firstly, only transcripts
that were of coral origin were retained, as indicated in the BLAST
results. For the mitochondrial analysis, only transcripts that matched
with the mitochondrial genome were used. Secondly, transcripts that
consistently had zero or very low counts were removed using the edgeR
build in function filterByExpr, and scale normalization (TMM) was
applied. For Principal Components Analysis (PCA), sample raw counts were
transformed into log2-counts per million (log-CPM) to account for
library size differences.
A total of four samples were identified to have small library size
(three A. tenuis purebreds- two under ambient, one under elevated
conditions, and one TL hybrid under elevated conditions), and a relative
log expression (RLE) plot showed that normalization of these samples was
unsuccessful (Gandolfo & Speed, 2018) (Figure S2, Table S2). These
samples were excluded from the main manuscript, but their analyses were
retained in the Supplemental Information. A heatmap was then used to
visualize the 500 most variable genes across samples using the log-CPM
expression values with dendrograms computed using Euclidean distances.
For the mitochondrial analysis, a PCA and a heatmap were generated using
all genes that remained post-filtering.
To fit linear models for comparisons, count data was transformed to
log-CPM using the voom function in the limma package. Since no treatment
effect was found on gene expression (see Results section), the
comparison of hybrids and purebreds combined samples from both
treatments. Comparisons were made between: 1) maternal purebred LL and
its hybrid LT, 2) paternal purebred LL and its hybrid TL, and 3) between
the reciprocal hybrids LT and TL. The purebred TT (A. tenuis ) was
not included due to a small sample size (n =1, Table S1). Empirical
Bayes moderated t-statistics were generated to assess the pairwise
comparisons, and p -values were corrected using the
Benjamini-Hochberg method (Benjamini & Hochberg, 1995). A gene was
considered differentially expressed when padj< 0.05 using the treat function in the limma package
with a log-fold-change threshold of > 0.2. The list of
differentially expressed genes (DEGs) was exported for gene ontology
(GO) analyses and visualized using volcano plots (Blighe et al., 2018).
The volcano plots and GO analyses focused on the comparison of 1)
paternal purebred LL with its hybrid TL, and 2) between the reciprocal
hybrid LT and TL only, as these were the pairs with a high number of
differentially expressed genes to explore.
Two different approaches were applied to the GO analyses, including
GOseq (Young et al., 2010) and a rank-based GO analysis with adaptive
clustering using a Mann-Whitney U (MWU) test
(https://github.com/z0on/GO_MWU, Dixon et al., 2015). For GOseq, the
analysis was conducted using the list of DEGs and the p- values
were corrected with the Benjamini-Hochberg method (Benjamini &
Hochberg, 1995). A GO category was considered overrepresented or
underrepresented when the padj was < 0.05 and
that the category had > 3 DEGs. For the MWU test, the
hierarchical clustering trees utilized the log10-transformedp -values of the DEGs and indicated significantly enriched GO
categories by up-regulated (red) or down-regulated (blue), under a false
discovery rate of 10%. In addition, differentially expression nuclear
genes in GO categories with functions connected to the mitochondrion
were identified.