Statistical analyses
Separate analyses were conducted to compare gene expression between hybrids and parental purebreds, and ambient versus elevated conditions within an offspring group. In addition, a separate analysis was carried out for mitochondrial genes. Transcript abundance of the samples and the BLAST results were analyzed in R and differential expression analysis was performed using the package limma (Ritchie et al., 2015). Firstly, only transcripts that were of coral origin were retained, as indicated in the BLAST results. For the mitochondrial analysis, only transcripts that matched with the mitochondrial genome were used. Secondly, transcripts that consistently had zero or very low counts were removed using the edgeR build in function filterByExpr, and scale normalization (TMM) was applied. For Principal Components Analysis (PCA), sample raw counts were transformed into log2-counts per million (log-CPM) to account for library size differences.
A total of four samples were identified to have small library size (three A. tenuis purebreds- two under ambient, one under elevated conditions, and one TL hybrid under elevated conditions), and a relative log expression (RLE) plot showed that normalization of these samples was unsuccessful (Gandolfo & Speed, 2018) (Figure S2, Table S2). These samples were excluded from the main manuscript, but their analyses were retained in the Supplemental Information. A heatmap was then used to visualize the 500 most variable genes across samples using the log-CPM expression values with dendrograms computed using Euclidean distances. For the mitochondrial analysis, a PCA and a heatmap were generated using all genes that remained post-filtering.
To fit linear models for comparisons, count data was transformed to log-CPM using the voom function in the limma package. Since no treatment effect was found on gene expression (see Results section), the comparison of hybrids and purebreds combined samples from both treatments. Comparisons were made between: 1) maternal purebred LL and its hybrid LT, 2) paternal purebred LL and its hybrid TL, and 3) between the reciprocal hybrids LT and TL. The purebred TT (A. tenuis ) was not included due to a small sample size (n =1, Table S1). Empirical Bayes moderated t-statistics were generated to assess the pairwise comparisons, and p -values were corrected using the Benjamini-Hochberg method (Benjamini & Hochberg, 1995). A gene was considered differentially expressed when padj< 0.05 using the treat function in the limma package with a log-fold-change threshold of > 0.2. The list of differentially expressed genes (DEGs) was exported for gene ontology (GO) analyses and visualized using volcano plots (Blighe et al., 2018). The volcano plots and GO analyses focused on the comparison of 1) paternal purebred LL with its hybrid TL, and 2) between the reciprocal hybrid LT and TL only, as these were the pairs with a high number of differentially expressed genes to explore.
Two different approaches were applied to the GO analyses, including GOseq (Young et al., 2010) and a rank-based GO analysis with adaptive clustering using a Mann-Whitney U (MWU) test (https://github.com/z0on/GO_MWU, Dixon et al., 2015). For GOseq, the analysis was conducted using the list of DEGs and the p- values were corrected with the Benjamini-Hochberg method (Benjamini & Hochberg, 1995). A GO category was considered overrepresented or underrepresented when the padj was < 0.05 and that the category had > 3 DEGs. For the MWU test, the hierarchical clustering trees utilized the log10-transformedp -values of the DEGs and indicated significantly enriched GO categories by up-regulated (red) or down-regulated (blue), under a false discovery rate of 10%. In addition, differentially expression nuclear genes in GO categories with functions connected to the mitochondrion were identified.