Sequence data processing
Quality and adapter trimming were carried out on raw reads using
Trimmomatic (Bolger et al., 2014), discarding reads < 50 bp or
with an averaged quality score < 20 in a sliding window of
five bases. Since the coral holobiont is associated with high densities
of prokaryotes and algal endosymbionts, reads were filtered with the
following steps: First, reads were compared to an rRNA database
(Silva132_LSU, Silva132_SSU) and matches (i.e., e-values ≤
10-5) were removed using the program SortMeRNA
(Kopylova et al., 2012). Second, reads were compared to the algal
endosymbiont genome (genus Cladocopium, symC_scaffold_40.fasta
(Shoguchi et al., 2018) and matches were removed using BBDuk (Bushnell,
2020). The remaining reads of each sample are shown in Table S2 and were
used to create a de novo assembly for the each offspring groups
and a combined de novo assembly for all four offspring groups
using Trinity (Grabherr et al., 2011). Small transcripts of <
400 bp were removed from the assemblies (Kenkel & Bay, 2017), and the
longest isoform of each trinity transcript was obtained. Mitochondrial
genes were identified running BLASTn against the A. tenuismitochondrial genome (NC_003522.1.fasta, van Oppen et al., 2002) and
were retained in the analysis. The remaining transcripts were then
identified by BLASTx searches against the most complete coral gene model
(A. digitifera , GCF_000222465.1_Adig_1.1_protein.faa,
Shinzato et al., 2011) and NCBI’s nonredundant (nr) protein database,
with a e-value cut off ≤ 10-5.
Gene names and gene ontologies (GO) of the transcripts were assigned
using BLASTx search against UniProt Knowledgebase Swiss-Prot database
(The UniProt Consortium, 2015). Duplicate query transcripts were
removed. Transcript abundance of the samples was then estimated using
RSEM, an alignment-based method (Li & Dewey, 2011). Transcript
quantification of the samples was performed by aligning reads using
bowtie2 (Langmead & Salzberg, 2012) and estimating abundance with RSEM
(Li & Dewey, 2011). For gene expression comparison between hybrids and
parental purebreds, we tested estimating transcript abundance using the
assembly of purebred A. loripes , as well as the combined assembly
produced using all offspring groups. The two methods revealed very
similar results (Figure S1), and the results presented here are based on
transcript abundance estimated using the assembly of purebred A.
loripes . Due to the small number of samples available for the parental
purebred A. tenuis (Table S1), a de novo assembly was not
conducted or tested as a basis for transcript abundance estimate. For
gene expression comparison between treatments within an offspring group,
the de novo assembly of each offspring group was used to estimate
transcript abundance. Treatment comparison was not conducted forA. tenuis purebreds due to an insufficient number of samples
(Table S1).