Sequence data processing
Quality and adapter trimming were carried out on raw reads using Trimmomatic (Bolger et al., 2014), discarding reads < 50 bp or with an averaged quality score < 20 in a sliding window of five bases. Since the coral holobiont is associated with high densities of prokaryotes and algal endosymbionts, reads were filtered with the following steps: First, reads were compared to an rRNA database (Silva132_LSU, Silva132_SSU) and matches (i.e., e-values ≤ 10-5) were removed using the program SortMeRNA (Kopylova et al., 2012). Second, reads were compared to the algal endosymbiont genome (genus Cladocopium, symC_scaffold_40.fasta (Shoguchi et al., 2018) and matches were removed using BBDuk (Bushnell, 2020). The remaining reads of each sample are shown in Table S2 and were used to create a de novo assembly for the each offspring groups and a combined de novo assembly for all four offspring groups using Trinity (Grabherr et al., 2011). Small transcripts of < 400 bp were removed from the assemblies (Kenkel & Bay, 2017), and the longest isoform of each trinity transcript was obtained. Mitochondrial genes were identified running BLASTn against the A. tenuismitochondrial genome (NC_003522.1.fasta, van Oppen et al., 2002) and were retained in the analysis. The remaining transcripts were then identified by BLASTx searches against the most complete coral gene model (A. digitifera , GCF_000222465.1_Adig_1.1_protein.faa, Shinzato et al., 2011) and NCBI’s nonredundant (nr) protein database, with a e-value cut off ≤ 10-5.
Gene names and gene ontologies (GO) of the transcripts were assigned using BLASTx search against UniProt Knowledgebase Swiss-Prot database (The UniProt Consortium, 2015). Duplicate query transcripts were removed. Transcript abundance of the samples was then estimated using RSEM, an alignment-based method (Li & Dewey, 2011). Transcript quantification of the samples was performed by aligning reads using bowtie2 (Langmead & Salzberg, 2012) and estimating abundance with RSEM (Li & Dewey, 2011). For gene expression comparison between hybrids and parental purebreds, we tested estimating transcript abundance using the assembly of purebred A. loripes , as well as the combined assembly produced using all offspring groups. The two methods revealed very similar results (Figure S1), and the results presented here are based on transcript abundance estimated using the assembly of purebred A. loripes . Due to the small number of samples available for the parental purebred A. tenuis (Table S1), a de novo assembly was not conducted or tested as a basis for transcript abundance estimate. For gene expression comparison between treatments within an offspring group, the de novo assembly of each offspring group was used to estimate transcript abundance. Treatment comparison was not conducted forA. tenuis purebreds due to an insufficient number of samples (Table S1).