Differential gene expression analysis
NGS reads were assessed for quality control metrics after adapter trimming with Trim Galore (Krueger, F., 2019). We then used Kallisto (Bray, N. L. et al., 2016) to quantify transcript abundance by pseudo-alignment on the latest release of the Apis melliferagenome assembly Amel_HAv3.1 (Wallberg, A. et al., 2019), compiled the transcript abundance estimates for each sample into an expression matrix in R (R Core Team, 2019), and summed transcripts by gene using the tximport method, see Table S2 (Soneson, C. et al., 2015). Low count genes (< 1 count across > 25% of samples) were removed from the analysis, resulting in 92.38% (9178/9935) of theA. mellifera protein-coding genes (Wallberg, A. et al., 2019) represented in this study.
We employed a two-factor design using DESeq2 (Love, M. I. et al., 2014) to assess differential expression of protein-coding genes (DEGs) within tissue types between nurse, forager, and winter bees, using colony as a cofactor. In each analysis, we applied the default Benjamini-Hochberg (Benjamini, Y. & Hochberg, Y., 1995) correction and set the threshold for differential expression at padj < 0.01.
To assess sample relationships and variance, hierarchical clustering analysis was conducted using the variance stabilized transformed (VST) counts generated by DESeq2. We determined that two of the nurse fat body samples (N1 and N3) were outliers from this analysis, as each created separate branches independent of the other samples that clustered as expected by phenotype (Figure S1). We identified 411 genes that were differentially expressed between the clustered and outlier nurse samples (padj < 0.05) but did not observe functional enrichment for any biological processes among these genes. FastQC did not report major differences in any of the default sequence quality metrics apart from reads corresponding to highly overrepresented sequences. A local BLASTN (McGinnis, S. & Madden, T. L., 2004) search of the overrepresented sequences revealed many hits with 100% identity and 100% query cover for A. mellifera 16S ribosomal RNAs, a common source of contamination during library preparation (Zhao, W. et al., 2014). However, upon filtering our sequencing reads for honey bee rRNAs (The RNAcentral Consortium, 2018), sample clustering did not improve (Figure S2). Thus, we could not determine the true source of this discrepancy, and so removed these samples from all downstream analyses.
\(\chi^{2}\) tests and k-means clustering
We performed chi-squared tests (with Yate’s continuity correction) to determine, for each tissue, whether the number of DEGs between winter bees and nurses is significantly different than the number of DEGs between winter bees and foragers. Additionally, we performedk -means clustering (Oyelade, J. et al., 2016) to assess whether the tissue-specific transcriptomes of winter bees had more in common with foragers or nurses using the fviz_cluster function from the factoextra (Kassambara, A., 2019) package in R, testing for \(k=2\)clusters. Separate tests were performed using: 1) all genes passing our initial low-count filter (9178), and 2) only forager vs nurse DEGs.