2.3 Transcriptome assembly
The raw Illumina reads were first quality-assessed using MultiQC (Andrews, 2010; Ewels et al., 2016). Adapter contamination was removed with Trimmomatic, a trimming tool designed for Illumina NGS data (Bolger et al., 2014), and residual rRNA reads were then removed with SortMeRNA (Kopylova et al., 2012). The MultiQC quality check covered sequence quality scores (Phred > 30), adapter content and position, GC content, and ambiguous bases (Ns). Only the clean, filtered reads were used in downstream analyses. A robust transcriptome was constructed with the Trinity v2.9.0 pipeline (Grabherr et al., 2011) by building a combined, redundant over-assembly from de novo and genome-guided assemblies, the latter based on a bilberry genome sequence of the same ecotype (Wu et al., 2021, unpublished). The draft genome was indexed and the reads were aligned to it using STAR v2.6.1d (Dobin et al., 2013). The genome-guided Trinity output was concatenated with the de novo transcriptome to form the combined assembly. EvidentialGene (Gilbert, 2019) was used to remove the redundancy arising from combining the assemblies. The reads were further mapped to the published highbush blueberry (V. corymbosum cv. Draper v1.0) genome (Colle et al., 2019) using HISAT2 to improve the annotation of the assembly. The most likely coding regions were identified using TransDecoder (http://transdecoder.github.io), which identifies open reading frames (ORFs) of at least a minimum length within the reconstructed Trinity transcripts. To assess the completeness of the transcriptome assemblies, BUSCO v3.0 (Simão et al., 2015) was used to evaluate the presence of evolutionarily conserved single-copy genes. The Embryophyta orthologous database odb_v.10 (https://busco-archive.ezlab.org/v3/) was used as the reference set for validating the assembled transcriptomes.
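The per-read quantities behind the quality criteria listed above (quality score, GC content, and ambiguous bases) can be illustrated with a minimal Python sketch. It is not the code used in this study and does not reproduce MultiQC, Trimmomatic, or SortMeRNA. It assumes Phred+33 quality encoding and interprets the Phred > 30 criterion as a per-read mean; the names read_stats and passes_qc and the max_n threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Assumption: qualities use Phred+33 (Sanger / Illumina 1.8+) ASCII encoding.
PHRED_OFFSET = 33


@dataclass
class ReadStats:
    mean_phred: float   # arithmetic mean of per-base Phred scores
    gc_fraction: float  # G+C over called (non-N) bases
    n_count: int        # number of ambiguous bases (N)


def read_stats(sequence: str, quality: str) -> ReadStats:
    """Compute simple QC metrics for one read (hypothetical helper)."""
    if not sequence or len(sequence) != len(quality):
        raise ValueError("sequence and quality must be non-empty and equal in length")
    seq = sequence.upper()
    # Decode each quality character into its integer Phred score.
    scores = [ord(ch) - PHRED_OFFSET for ch in quality]
    n_count = seq.count("N")
    called = len(seq) - n_count
    gc = seq.count("G") + seq.count("C")
    gc_fraction = gc / called if called else 0.0
    return ReadStats(sum(scores) / len(scores), gc_fraction, n_count)


def passes_qc(stats: ReadStats, min_mean_phred: float = 30.0, max_n: int = 0) -> bool:
    """Keep a read only if its mean Phred exceeds the threshold and it has
    at most max_n ambiguous bases (both thresholds are illustrative)."""
    return stats.mean_phred > min_mean_phred and stats.n_count <= max_n


# Example: a high-quality read without ambiguous bases.
example = read_stats("GATTACA", "IIIIIII")
keep_example = passes_qc(example)
```

In practice these metrics are reported by dedicated QC tools over whole libraries; the sketch only makes explicit what a threshold such as Phred > 30 means for a single read under the stated assumptions.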