javascript:void(0)Read preprocessing and DEG analyses
Quality of the reads were checked using fastqc (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Adapters were clipped and low quality bases (phred score < 20) were removed from the reads in 4 bp sliding windows using Trimmomatic (Bolger, Lohse, & Usadel, 2014). Genome assembly and annotation ofP. australis was obtained from Clean reads were mapped to the reference genome using STAR aligner (Dobin et al., 2013). After sorting using Samtools (Danecek et al., 2021), the mapped reads were processed using StringTie (Pertea et al., 2015) to obtain the gene and transcript count matrix. Principal component analysis (PCA) was conducted on differentially expressed genes among the treatment groups using DEseq2 following rlog transformation of gene counts, which represent the number of reads mapped to the reference genome (Love, Huber, & Anders, 2014).
DESeq2 provides statistical routines for determining differential expression in digital gene expression data using a model based on the negative binomial distribution. The resulting P values were adjusted using the Benjamini and Hochberg’s approach for controlling the false discovery rate. Genes with an adjusted P-value < 0.01 & Fold Change≥2 found by DESeq2 were assigned as differentially expressed. Gene Ontology (GO) enrichment assessment analyses sets of genes in the context of GO categories to determine whether certain functional categories are overrepresented or enriched within the background whole genome gene sets. Here we tested enriched GO terms for the differentially expressed genes (DEGs) using goatools (Klopfenstein et al., 2018), using a criterion of P <0.05 with Bonferroni correction. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis was performed using KAAS (Moriya, Itoh, Okuda, Yoshizawa, & Kanehisa, 2007) and the enriched pathway of differentially expressed genes were analyzed using clusterProfiler (Wu et al., 2021). The Venn diagram was performed using ggvenn (Yan & Yan, 2021).
Coefficient of variation (CVa) for gene expression data was calculated by first normalizing the raw read counts using the variance stabilizing transformation (VST) method, which stabilizing variance across expression levels. Samples were then grouped by population and treatment. The CVa, defined as the ratio of the standard deviation to the mean expression level, was computed for each gene, with genes exhibiting zero variance excluded from further analysis. Finally, we visualized the distribution of CVa values using density plots, highlighting peaks and medians.