Figure 4. Fruit color differences within the homozygous VIRIM sample group (yellow fresh fruit color) when categorized by genotype at SNP LG3s906369. Fruits were grouped according to whether they are homozygous for the (a) ALT or (b) REF allele of SNP LG3s906369. The results show that the color is lighter (the R/B value is higher) when the sample is homozygous for the ALT allele of SNP LG3s906369.
Candidate genes and SNP annotation.
We considered genes to be potential candidates when they lay within ±150 kb of a significantly associated SNP (a 300 kb window in total). While this is broad given the average LD decay, we wanted to capture linked genes that may lie outside the standard LD decay distance. A total of 117 genes were present across all candidate regions around significant SNPs from the GWAS of the R/B color phenotype (Supplementary file 3). Putative gene functions were assigned by sequence similarity using Blast2GO software. The Blast2GO and literature search results show that many genes related to fruit ripening and pigmentation are located near the significant SNPs (Table 3), and many lie within 50 kb of them. Gene expression analysis using RNA-seq data for the Kenezi (dark brown) and Khalas (light brown) fruit varieties shows that many genes from the candidate regions were expressed in fruit at various days post-pollination (dpp) (Figure 5). An R2R3 transcription factor gene from LG4 was expressed late in development (dpp 105, 120 and 135) in both the dark and light brown cultivars, with reduced expression in the light variety compared with the dark. A RING/U-box superfamily protein gene from LG10 was expressed at dpp 105 to 120 in the light fruit variety (peaking at dpp 120) compared to the dark fruit variety. Other genes, such as Protochlorophyllide reductase (LG3) and Basic helix-loop-helix (BHLH) DNA-binding superfamily (LG3), were expressed early in development (dpp 45 and 75) and showed reduced expression late in development (dpp 105, 120 and 135) in both the light and dark brown fruit varieties.
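The ±150 kb candidate window described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `Gene` class, the `genes_near_snp` helper, and all gene names and coordinates are hypothetical. It also assumes that "within" the window means any overlap between the gene and the window, which the text does not specify.

```python
from dataclasses import dataclass

WINDOW_BP = 150_000  # ±150 kb on each side of the SNP (300 kb in total)


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: name, linkage group, start and end in bp."""
    name: str
    lg: str
    start: int
    end: int


def genes_near_snp(genes, snp_lg, snp_pos, window=WINDOW_BP):
    """Return genes on the same linkage group that overlap the SNP window.

    Assumption: a gene counts as a candidate if any part of it overlaps
    the interval [snp_pos - window, snp_pos + window].
    """
    lo, hi = snp_pos - window, snp_pos + window
    return [
        g for g in genes
        if g.lg == snp_lg and g.start <= hi and g.end >= lo
    ]


# Made-up coordinates for illustration only (not taken from the study).
genes = [
    Gene("geneA", "LG10", 900_000, 905_000),      # inside the window
    Gene("geneB", "LG10", 1_200_000, 1_204_000),  # beyond +150 kb
    Gene("geneC", "LG3", 990_000, 995_000),       # different linkage group
    Gene("geneD", "LG10", 1_148_000, 1_155_000),  # straddles the window edge
]

hits = genes_near_snp(genes, "LG10", 1_000_000)
print([g.name for g in hits])
```

Requiring a gene to sit fully inside the window, rather than merely overlap it, would drop `geneD` in this example; which rule applies depends on the annotation workflow.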
We conducted structural variation analysis on all candidate regions from all LGs. The analysis revealed a 5 kb deletion in the candidate region of SNP LG10s12886617 on LG10 (Supplementary Figure 12). Although this 5 kb deletion is located 800 bases from the pentatricopeptide repeat-containing protein gene and 7 kb from SNP LG10s12886617, the association between its presence/absence and the GWAS SNP was not strong enough to warrant further investigation. The SNPs and INDELs from all candidate regions were annotated using SnpEff software and filtered to retain variants with an LD R² value ≥ 0.6 and a putative impact of High, Moderate, or Modifier. After filtering, a total of 34 SNPs remained across the candidate regions: 12 non-synonymous variants, one frameshift variant, and 21 three-prime and five-prime UTR variants (Supplementary file 4). Only one SNP in this list (LG10s12771512) falls within a gene from the gene list in Table 2. SNP LG10s12771512 lies within the Ethylene-responsive transcription factor 12 gene on LG10. SIFT analysis of the effect of the amino acid substitution on protein function indicated that this SNP is putatively deleterious and may affect protein function.
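The variant filtering step described above (LD R² ≥ 0.6 and a putative impact of High, Moderate, or Modifier) can be sketched as follows. This is an illustrative sketch only, not the authors' script: the `Variant` class, the `filter_variants` helper, and the example records are hypothetical. It assumes that the SnpEff impact and the LD R² with the GWAS SNP have already been extracted for each variant.

```python
from dataclasses import dataclass

MIN_LD_R2 = 0.6                                   # LD threshold from the text
KEPT_IMPACTS = {"HIGH", "MODERATE", "MODIFIER"}   # SnpEff impact classes kept


@dataclass(frozen=True)
class Variant:
    """Hypothetical annotated variant: ID, SnpEff impact, LD R² to the GWAS SNP."""
    vid: str
    impact: str
    ld_r2: float


def filter_variants(variants, min_r2=MIN_LD_R2, impacts=KEPT_IMPACTS):
    """Keep variants that meet the LD threshold and carry a retained impact."""
    return [
        v for v in variants
        if v.ld_r2 >= min_r2 and v.impact.upper() in impacts
    ]


# Made-up records for illustration only (not taken from the study).
variants = [
    Variant("v1", "MODERATE", 0.82),  # kept
    Variant("v2", "LOW", 0.95),       # dropped: impact class not retained
    Variant("v3", "HIGH", 0.41),      # dropped: LD R² below threshold
    Variant("v4", "MODIFIER", 0.60),  # kept: threshold is inclusive
]

kept = filter_variants(variants)
print([v.vid for v in kept])
```

The threshold is treated as inclusive here, matching the ">=" in the text.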
Table 3: List of genes detected in the regions around significant SNPs from the GWAS of the R/B fruit color phenotype. Genes were selected if they have a putatively significant role in fruit ripening and pigmentation. A region of ±150 kb on either side of each significant SNP (300 kb in total) was considered as a potential region for identifying possible candidate genes.