Genome-wide association study – exploring correlations between allelic frequencies and colour morphs
Because population stratification is a known confounding factor in genome-wide association studies, we tested for population clusters in our dataset with (a) a principal component analysis (PCA) performed in SNPRelate and (b) a discriminant analysis of principal components (DAPC) in the R package adegenet (Jombart, 2008). We searched for the likely number of clusters (K) with the function find.clusters and chose K according to the Bayesian Information Criterion (BIC). The two main linear discriminants, i.e., those explaining the largest amount of variation due to putative population structure, were used as covariates in the downstream genome-wide association analyses (GWAS).
Three approaches were employed to detect associations between colour (a binary phenotype, either grey or brown) and underlying genetic variation among RADtags. The p-value thresholds for considering associations significant or suggestive were defined by correcting for multiple comparisons according to the number of SNPs retained after the filtering steps, following Guo et al. (2017). Specifically, the thresholds (-log10(p)) defined for suggestive and significant associations, corrected for the number of input loci, ranged from 3.65 to 4.95 for the 4755 loci utilized in PLINK’s GLM after filtering and from 4.10 to 5.40 for the 12673 loci utilized in GEMMA after filtering. First, we utilized the generalized linear model of PLINK 2 (Chang et al., 2015) with logistic-Firth hybrid regression after applying the following filters to the SNP dataset: Hardy-Weinberg equilibrium p-value < 1E-6; minor allele frequency = 0.05; SNP presence = 80%; and r2 = 0.1 for pairwise linkage disequilibrium. Second, we utilized GEMMA (Zhou & Stephens, 2012) with the following filters: SNP presence = 5%, minor allele frequency = 0.05, and Hardy-Weinberg equilibrium = 0.001. Family structure and/or pedigree was accounted for by calculating and incorporating a relationship matrix with centered genotypic variance.
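For illustration, the centered relationship matrix can be sketched in a few lines of Python. The sketch assumes the usual centered form, K = (1/p) Xc Xc^T, where Xc is the individuals-by-SNPs genotype matrix with each SNP column centered on its mean; it also assumes complete genotype data coded as allele counts (0, 1, 2). The function name and the toy genotypes are hypothetical and serve only to show the arithmetic; this is not the GEMMA implementation itself.

```python
def centered_relatedness(genotypes):
    """Return K = (1/p) * Xc Xc^T for a list of individuals.

    genotypes: list of n individuals, each a list of p allele counts (0, 1, 2).
    Assumption: no missing genotypes; each SNP is centered on its sample mean.
    """
    n = len(genotypes)
    p = len(genotypes[0])
    # Mean allele count of every SNP across individuals
    means = [sum(ind[j] for ind in genotypes) / n for j in range(p)]
    # Center every genotype on the mean of its SNP
    centered = [[ind[j] - means[j] for j in range(p)] for ind in genotypes]
    # Pairwise cross-products between individuals, averaged over SNPs
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / p
         for k in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: 3 individuals genotyped at 3 SNPs
toy_genotypes = [[0, 1, 2], [1, 1, 0], [2, 1, 1]]
K = centered_relatedness(toy_genotypes)
for row in K:
    print([round(value, 3) for value in row])
```

Because every SNP column is centered, each row of K sums to zero and the matrix is symmetric, which gives a quick sanity check on real data.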
In GEMMA, associations between SNPs and covariates were inspected with a univariate linear mixed model with an intercept and two covariates corresponding to the linear discriminants obtained with DAPC. Lastly, we used the R package wtest to further explore allele-allele interactions (Sun et al., 2019). The package is based on its namesake, the W-test (Wang et al., 2016), which was developed for post hoc exploration of epistatic interactions.
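The multiple-testing thresholds described above depend only on the number of loci retained after filtering. The following minimal sketch assumes a common convention, namely a suggestive threshold of 1/N and a significant threshold of 0.05/N for N loci, expressed on the -log10(p) scale. Whether this matches the exact procedure of Guo et al. (2017) is an assumption, and the function name is hypothetical. For N = 12673 the convention gives values close to those reported for the GEMMA analysis.

```python
import math


def neg_log10_thresholds(n_loci, alpha=0.05):
    """Return (suggestive, significant) thresholds on the -log10(p) scale.

    Assumption: suggestive = 1 / n_loci and significant = alpha / n_loci,
    i.e. Bonferroni-style corrections for the number of tested loci.
    """
    if n_loci <= 0:
        raise ValueError("n_loci must be a positive integer")
    suggestive = -math.log10(1.0 / n_loci)
    significant = -math.log10(alpha / n_loci)
    return suggestive, significant


# Number of loci retained for the GEMMA analysis, as stated in the text
suggestive, significant = neg_log10_thresholds(12673)
print(f"{suggestive:.2f} {significant:.2f}")  # prints "4.10 5.40"
```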