2.8 | Haplotype analysis in candidate genomic regions
The VCF files corresponding to each previously identified candidate region were extracted from the individual sequencing data using vcftools v.0.1.14 (Danecek et al., 2011) and converted into a FASTA file containing all individuals using a custom script. The software RAxML v.8.2.4 (Stamatakis, 2014) was used with a GTRGAMMA model (-m), rapid bootstrap method (-f), 630 seeds for the parsimony inference procedure (-p) to build maximum likelihood trees for each region and including the 63 individuals from location 1. Trees were visualized with the R package ggtree(Yu et al., 2017). The LD between haplotypes in pairs of candidate regions was tested with the R GenePop package (Rousset, 2008) and p-values were corrected for the number of tests by computing q-values. The R Poppr package (Kamvar et al., 2014) was used to estimate the index of association (\(\overline{r}\)D) between combinations of haplotypes in multiple candidate regions. Redundancy analysis (RDA, (Rao, 1964)) was performed with the R vegan package (Oksanen et al., 2012). We used a RDA to identify correlations between the haplotypes defined in the candidate regions (explanatory variables) and the quantitative traits (response variables), i.e., the DLA-R and the DLA-S traits. The correlations between candidate regions and traits were statistically tested with a permutation test with 1 000 permutations. The multilocus adaptive pattern between populations was then further investigated using pool-sequencing data. The software harp (Kessner et al., 2013) was used to estimate the frequency of the haplotypes defined from individual sequencing in the 14 study pools.