Phenotypic Trajectory Analyses (PTA) were performed on the procrustes
scores using the trajectory.analysis function in geomorphto look at the extent and direction of phenotypic change between
ecomorphs. Magnitude of divergence is described by the length of
trajectories (L) while the angle between trajectories (θ) describes
their direction in phenotypic space. This approach allows us to
determine how parallel the trajectories of each ecomorph pair are to one
another by using the difference in phenotypic trajectory length
(ΔLP) and the direction of phenotypic trajectories
(θP). The significance of differences in trajectory
lengths and differences in trajectory direction were assessed using
1,000 permutations . Phenotypic divergence between ecomorphs in
different lakes was considered to be parallel if the direction and/or
magnitude of phenotype change did not differ significantly between the
pairs (P < 0.05).
Population genomics:
Filtered 75bp reads for each individual, generated via ddRADseq from
Jacobs et al . (2020) and accessed from (ddRADseq NCBI short read
bioproject: PRJNA607173), were mapped using bwa mem andSAMtools using settings described in that paper to theSalvelinus sp. genome from NCBI (ASM291031v2). The number
of reads per individual ranged from 1 to 3.5 million. RAD loci were
built in the gstacks module of Stacks v2.53 for 200
individuals (Awe_Bn= 26, Awe_Pl=29, Dughaill_Bn=28, Dughaill_Pl=27,
naSealga_Bn=18, naSealga_Pl=20, Tay_Bn=21, Tay_Pl=31). SNPs were
retained in the populations module of Stacks if they met
the following criteria: present in 66% of all individuals in each
population and across all populations, a minimum minor allele frequency
of 0.05, maximum observed heterozygosity of 0.5. Each ecomorph within a
lake was considered to be a discrete population. The script
filter_hwe_by_pop.pl to filter out sites outside Hardy-Weinberg
equilibrium within populations (available at
https://github.com/jpuritz/dDocent). vcftools v0.1.13 was
used to filter to a minimum coverage of 5x and a maximum of 50x. A
principal component analysis was performed to identify the major axes of
genetic variation using SNPRelate v1.22.0 (R package) .
Genotype-phenotype association analyses:
To determine the association between genotypes and phenotypic variation
in head or body shape, we ran Linear Mixed Models (LMM) in Gemma
v0.98.1 . Univariate and Multivariate LMMs with Wald’s test were run
using PCs 1-5 for head shape and body shape, the SNP dataset generated
for the population genomics analyses, and lake of origin as a co-variate
. Missing genotypes were imputed using LinkImpute v1.1.4 and a
relatedness matrix was generated using Gemma before running the
models. We determined significant associations using
bonferroni-corrected P-values (0.05/7329 unlinked SNPs) from the Wald’s
tests. The number of unlinked SNPs was determined by LD-pruning the full
SNP dataset using the snpgdsLDpruning function in SNPRelate .
Bayesian Sparse Linear Mixed Models (BSLMM) were run using PC1-5
variables to determine how much of the phenotypic variation is explained
by the SNPs in our dataset (PVE) and secondly how much of that variation
is explained by large-effect loci (PGE), and finally how polygenic each
phenotype is (⍴). Manhattan plots were made using CMplot v4.0 (R
package) (https://github.com/YinLiLin/CMplot).
We subsequently determined if SNPs showing significant associations with
head shape or body shape morphology were found within annotated genes in
the Salvelinus sp. reference genome using BEDtools
v2.27.1 . The functions of genes containing, or ± 1kbp of, associated
SNPs were investigated using GO term overrepresentation analysis (ORA)
and gene set enrichment analysis (GSEA). These analyses were run usingtopGO v2.40.0 (R package) with all genes containing any RAD loci
as the full comparison dataset. Results were summarised usingREVIGO before visualisation in Cytoscape v3.91 .
Comparisons to known QTLs:
Using existing information on the genetics of important phenotypes from
other salmonid species, we mapped a database of 1,338 QTL markers to theSalvelinus sp. genome. This was based on a previously
published database of QTLs involved in traits related to morphology and
life history, derived from a range of salmonid species and previously
mapped to the Salmo salar genome . Additionally, a literature
search was conducted up to April 2021 to augment the existing database
with more recently published QTLs. This literature search was conducted
in Web of Science and Google Scholar using the search terms “QTL”,
“quantitative trait loci”, “salmonid”, and the common and scientific
names for rainbow trout, Atlantic salmon, Arctic charr, lake whitefish,
Chinook salmon, coho salmon, brook trout, and lake trout. QTL marker
sequences were gathered for 17 different phenotypes: body length, body
shape, body weight, Fulton’s condition factor, directional change,
disease resistance, embryonic development, gill rakers, growth rate,
hatching time, head shape, parasite resistance, salinity tolerance,
sexual maturation, smolting, spawning time, upper temperature tolerance
(Table S1).
Following Jacobs et al. (2017), the strategy of mapping the
QTL-linked markers to the Salvelinus sp. genome depended on the
QTL marker type: RAD loci were mapped using Bowtie2 v2.4.4 and
the very sensitive pre-set; microsatellite primer sequences,
which are shorter, were mapped using Bowtie v1.3.1 allowing for 3
mismatches. QTLs for which the flanking markers mapped to different
chromosomes were removed. Redundant QTLs, i.e., where two QTLs for the
same trait from the same species mapped to the same location, were
removed only keeping the QTL with the higher PVE or LOD score (following
. For QTLs where more than one marker was reported, we attempted to map
all markers. Position values for the QTLs markers were then compared to
positions of the phenotype associated SNPs using BEDtools , with a
cut-off of ±100kbp. This value was used so that we could consider SNPs
within the range to be proximal to a QTL peak while also accounting for
the large size of many of the QTLs in the database. In total, we
successfully mapped 669 QTL-linked sets of markers to theSalvelinus sp. genome after removing redundant QTLs (Table S2).
Genomic response to selection:
We investigated if the phenotype-associated SNPs identified in our
analyses showed signals of a response to selection and if those signals
were replicated across ecomorph pairs. To test this, for each ecomorph
pair we compared FST and DXY values for
phenotype-associated SNPs to a random background subset of SNPs. This
random subset was 100 SNPs randomly selected from the whole dataset and
the mean FST and DXY values for those
SNPs were calculated. This was repeated 10,000 times and the means for
FST and DXY were taken across all
permutations. These permuted values were then compared to the empirical
mean FST and DXY values for the
phenotype-associated SNPs using the t.test function in R.
Analyses of recombination rate
variation:
To test the effect of the recombination landscape on phenotype-genotype
association, we first estimated recombination rates using the published
Arctic charr linkage map (N=3,636) using MareyMap v1.3.6 . RAD
loci from the linkage map were aligned to the Salvelinus sp.
reference genome with Bowtie2 using the -very-sensitive setting.
Loci were kept if they uniquely mapped to one location, mapped to the
same chromosome as all other loci on their linkage group, and followed
the orientation of the linkage map (i.e., not reversed). The filtered
dataset was used to estimate the recombination rate across each
chromosome using a spline algorithm. Spar values were varied for each
chromosome from 0.5 to 0.9, depending on chromosome size, to best fit
the data . Subsequently, WindowScanR v0.1 (available at:
https://github.com/tavareshugo/WindowScanR) was used to summarize
recombination rate values in 1 MB windows along the genome. All SNPs
were assigned to these windows using BEDtools . A random subset of
100 SNPs was then selected and the mean recombination rate for those
SNPs was calculated based on their windows. This was repeated 10,000
times to generate a background mean recombination rate, which was then
compared to the mean recombination rate of the phenotype-associated SNPs
(based on their windows) using a t-test.