Application 2: Barcoded individuals in population samples of
steelhead
Hatcheries play an important but controversial role in supplementing
dwindling fish stocks in the Columbia River basin (Busby, Wainwright, &
Bryant, 1999). In a few cases, hatchery stocks have been selected for
particular traits, so that they differ from the stocks into which they
are outplanted or stray (disperse to non-natal areas). One of the most
abundant and widely outplanted hatchery stocks of steelhead trout in the
Columbia Basin comes from Skamania Hatchery (Washougal, WA). The
Skamania stock has a long history of deliberate selection for earlier
spawning and larger fish (Ayerst, 1976), which has resulted in the
evolution of fish that migrate notably earlier than conspecifics and
almost exclusively after two or more years in the ocean (Hess et al.,
2021). Rather than choosing individuals with known phenotypes, we
sampled individuals without regard to phenotype from the Skamania
hatchery stock and from two nearby natural-origin stocks (Lewis River
and Eagle Creek-Willamette River) in the same steelhead lineage
(Coastal). We then tested whether genomic regions previously associated
with these traits, or others, would appear strongly differentiated in
the Skamania stock.
Libraries were prepared following the individual barcoding protocol of
Horn et al. (2020), and each population was sequenced separately on an
Illumina NextSeq 550 with 150-bp paired-end reads.
The number of individuals per pool ranged from 60 to 78. Data were
processed with PoolParty2. Reads were discarded if trimmed below 50 bp
by sliding windows with a minimum mean PHRED quality of 20. SNPs were
filtered out if they were below a PHRED quality of 20, were three or
fewer bases from an insertion-deletion position, were observed in fewer
than 10 reads in each sample pool or more than 1,500 globally, if the
number of individuals surveyed per population was fewer than three, or
if the global minor allele frequency was less than 0.005. The
allele frequency data were normalized in PPalign to mitigate
non-uniform read contribution among individuals. Using the
PPstats module, we assessed data coverage distributions,
proportion of the genome covered at specified depths, and evenness of
coverage across chromosomes. Normalized allele frequencies were filtered
and analyzed with PPanalyze, including calculation of FST, sliding
window FST (100Kbp windows in 5Kbp steps), and Fisher’s Exact test
(FET). Significance values from the Exact tests were used in local score
analyses, using three replicate runs with ξ representing the 80th, 90th,
95th, and 99th quantiles of significance values (the 70th quantile did
not produce a mean local score distribution below zero). Filtered read
alignment files (BAMs) created
by PPalign were used as input for angsd, which was directed to consider
only the variants that passed filtering in PPanalyze. The genotype
likelihoods provided by angsd were then used as input to ngsLD to
estimate linkage disequilibrium (LD) for the three chromosomes with the
most significant and consistent outlier regions in the Local Score
results, considering only pairs of sites ≤ 100Kbp from one another. As
above, we calculated mean LD in 100Kbp windows in 5Kbp
steps in R, but identified outlier regions as contiguous series of ≥20
windows exceeding 2x the interquartile range (2xIQR) for mean windowed
LD. When multiple contiguous outlier window series were present in the
range identified by the lowest Local Score quantile, we report all those
series.
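
The windowing and outlier-run rule described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the windowed LD means in this study were computed in R), and the function names `windowed_means` and `outlier_runs` are hypothetical. Two points are assumptions not stated in the text: windows are anchored at position 0, and "exceeding 2xIQR" is read as exceeding the third quartile plus 2 × IQR of the windowed means.

```python
from statistics import mean, quantiles

WINDOW_BP = 100_000   # window width given in the text (100 Kbp)
STEP_BP = 5_000       # step size given in the text (5 Kbp)
MIN_RUN = 20          # minimum number of contiguous outlier windows
IQR_MULT = 2.0        # "2xIQR"; how it is applied below is an assumption


def windowed_means(positions, values, window=WINDOW_BP, step=STEP_BP):
    """Mean of `values` in half-open windows [start, start + window).

    Windows start at 0 (an assumption) and advance by `step`.
    Returns a list of (window_start, mean_or_None); None marks an empty window.
    """
    sites = sorted(zip(positions, values))
    if not sites:
        return []
    out = []
    lo = 0                      # index of the first site at or after `start`
    start = 0
    last_pos = sites[-1][0]
    while start <= last_pos:
        while lo < len(sites) and sites[lo][0] < start:
            lo += 1
        hi = lo
        while hi < len(sites) and sites[hi][0] < start + window:
            hi += 1
        in_window = [v for _, v in sites[lo:hi]]
        out.append((start, mean(in_window) if in_window else None))
        start += step
    return out


def outlier_runs(windows, iqr_mult=IQR_MULT, min_run=MIN_RUN):
    """Return (first_window_start, last_window_start) for each run of at
    least `min_run` consecutive windows whose mean exceeds Q3 + iqr_mult * IQR.

    Empty windows (mean None) break a run. Needs at least two non-empty windows.
    """
    observed = [m for _, m in windows if m is not None]
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    threshold = q3 + iqr_mult * (q3 - q1)   # assumed reading of "2xIQR"

    runs, current = [], []
    for start, m in windows:
        if m is not None and m > threshold:
            current.append(start)
            continue
        if len(current) >= min_run:
            runs.append((current[0], current[-1]))
        current = []
    if len(current) >= min_run:             # close a run that reaches the end
        runs.append((current[0], current[-1]))
    return runs
```

Each returned pair gives the start coordinates of the first and last windows in a run, so the region spanned extends one window width past the second coordinate. Returning every qualifying run, rather than only the longest, matches the reporting of all contiguous outlier series within a Local Score range.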