Bioinformatics
We trimmed the sequence data to remove potential PCR artifacts using the
program TrimGalore version 0.6.5
(https://github.com/FelixKrueger/TrimGalore), a wrapper for
Cutadapt. We used the Burrows-Wheeler Aligner software version 0.7.17
to map reads to a reference genome from the closely related Yellow
Warbler (Setophaga petechia; Bay et al. 2018). After mapping, the
resulting SAM files were sorted, converted to BAM files, and indexed
using Samtools version 1.9. We marked read duplicates with
MarkDuplicates from GATK version 4.1.4.0 and clipped overlapping reads
with the clipOverlap function from bamUtil
(https://genome.sph.umich.edu/wiki/BamUtil:_clipOverlap).
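The preprocessing steps above can be sketched as an ordered list of per-sample command lines. This is only an illustration under stated assumptions: the text does not say whether reads were paired-end, which BWA algorithm was used, or how files were named, so the use of paired-end mode, `bwa mem`, a pre-built BWA index, and every file name below are hypothetical. The commands are assembled as strings and not executed.

```python
from typing import List


def preprocessing_commands(sample: str, reference: str) -> List[str]:
    """Build (but do not run) the trimming, mapping, and cleanup commands.

    Assumptions not stated in the text: paired-end reads, the `bwa mem`
    algorithm, an existing BWA index for `reference`, and all file names.
    """
    r1 = f"{sample}_R1.fastq.gz"
    r2 = f"{sample}_R2.fastq.gz"
    # Trim Galore's default paired-end output names (assumed unchanged).
    t1 = f"{sample}_R1_val_1.fq.gz"
    t2 = f"{sample}_R2_val_2.fq.gz"
    return [
        # Adapter and quality trimming (Trim Galore wraps Cutadapt).
        f"trim_galore --paired {r1} {r2}",
        # Map trimmed reads to the reference genome; output is SAM.
        f"bwa mem {reference} {t1} {t2} > {sample}.sam",
        # Sort the alignments and write them as BAM.
        f"samtools sort -o {sample}.sorted.bam {sample}.sam",
        # Index the sorted BAM.
        f"samtools index {sample}.sorted.bam",
        # Mark duplicate reads.
        f"gatk MarkDuplicates -I {sample}.sorted.bam "
        f"-O {sample}.dedup.bam -M {sample}.dup_metrics.txt",
        # Clip overlapping read pairs.
        f"bam clipOverlap --in {sample}.dedup.bam --out {sample}.clipped.bam",
    ]


if __name__ == "__main__":
    for command in preprocessing_commands("sample01", "reference.fa"):
        print(command)
```

Keeping the commands as data makes the order of operations explicit and easy to inspect before anything is run on real files.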
Sequencing depth for each individual was calculated using Samtools. Initial
population genetic analyses revealed that high variation in sequencing
depth among individuals had a large effect on the data. To reduce
sequencing depth variation, we followed the recommendations of and used
the DownsampleSam function from GATK to randomly downsample reads in
BAM files with greater than 2X coverage down to 2X coverage.
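One way to express the downsampling rule is as a per-sample probability of keeping a read, equal to the target coverage divided by the sample's mean coverage. The sketch below assumes this is how the retention probability was set (the text does not say), and the function and file names are hypothetical. It passes the probability to DownsampleSam through its `-P` (PROBABILITY) argument and only builds the command without running it.

```python
from typing import List, Optional

TARGET_COVERAGE = 2.0  # downsampling target stated in the text (2X)


def keep_probability(mean_coverage: float,
                     target: float = TARGET_COVERAGE) -> Optional[float]:
    """Return the fraction of reads to keep, or None if no downsampling is needed."""
    if mean_coverage <= target:
        return None  # samples at or below the target are left untouched
    return target / mean_coverage


def downsample_command(in_bam: str, out_bam: str,
                       mean_coverage: float) -> Optional[List[str]]:
    """Build a GATK DownsampleSam command for one BAM file (not executed)."""
    p = keep_probability(mean_coverage)
    if p is None:
        return None
    return ["gatk", "DownsampleSam", "-I", in_bam, "-O", out_bam,
            "-P", f"{p:.4f}"]


if __name__ == "__main__":
    # Hypothetical samples: one above and one below the 2X target.
    print(downsample_command("high.bam", "high.2x.bam", 5.0))
    print(downsample_command("low.bam", "low.2x.bam", 1.4))
```

Under this rule a sample at 5X coverage keeps each read with probability 0.4, while a sample at 1.4X is not modified.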
To identify genetic markers from low-coverage WGS data, we used
stringent filtering options in ANGSD version 0.9.40. We retained
reads with a mapping quality of at least 30 and base quality of at least
33. SNPs were identified based on a p-value of less than 1e-6. We
retained SNPs that had read data in at least 50% of individuals
(n = 165), a minor allele frequency greater than 0.05, and
minimum and maximum total depths of 231 and 924, respectively. The
minimum total depth threshold was calculated as the minimum number of
individuals required to call a variant (n = 165) multiplied by
the mean sequencing depth across all individuals (1.4X). The maximum total
depth threshold was calculated as twice the product of the total number of
individuals and the mean sequencing depth. The filtered variants were output as genotype
likelihoods and used in subsequent analyses.
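The depth thresholds can be checked with a short calculation. The sketch below assumes a total of 330 individuals, inferred from the statement that 165 individuals is 50% of the sample; that total is not given directly in the text. It also maps the stated filters onto the corresponding ANGSD options as an illustration. The option list covers only the filters described here and is not a complete ANGSD command, since the genotype likelihood model and output format are not specified.

```python
MIN_INDIVIDUALS = 165   # individuals required to have read data (from the text)
MEAN_DEPTH = 1.4        # mean sequencing depth across individuals (from the text)
TOTAL_INDIVIDUALS = 330  # assumption: inferred from 165 being 50% of individuals

# round() is used because 1.4 has no exact binary representation,
# so the raw products fall just below the whole numbers.
min_total_depth = round(MIN_INDIVIDUALS * MEAN_DEPTH)
max_total_depth = round(2 * TOTAL_INDIVIDUALS * MEAN_DEPTH)

# Filters described in the text, expressed as ANGSD options (illustrative).
angsd_filters = [
    "-minMapQ", "30",        # minimum mapping quality
    "-minQ", "33",           # minimum base quality
    "-SNP_pval", "1e-6",     # p-value threshold for calling a SNP
    "-minInd", str(MIN_INDIVIDUALS),
    "-minMaf", "0.05",       # minor allele frequency cutoff
    "-doCounts", "1",        # needed for the total depth filters below
    "-setMinDepth", str(min_total_depth),
    "-setMaxDepth", str(max_total_depth),
]

if __name__ == "__main__":
    print(min_total_depth, max_total_depth)  # 231 924
    print(" ".join(angsd_filters))
```

The computed values match the thresholds of 231 and 924 reported above.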