Locus Identification and Filtering
Sequences were demultiplexed by NovoGene Co. using index sequences at
the standard Illumina indexing positions (from the iTru5 and iTru7
primers). We then cleaned and analyzed them using Stacks v2.5 (Catchen et
al., 2013). To clean the sequences, we ran process_radtags with the
restriction site and internal tags specified, using the options -c to
remove any read with an uncalled base, -q to discard reads whose average
quality within a sliding window fell below a raw Phred score of 10 (the
default settings), and -t 140 to truncate reads to 140 base pairs.
Cleaned sequences
were aligned to the I. scapularis genome (GenBank:
GCA_016920785.2 ASM1692078v2) using BWA-MEM v0.7.17 (Li & Durbin,
2009).
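The read-cleaning rules described above can be illustrated with a minimal Python sketch. This is not the process_radtags implementation; it only mirrors the three stated rules (reject uncalled bases, reject low-quality windows, truncate to 140 bp). The function name `clean_read`, the order of operations, and the window size (15% of the read length) are assumptions made for illustration.

```python
def clean_read(seq, quals, trunc_len=140, window_frac=0.15, min_score=10):
    """Return the cleaned (seq, quals) pair, or None if the read is rejected.

    seq   : read sequence as a string
    quals : per-base raw Phred scores as a list of ints
    """
    # Truncate to the target length first (assumed order of operations).
    seq, quals = seq[:trunc_len], quals[:trunc_len]

    # Reject any read that still contains an uncalled base.
    if "N" in seq.upper():
        return None

    # Slide a window along the read; reject the read if the mean quality
    # inside any window falls below the score limit.
    win = max(1, int(len(seq) * window_frac))
    for start in range(0, len(quals) - win + 1):
        window = quals[start:start + win]
        if sum(window) / win < min_score:
            return None

    return seq, quals
```

For example, a 160 bp read with uniformly high quality would be returned truncated to 140 bp, whereas a read containing an `N` or a long run of low-quality bases would be rejected.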
Alignments were filtered using Samtools v1.10 (Danecek et al., 2021) by
keeping only uniquely mapping reads with mapping qualities over 25 and
removing unmapped reads and reads containing 5 or more variants. We then
assembled the mapped reads into RAD loci using the ref_map.pl program in
Stacks and called single nucleotide polymorphisms (SNPs) from these
loci. To be included in the output, loci needed to be present
in at least 60% of individuals. The SNP output from Stacks was exported
into a variant call format (VCF) file.
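As a rough illustration of the alignment and locus-presence criteria above, the following Python sketch applies the same thresholds to simple in-memory records. The `Alignment` class, its field names, and both helper functions are hypothetical; in practice these criteria were applied with Samtools and Stacks, not with this code.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical summary of one aligned read."""
    read_id: str
    mapped: bool      # False for unmapped reads
    unique: bool      # True if the read maps to a single location
    mapq: int         # mapping quality
    n_variants: int   # number of variants relative to the reference


def keep_alignment(aln, min_mapq=25, max_variants=5):
    """Keep uniquely mapped reads with MAPQ over 25 and fewer than 5 variants."""
    return (
        aln.mapped
        and aln.unique
        and aln.mapq > min_mapq
        and aln.n_variants < max_variants
    )


def loci_passing_presence(locus_to_samples, n_samples, min_frac=0.60):
    """Return loci found in at least `min_frac` of all individuals.

    locus_to_samples maps a locus ID to the set of samples carrying it.
    """
    return [
        locus
        for locus, samples in locus_to_samples.items()
        if len(samples) / n_samples >= min_frac
    ]
```

With five individuals, a locus seen in four of them (80%) would be retained under the 60% threshold, while a locus seen in only two (40%) would be dropped.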
We used VCFtools v0.1.16 (Danecek et al., 2011) to further filter the
data, retaining only SNPs with a minor allele frequency of ≥5%. We
removed loci that were
missing from more than 20% of all individuals, and loci that had
coverage below 6x or above 200x. All libraries were assessed for quality
and missing data; any library in which 50% or fewer of the high-quality
loci were present was removed from the analysis due to poor library
quality. Only 19 samples were removed, and the Stacks analysis pipeline
and the filtering steps above were re-run on the final 353
analysis-quality libraries. After all filtering steps, the final dataset
consisted of 7,274 polymorphic loci. All analyses presented here are
based on these loci.
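The site-level and library-level thresholds above can be summarized in a short Python sketch. It is an illustration only, not the VCFtools implementation: the function names are hypothetical, genotypes are assumed to be coded as alternate-allele counts (0, 1, 2, or None for missing), and "coverage" is assumed to mean the mean depth across genotyped individuals at a site.

```python
def site_passes(genotypes, depths, min_maf=0.05, max_missing=0.20,
                min_depth=6, max_depth=200):
    """Apply the missing-data, MAF, and depth thresholds to one site.

    genotypes : alternate-allele counts per individual (0, 1, 2) or None
    depths    : read depth per individual, in the same order
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False

    # Remove sites missing from more than 20% of individuals.
    missing_frac = 1 - len(called) / len(genotypes)
    if missing_frac > max_missing:
        return False

    # Require a minor allele frequency of at least 5%.
    alt_freq = sum(called) / (2 * len(called))
    if min(alt_freq, 1 - alt_freq) < min_maf:
        return False

    # Require mean depth between 6x and 200x (assumed interpretation).
    called_depths = [d for g, d in zip(genotypes, depths) if g is not None]
    mean_depth = sum(called_depths) / len(called_depths)
    return min_depth <= mean_depth <= max_depth


def libraries_to_keep(presence, min_frac=0.50):
    """Keep libraries in which more than 50% of high-quality loci are present.

    presence maps a library name to a list of booleans, one per locus.
    """
    return [
        name
        for name, flags in presence.items()
        if sum(flags) / len(flags) > min_frac
    ]
```

Under these assumptions, a library with exactly half of the loci present would be removed, matching the requirement of "more than 50%".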