Locus Identification and Filtering
Sequences were demultiplexed by NovoGene Co. using the index sequences at the standard Illumina indexing positions (from the iTru5 and iTru7 primers); we then cleaned and analyzed them using Stacks v2.5 (Catchen et al., 2013). To clean the sequences, we ran process_radtags with the restriction site and internal tags specified, along with the following options: -c, which removes any read with an uncalled base; -q, which discards reads with low quality scores using the default sliding window and a raw Phred score threshold of 10; and -t 140, which truncates reads to 140 base pairs. Cleaned sequences were aligned to the I. scapularis genome (GenBank: GCA_016920785.2, ASM1692078v2) using BWA-MEM v0.7.17 (Li & Durbin, 2009).
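The read-cleaning rules described above can be illustrated with a minimal Python sketch. This is not the Stacks implementation: the function name clean_read is hypothetical, the order of operations (truncate, then check for uncalled bases, then scan quality windows) is an assumption, and the window size of 15% of the read length is taken from the process_radtags default rather than from the text. In this sketch a read is discarded as soon as the mean quality in any window falls below the threshold.

```python
def clean_read(seq, quals, truncate_to=140, window_frac=0.15, min_mean_q=10):
    """Return the cleaned (seq, quals) pair, or None if the read is rejected.

    seq   : read bases as a string
    quals : per-base raw Phred scores as a list of ints
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")

    # Assumed order: truncate first, mirroring the -t 140 setting.
    seq, quals = seq[:truncate_to], quals[:truncate_to]

    # Mirror of -c: reject any read containing an uncalled base.
    if "N" in seq:
        return None

    # Mirror of -q: slide a window along the read and reject the read
    # if the mean quality inside any window drops below the threshold.
    window = max(1, int(len(seq) * window_frac))
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean_q:
            return None

    return seq, quals
```

For a 150 bp read with uniformly high quality, the sketch returns the first 140 bases; a read with an N inside those 140 bases, or with a stretch of very low scores as long as one window, is rejected.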
Alignments were filtered using Samtools v1.10 (Danecek et al., 2021), keeping only uniquely mapping reads with mapping qualities over 25 and removing unmapped reads and reads containing 5 or more variants. We then assembled the mapped RAD loci using the ref_map.pl program in Stacks and called single nucleotide polymorphisms (SNPs) from these loci. To be included in the output, a locus needed to be present in at least 60% of individuals. The SNP output from Stacks was exported as a variant call format (VCF) file.
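As an illustration of the alignment filter described above, the following Python sketch applies the same thresholds to simplified alignment records. The Alignment class and its field names are hypothetical stand-ins for the information carried in a BAM record, and the sketch assumes that the quality threshold refers to mapping quality and that uniqueness has already been determined for each read; it does not reproduce the actual Samtools commands.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical, simplified view of one aligned read."""
    name: str
    mapped: bool      # False for unmapped reads
    unique: bool      # True if the read maps to a single location
    mapq: int         # mapping quality
    mismatches: int   # number of variants relative to the reference


def keep_alignment(aln, min_mapq=25, max_variants=5):
    """Keep mapped, uniquely mapping reads with quality over min_mapq
    and fewer than max_variants variants."""
    return (
        aln.mapped
        and aln.unique
        and aln.mapq > min_mapq          # "over 25", so strictly greater
        and aln.mismatches < max_variants  # "5 or more" are removed
    )


def filter_alignments(alignments):
    """Return only the alignments that pass every criterion."""
    return [aln for aln in alignments if keep_alignment(aln)]
```

Note that the boundaries are exclusive on both sides: a read with mapping quality exactly 25 or with exactly 5 variants is dropped.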
We used VCFtools v0.1.16 (Danecek et al., 2011) to further filter the data, retaining loci with a minor allele frequency of ≥5%. We removed loci that were missing from more than 20% of individuals and loci with coverage below 6x or above 200x. All libraries were assessed for quality and missing data; any library in which no more than 50% of the high-quality loci were present was removed from the analysis because of poor library quality. This removed 19 samples, and the Stacks pipeline and the filtering steps above were re-run on the final 353 analysis-quality libraries. After all filtering steps, the final dataset consisted of 7,274 polymorphic loci. All analyses presented here are based on these loci.
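The locus-level and library-level thresholds described above can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the VCFtools implementation: genotypes are encoded as the number of alternate alleles (0, 1, or 2) with None for a missing call, the coverage filter is assumed to apply to mean depth per locus, and all function names are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF at a biallelic locus from alt-allele counts (0/1/2, None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))  # diploid: two alleles per call
    return min(alt_freq, 1 - alt_freq)


def missing_fraction(genotypes):
    """Fraction of individuals with no genotype call at this locus."""
    return sum(g is None for g in genotypes) / len(genotypes)


def keep_locus(genotypes, mean_depth, min_maf=0.05, max_missing=0.20,
               min_depth=6, max_depth=200):
    """Apply the MAF, missing-data, and coverage thresholds to one locus."""
    return (
        minor_allele_frequency(genotypes) >= min_maf
        and missing_fraction(genotypes) <= max_missing
        and min_depth <= mean_depth <= max_depth
    )


def keep_library(locus_calls, min_present=0.50):
    """Keep a library only if more than half of the loci are called in it."""
    present = sum(g is not None for g in locus_calls)
    return present / len(locus_calls) > min_present
```

For example, a locus with one heterozygote among nine called individuals out of ten has a minor allele frequency of about 5.6% and 10% missing data, so it passes if its mean depth lies between 6x and 200x. A library with calls at exactly half of the loci is dropped, because the text requires more than 50%.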