Genetic Structure
The DAPC analysis was completed retaining the first 19 principal components, as determined by the optimum a-score, and 19 discriminant functions, which explained 30.5% of the variance (Figure 2). The scatterplot shows a main cluster that consists of 23 of the populations, including all from the Northeastern quadrant, most of the Upper Midwest quadrant, and Stokes and Watauga Counties in North Carolina, and that overlaps with Claiborne County, Tennessee. The Southern Gulf quadrant is the only quadrant without populations in the main cluster. The populations in the Southeastern Atlantic quadrant that are closest to the coastline are the farthest from the main cluster, with the southernmost population, from Osceola County, Florida, being the most distant.
These trends were further examined using the R package LEA to assess the population structure of all individuals, independent of the pre-assigned populations. The cross-entropy criterion identified K = 5 as optimal (Supplemental Figure 2). The five clusters can be described as assorting into Northeast, Upper Midwest, Central, Southern, and Central Florida (Figures 3, 4A). Moderate levels of admixture were apparent for most populations. However, Southern populations had lower levels of admixture, with individuals mostly being assigned to cluster 2, 3, or 4 and having low assignments to all other clusters (Figure 3). Membership probabilities were calculated for every tick for each of the five clusters, a main cluster was assigned to each individual as the cluster with the highest membership probability, and the DAPC analysis was then rerun using these assignments. This iteration showed a similar pattern to the original population-based DAPC, with one group being noticeably larger and consisting mainly of individuals from the North (clusters 1 and 5). There is a small, distant group composed of the ticks from Central Florida (cluster 3), and a group that shares no overlap with the Northern group and is primarily composed of ticks from the Southern Gulf (Figure 4A).
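The main-cluster assignment described above can be illustrated with a minimal Python sketch. It assumes the membership probabilities are available as an individuals-by-clusters matrix; the function name and the example values are hypothetical and are not taken from the study data or from the LEA package.

```python
import numpy as np

def assign_main_cluster(q_matrix):
    """Return the 1-based index of the highest-probability cluster per individual.

    q_matrix: array of shape (n_individuals, K) holding membership
    probabilities (hypothetical input layout; each row should sum to about 1).
    """
    q = np.asarray(q_matrix, dtype=float)
    if q.ndim != 2:
        raise ValueError("q_matrix must be 2-dimensional (individuals x clusters)")
    # argmax is 0-based, so add 1 to match cluster labels 1..K
    return q.argmax(axis=1) + 1

# Hypothetical membership probabilities for three individuals and K = 5
q_example = np.array([
    [0.70, 0.10, 0.05, 0.05, 0.10],
    [0.05, 0.05, 0.80, 0.05, 0.05],
    [0.20, 0.15, 0.05, 0.10, 0.50],
])
main_cluster = assign_main_cluster(q_example)
```

Ties are resolved in favor of the lowest-numbered cluster because `argmax` returns the first maximum; whether the original analysis handled ties this way is not stated.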
In this DAPC analysis, we identified the top six alleles across four scaffolds that contributed the most to differentiating the clusters (NW_024609857.1_scaffold2 position 192156357; NW_024609868.1_scaffold3 position 60000181; NW_024609879.1_scaffold4 positions 31884195, 31884182, 66253319, 66253456; and NW_024610117.1 position 26212) (Supplemental Figure 4). Some of these positions fall within annotated genes in the recent I. scapularis genome (GenBank: GCA_016920785.2 ASM1692078v2), specifically gene loci LOC8029585, LOC120844762, and LOC121834473. When examining the allele representation of all clusters at these sites, cluster 3 (Central Florida) always has the opposite allele frequency to all other clusters (Supplemental Figure 4).
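As a rough illustration of how allele representation can be compared across clusters at a given site, the sketch below computes per-cluster allele frequencies. It assumes diploid, biallelic genotypes coded as 0, 1, or 2 copies of the alternate allele, with missing calls stored as NaN; the function name and the toy genotypes are hypothetical and do not represent the study data.

```python
import numpy as np

def allele_freq_by_cluster(genotypes, clusters):
    """Alternate-allele frequency per SNP within each cluster.

    genotypes: (n_individuals, n_snps) array coded 0/1/2, NaN for missing
    clusters:  length n_individuals array of integer cluster labels
    Returns a dict mapping cluster label -> array of frequencies per SNP.
    """
    g = np.asarray(genotypes, dtype=float)
    c = np.asarray(clusters)
    freqs = {}
    for label in np.unique(c):
        sub = g[c == label]
        # mean allele count divided by ploidy (2) gives the allele frequency
        freqs[int(label)] = np.nanmean(sub, axis=0) / 2.0
    return freqs

# Hypothetical genotypes: four individuals, two SNPs, two clusters
toy_genotypes = [[0, 2], [0, 2], [2, 0], [2, 1]]
toy_clusters = [1, 1, 3, 3]
toy_freqs = allele_freq_by_cluster(toy_genotypes, toy_clusters)
```

In this toy example the two clusters carry opposite alleles at the first SNP, which is the kind of pattern described for cluster 3 above.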
While morphological identifications confirmed the Central Florida ticks as I. scapularis, Florida has populations of I. affinis ticks that can be morphologically similar to I. scapularis. Thus, we confirmed the species identity of the ticks collected from Central Florida using 16S rRNA, finding that they are most similar to other I. scapularis individuals from the Southern clade (Figure 4C). To determine the alleles contributing to the differentiation of the other four clusters, ticks within the Central Florida cluster were removed from the dataset and the analysis was rerun. The resulting DAPC shows more differentiation among the remaining four clusters; notably, ticks from the Upper Midwest and Northeast no longer completely overlapped (Figure 5B). We examined the ten SNPs that contributed the most to the differentiation of these four groups and found that all were on scaffold 3 (NW_024609868.1 positions 42246251, 42246253, 50953023, 72034653, 72034688, 72034691, 72034789, 72034790, 72034791, and 72034815; more information can be found in Supplemental Table 6). Two other groups of SNPs that noticeably contribute to the differentiation of the four geographic clusters are located on scaffolds 4 (NW_024609879.1) and 8 (NW_024609883.1) (Supplemental Figures 5-6).
The isolation-by-distance analysis was completed using the original 33 populations designated for the ticks. We did not see a strong correlation between the genetic and geographic distances of our populations when all comparisons were included. However, Northern-to-Northern comparisons show low genetic distance between populations even as geographic distance increases, whereas Southern-to-Southern comparisons show a variable relationship between genetic and geographic distance. The Northern-to-Southern comparisons are distinctly higher in genetic distance across all geographic distances (Figure 5).
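The text does not state which statistic was used for the isolation-by-distance analysis. One common approach, shown here only as an assumed illustration, is a Mantel permutation test correlating pairwise genetic distance with pairwise great-circle distance. All function names, coordinates, and distances below are hypothetical.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a mean Earth radius of 6371 km."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def geographic_distance_matrix(coords):
    """Pairwise distance matrix from an (n, 2) array of (lat, lon) in degrees."""
    coords = np.asarray(coords, dtype=float)
    lat = coords[:, 0][:, None]
    lon = coords[:, 1][:, None]
    return haversine_km(lat, lon, lat.T, lon.T)

def mantel_test(gen, geo, n_perm=999, seed=0):
    """One-sided Mantel test: Pearson r of upper triangles, permuting populations."""
    gen = np.asarray(gen, dtype=float)
    geo = np.asarray(geo, dtype=float)
    n = gen.shape[0]
    iu = np.triu_indices(n, k=1)
    r_obs = np.corrcoef(gen[iu], geo[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # permute rows and columns of the genetic matrix together
        r_perm = np.corrcoef(gen[np.ix_(perm, perm)][iu], geo[iu])[0, 1]
        if r_perm >= r_obs:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return r_obs, p_value

# Hypothetical coordinates for five populations (lat, lon)
toy_coords = [(28.0, -81.0), (30.5, -87.0), (36.4, -80.2), (43.0, -89.4), (41.5, -72.7)]
toy_geo = geographic_distance_matrix(toy_coords)
toy_gen = toy_geo / 1000.0  # toy genetic distances perfectly proportional to geography
toy_r, toy_p = mantel_test(toy_gen, toy_geo, n_perm=199)
```

Splitting the pairwise comparisons by region (for example Northern-to-Northern versus Northern-to-Southern) before plotting them would mirror the breakdown reported above, although that grouping step is not shown here.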