3.3 Variant calling and filtering
Initial filtering steps produced several SNP datasets, including a "hard filter" dataset (SNPs_HF), a dataset excluding SNPs with missing data (SNPs_HF_NMD), and a dataset excluding linked SNPs (SNPs_HF_NMD_NLD) (Table 1). Removing the low-recovery sample Z4 altered the STRUCTURE results, reducing the optimal number of clusters from K = 5 (with Z4) to K = 2 (without Z4); all comparative analysis results for this section can be found in Appendix S1. The position of Z4 in the population structure analysis diverged from that of its native population. Consequently, we discarded Z4 from subsequent analyses.
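As an illustration only, the last two steps of the filtering cascade described above (removing SNPs with missing data, then removing linked SNPs) can be sketched on a small genotype matrix. This is not the pipeline used in this study: the function names, the coding of missing calls as -1, the greedy pruning strategy, and the r^2 threshold of 0.2 are all assumptions chosen for the example.

```python
import numpy as np

def drop_missing(G):
    """Keep only SNP columns with no missing calls (missing assumed coded as -1).

    G is a samples x SNPs array of alternate-allele counts (0, 1, 2).
    Returns the filtered matrix and the boolean mask of retained columns.
    """
    keep = (G >= 0).all(axis=0)
    return G[:, keep], keep

def ld_prune(G, r2_max=0.2):
    """Greedy pruning: retain a SNP only if its squared genotype correlation
    with every previously retained SNP is at most r2_max (hypothetical threshold).
    Returns the column indices of retained SNPs."""
    kept = []
    for j in range(G.shape[1]):
        col = G[:, j].astype(float)
        if col.std() == 0:
            continue  # monomorphic site: correlation is undefined, skip it
        linked = False
        for k in kept:
            r = np.corrcoef(col, G[:, k].astype(float))[0, 1]
            if r * r > r2_max:
                linked = True
                break
        if not linked:
            kept.append(j)
    return kept
```

A real dataset would normally be pruned within genomic windows rather than across all pairs of SNPs; the all-pairs loop here is only practical for small examples.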
Despite the high SNP count from supercontigs (34,138 SNPs after linkage disequilibrium filtering), no significant difference in population genetic parameters (F_ST, H_E and F_IS) was observed between datasets derived from supercontigs and introns (Table 1, Appendix S1). Therefore, intron-derived SNPs were selected for final analyses to ensure robustness and compatibility with related studies.
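For readers unfamiliar with the parameters compared above, the following minimal sketch shows how H_E and F_IS can be computed from biallelic genotype counts for a single population. It assumes no missing data and uses a simple ratio of site-averaged heterozygosities without sample-size correction; it is not the estimator or software used to produce Table 1, and the function name is hypothetical.

```python
import numpy as np

def he_ho_fis(G):
    """Expected heterozygosity, observed heterozygosity, and F_IS.

    G is a samples x SNPs array of alternate-allele counts (0, 1, 2)
    for one population, with no missing calls (assumption).
    """
    G = np.asarray(G, dtype=float)
    p = G.mean(axis=0) / 2.0              # alternate-allele frequency per SNP
    he = (2.0 * p * (1.0 - p)).mean()     # mean expected heterozygosity, 2pq
    ho = (G == 1).mean(axis=0).mean()     # mean proportion of heterozygotes
    fis = 1.0 - ho / he                   # undefined if every site is monomorphic
    return he, ho, fis
```

Under this definition, F_IS is 0 when observed heterozygosity matches the Hardy-Weinberg expectation and approaches 1 as heterozygotes become rarer than expected.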