3.3 Variant calling and filtering
Initial filtering steps produced several SNP datasets, including a “hard
filter” dataset (SNPs_HF), a dataset excluding SNPs with missing data
(SNPs_HF_NMD), and a set excluding linked SNPs (SNPs_HF_NMD_NLD)
(Table 1). Removing the low-recovery sample Z4 from the analysis altered
the STRUCTURE results, reducing the optimal number of clusters from
K = 5 (with Z4) to K = 2 (without Z4; all comparative analysis results
for this section can be found in Appendix S1). The position of Z4 in
the population structure analysis diverged from that of its native
population. Consequently, we excluded Z4 from subsequent analyses.
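
The last two steps of the filtering cascade described above (SNPs_HF to SNPs_HF_NMD to SNPs_HF_NMD_NLD) can be illustrated with a minimal sketch. It is not the pipeline used in this study: the genotype coding (0/1/2 alternate-allele counts, -1 for missing), the r² threshold of 0.2, the greedy all-against-all pruning, and every function name are assumptions made for illustration. The hard-filter step, which acts on variant-quality annotations, is not modeled.

```python
import numpy as np

MISSING = -1  # assumed code for a missing genotype call


def drop_sites_with_missing(genotypes):
    """Return indices of SNPs (columns) with no missing call in any sample."""
    complete = ~(genotypes == MISSING).any(axis=0)
    return np.flatnonzero(complete)


def prune_linked(genotypes, r2_max=0.2):
    """Greedy LD pruning: keep a SNP only if its squared genotype
    correlation with every previously kept SNP is at most r2_max.
    The threshold is an assumed value, not one taken from the study."""
    kept = []
    for j in range(genotypes.shape[1]):
        col = genotypes[:, j].astype(float)
        if col.std() == 0:  # monomorphic site: correlation undefined, skip
            continue
        linked = False
        for k in kept:
            r = np.corrcoef(col, genotypes[:, k].astype(float))[0, 1]
            if r * r > r2_max:
                linked = True
                break
        if not linked:
            kept.append(j)
    return np.array(kept, dtype=int)


# Toy matrix: rows are samples, columns are SNPs (hypothetical data).
toy = np.array([
    [0, 0, 2, 1, 0],
    [1, 1, 1, MISSING, 0],
    [2, 2, 0, 0, 0],
    [0, 0, 0, 2, 0],
    [1, 1, 1, 0, 0],
    [2, 2, 2, 1, 0],
])

nmd_idx = drop_sites_with_missing(toy)              # analogue of SNPs_HF_NMD
nld_idx = nmd_idx[prune_linked(toy[:, nmd_idx])]    # analogue of SNPs_HF_NMD_NLD
```

In this toy matrix the fourth SNP is dropped for missing data, the second is pruned because it is perfectly correlated with the first, and the monomorphic fifth SNP is skipped, leaving the first and third SNPs. Production tools typically prune within sliding windows along each contig rather than comparing all pairs.
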
Despite the high SNP count from supercontigs (34,138 SNPs after linkage
disequilibrium filtering), no significant difference in population
genetics parameters (F_ST, H_E and F_IS) was observed
between datasets derived from supercontigs and introns (Table 1,
Appendix S1). Therefore, intron-derived SNPs were selected for final
analyses to ensure robustness and compatibility across related studies.
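
For readers unfamiliar with the three parameters compared above, the following sketch shows one simple way to compute them from biallelic genotypes. The text does not state which estimators were used, so the formulas here (H_E = 2p(1 - p) averaged over loci, F_IS = 1 - H_O / H_E, and a Nei-style F_ST = (H_T - H_S) / H_T for two equally weighted populations, with no sample-size correction) are assumptions, as are all function names and the toy data.

```python
import numpy as np


def expected_heterozygosity(genotypes):
    """Mean over loci of 2p(1-p), where p is the alternate-allele frequency.
    genotypes: samples x loci array of 0/1/2 alternate-allele counts."""
    p = genotypes.mean(axis=0) / 2.0
    return float(np.mean(2.0 * p * (1.0 - p)))


def observed_heterozygosity(genotypes):
    """Fraction of all genotype calls that are heterozygous (coded 1)."""
    return float(np.mean(genotypes == 1))


def f_is(genotypes):
    """Inbreeding coefficient F_IS = 1 - H_O / H_E (uncorrected)."""
    return 1.0 - observed_heterozygosity(genotypes) / expected_heterozygosity(genotypes)


def f_st(pop_a, pop_b):
    """Nei-style F_ST = (H_T - H_S) / H_T, averaging H_T and H_S over loci
    before taking the ratio. Assumes two equally weighted populations."""
    p_a = pop_a.mean(axis=0) / 2.0
    p_b = pop_b.mean(axis=0) / 2.0
    p_bar = (p_a + p_b) / 2.0
    h_s = np.mean((2.0 * p_a * (1.0 - p_a) + 2.0 * p_b * (1.0 - p_b)) / 2.0)
    h_t = np.mean(2.0 * p_bar * (1.0 - p_bar))
    return float((h_t - h_s) / h_t)


# Hypothetical genotypes: 4 samples x 2 loci per population.
pop_a = np.array([[0, 0], [1, 0], [1, 1], [2, 1]])
pop_b = np.array([[2, 2], [2, 1], [1, 2], [1, 1]])

h_e_a = expected_heterozygosity(pop_a)
f_is_a = f_is(pop_a)
f_st_ab = f_st(pop_a, pop_b)
```

Comparing such values between the supercontig-derived and intron-derived SNP sets is the kind of check summarized in Table 1. Published analyses usually rely on estimators with sample-size corrections, which this sketch omits.
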