2.6 Variant calling and filtering
To compare population genetic results between the threatened A. cantabrica and the non-threatened A. halleri, we
performed variant calling and population genetic analyses using 35 A. cantabrica samples from six populations and six samples of A. halleri subsp. nuria from a single population. We
followed the pipelines and scripts provided by Slimp et
al. (2021, available at https://github.com/lindsawi/HybSeq-SNP-Extraction)
with some modifications. In their pipeline,
Slimp et al. (2021) used supercontig sequences, demonstrating
that most genetic variation occurred in flanking non-coding regions,
which tend to accumulate mutations quickly due to limited functional
constraints (Palumbi, 1996). We used sequences from supercontig and
intron regions separately for comparative analyses. We prepared a
reference file for supercontigs and introns using the same approach
described above to generate the nQuire reference, in this case,
selecting each gene’s longest supercontig and intron sequence.
Additionally, we excluded any genes flagged by HybPiper for paralogy
warnings (Bryc et al., 2013).
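The reference preparation described above (one longest sequence per gene, with paralog-flagged genes excluded) can be sketched as follows. This is a minimal illustration, not the script used in the study: the `sample-gene` header convention, the function names, and the way the paralog list is supplied are all assumptions made for the example.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def longest_per_gene(records, paralog_genes=frozenset()):
    """Return {gene: (header, sequence)} keeping the longest sequence per gene.

    Assumption: headers follow a hypothetical 'sample-gene' convention, so the
    gene name is the text after the last hyphen. Genes listed in
    `paralog_genes` (e.g. those with paralog warnings) are skipped.
    """
    best = {}
    for header, seq in records:
        gene = header.rsplit("-", 1)[-1]
        if gene in paralog_genes:
            continue
        # Strict '>' keeps the first sequence seen when lengths are tied.
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return best
```

The same function could be run once on the supercontig sequences and once on the intron sequences to obtain the two separate references.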
To obtain single-nucleotide polymorphism (SNP) data, we used the
framework developed by DePristo et al. (2011) in GATK (McKenna et al., 2010). We merged the reads aligned to the
reference with the unaligned reads, removed duplicate sequences, and performed genotype calling
collectively for all samples after generating preliminary variants
individually for each sample (Poplin et al., 2018) in a Variant
Call Format (VCF) file. We filtered the initial VCF file by applying a
“hard filter” (QD < 5.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0),
removing indels and SNPs with missing data in GATK, and removing linked
SNPs in PLINK (Chang et al., 2015). We conducted a Base Quality Score Recalibration in
GATK and repeated the variant calling step. To address the potential
effects of polyploidy, which can artificially increase heterozygosity
and allelic richness (Hokanson & Hancock, 1998), it is essential to
filter fixed heterozygotes in SNP datasets in polyploid species (e.g.,
Douglas et al., 2015; Cornille et al., 2016; Blischak et al., 2018;
Pavan et al., 2020). We removed loci with observed heterozygosity
(H_O) > 0.5 from the A. cantabrica data (Appendix S1) using the R package “vcfR” (Knaus & Grünwald, 2017).
We established this filter by comparing heterozygosity and inbreeding
coefficient results for the diploid A. halleri to those obtained
for the tetraploid A. cantabrica (see Appendix S1 and Results).
The unfiltered data were retained for comparative studies.
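The two site-level filters described above can be written as short predicates. The sketch below is illustrative only and does not reproduce the GATK or vcfR code used in the study. It assumes that the INFO annotations of a site are available as a dictionary, that genotypes are VCF-style strings such as `0/1`, and that a missing annotation does not trigger its clause of the hard filter (which may differ from GATK's own handling). All function names are hypothetical.

```python
import operator
import re

# Hard-filter clauses from the text: a site is flagged if ANY clause is true.
HARD_FILTER = (
    ("QD", operator.lt, 5.0),
    ("FS", operator.gt, 60.0),
    ("MQ", operator.lt, 40.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
)


def fails_hard_filter(info):
    """Return True if the site's INFO annotations match any hard-filter clause.

    Assumption: annotations absent from `info` are skipped rather than failed.
    """
    return any(
        name in info and op(info[name], cutoff)
        for name, op, cutoff in HARD_FILTER
    )


def is_heterozygous(gt):
    """Return True/False for a called genotype, or None if any allele is missing.

    Works for any ploidy, e.g. '0/1' or '0/0/1/1'.
    """
    alleles = re.split(r"[/|]", gt)
    if "." in alleles:
        return None
    return len(set(alleles)) > 1


def observed_heterozygosity(genotypes):
    """Fraction of called genotypes at a locus that are heterozygous (H_O)."""
    calls = [c for c in map(is_heterozygous, genotypes) if c is not None]
    if not calls:
        return None
    return sum(calls) / len(calls)


def keep_locus(genotypes, max_ho=0.5):
    """Keep a locus unless its H_O exceeds `max_ho` (loci with H_O > 0.5 are removed)."""
    ho = observed_heterozygosity(genotypes)
    return ho is not None and ho <= max_ho
```

For example, a locus at which three of four samples are heterozygous has H_O = 0.75 and would be removed, whereas a locus with two of four heterozygous samples (H_O = 0.5) would be retained because the threshold is strict.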