DNA Sequencing and SNP Calling
Fresh leaves were collected from 1 population of A. kamelinii, 6
populations of A. hebeica and 13 populations of A. viridiflora, and
genomic DNA was extracted using a modified
cetyltrimethylammonium bromide (CTAB) method (Doyle & Doyle, 1987),
bringing the total number of sequenced samples to 66. In addition,
genomic DNA was extracted from 12 individuals of other Aquilegia
species whose distributions overlap that of the A. viridiflora complex
and from 1 individual of Paraquilegia microphylla.
For each individual, genomic libraries were generated and sequenced on
the Illumina Xten platform at Biomarker Technologies, Inc. (Beijing,
China) with 2 × 150 bp paired-end reads. Furthermore, the raw sequence
reads of Semiaquilegia adoxoides (SRR437677) were downloaded from the NCBI
SRA database (http://www.ncbi.nlm.nih.gov/sra) to be used as an
outgroup.
To obtain high-quality data, all reads were subjected to quality
control with FastQC (Andrew, 2010) and filtered as follows: reads
containing adapters, reads with an N content of more than 10%, and reads
in which more than 50% of the bases were of low quality (quality value
below 10) were removed.
These low-quality reads were removed using NGStoolkit (Mulcare, 2004).
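The read-level criteria above can be illustrated with a minimal Python sketch. It is not the NGStoolkit implementation; it only restates the two stated thresholds (more than 10% N, more than 50% of bases with quality below 10). The function names `keep_read` and `keep_pair`, the Phred+33 quality encoding, and the rule that a pair is discarded when either mate fails are assumptions made for illustration, and adapter detection is not covered.

```python
from typing import Tuple

MAX_N_FRACTION = 0.10     # reads with MORE than 10% N are removed
MAX_LOWQ_FRACTION = 0.50  # reads with MORE than 50% low-quality bases are removed
MIN_BASE_QUALITY = 10     # a base counts as low quality if its Phred score is below 10
PHRED_OFFSET = 33         # assumption: Phred+33 encoded quality strings


def keep_read(sequence: str, quality: str) -> bool:
    """Return True if a single read passes both content thresholds."""
    length = len(sequence)
    if length == 0:
        return False  # nothing to evaluate, treat as failing
    n_fraction = sequence.upper().count("N") / length
    low_quality_bases = sum(
        1 for ch in quality if ord(ch) - PHRED_OFFSET < MIN_BASE_QUALITY
    )
    lowq_fraction = low_quality_bases / length
    return n_fraction <= MAX_N_FRACTION and lowq_fraction <= MAX_LOWQ_FRACTION


def keep_pair(read1: Tuple[str, str], read2: Tuple[str, str]) -> bool:
    """Assumption: a read pair is kept only if both mates pass."""
    return keep_read(*read1) and keep_read(*read2)
```

For example, `keep_read("NNACGTACGT", "IIIIIIIIII")` is rejected because 20% of its bases are N, whereas a read with exactly 10% N is retained, since the criterion is "more than 10%".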
Clean sequence reads of the 80 individuals were mapped to the A.
coerulea reference genome (Filiault et al., 2018) using BWA v.0.7.12
with default parameters (H.
Li & Durbin, 2009). SAMtools v.0.1.18 was used to convert SAM files to
BAM files and sort reads (H. Li et al., 2009). The HaplotypeCaller,
CombineGVCFs and GenotypeGVCFs modules in GATK
v.4.1.8.0 were used to produce accurate SNP calls (McKenna et al.,
2010). To improve the quality of SNPs, the VariantFiltration module in
GATK v.4.1.8.0 was used for filtration with the following parameters:
“--filter-name FilterQual --filter-expression QUAL < 30.0
--filter-name FilterQD --filter-expression QD < 2.0
--filter-name FilterMQ --filter-expression MQ < 40.0
--filter-name FilterFS --filter-expression FS > 60.0
-window 5 -cluster 2”. Next, VCFtools v.0.1.13 (Danecek et al., 2011)
was used to remove variants that 1) showed a minor allele frequency
(MAF) of 0.02 or less, 2) were not biallelic, 3) showed a
sequencing depth of less than 5, or 4) showed a missing rate exceeding
0.5.
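The two filtering stages can be restated as a short Python sketch. It does not call GATK or VCFtools; it only mirrors the thresholds given above on a simplified, hypothetical representation of a site. The function names, the genotype encoding (a pair of allele indices, or `None` for a missing call), the interpretation of "sequencing depth" as mean depth across samples, and the choice to let a missing annotation pass a hard filter are all assumptions. The SNP-cluster criterion (`-window 5 -cluster 2`) is not reproduced here.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Genotype = Optional[Tuple[int, int]]  # e.g. (0, 1); None means a missing call


def hard_filter_flags(qual: float, info: Dict[str, float]) -> List[str]:
    """Return the names of the hard filters a site fails (empty list = passes)."""
    flags = []
    if qual < 30.0:
        flags.append("FilterQual")
    # Assumption: an absent annotation does not trigger its filter.
    if "QD" in info and info["QD"] < 2.0:
        flags.append("FilterQD")
    if "MQ" in info and info["MQ"] < 40.0:
        flags.append("FilterMQ")
    if "FS" in info and info["FS"] > 60.0:
        flags.append("FilterFS")
    return flags


def passes_site_filters(
    n_alt_alleles: int, genotypes: Sequence[Genotype], mean_depth: float
) -> bool:
    """Apply the four site-level criteria: biallelic, depth, missingness, MAF."""
    if n_alt_alleles != 1:      # 2) keep biallelic variants only
        return False
    if mean_depth < 5:          # 3) depth below 5 is removed (assumed mean depth)
        return False
    if not genotypes:
        return False
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if missing_rate > 0.5:      # 4) missing rate exceeding 0.5 is removed
        return False
    total_alleles = 2 * len(called)
    if total_alleles == 0:
        return False
    alt_count = sum(g.count(1) for g in called)
    alt_freq = alt_count / total_alleles
    maf = min(alt_freq, 1 - alt_freq)
    return maf > 0.02           # 1) MAF of 0.02 or less is removed
```

A site with `QUAL = 20` and `QD = 1.0` would be flagged by both `FilterQual` and `FilterQD`, while a biallelic site called in three of four samples with mean depth 12 and one heterozygote among them passes all four site-level checks.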