Genomic analysis
Leaf material from 2 A1(c+) individuals (F0-tolerant
pool), 2 T6(c-) individuals (F0-sensitive pool) and 19
F3 individuals from 7 tolerant families (2 T6xA1 and 5 A1xT6)
(F3-tolerant pool) were used for DNA extraction (Dataset SD5). DNA was
isolated with a Qiagen DNeasy kit and libraries were prepared using a
TruSeq DNA PCR-Free Sample Preparation Kit from Illumina. Whole-genome
sequencing was performed on Illumina HiSeq 2000 by Novogene (UK). Read
depth was measured after processing and alignment to the TAIR10
reference (below) at 15x coverage for each individual sample and 180x
for the pooled sample (19 F3 individuals).
Raw sequence data was processed as follows: (1) read trimming by default
parameters with Sickle [23]; (2) alignment to the TAIR10 reference
with bwa mem 0.7.12 (processing with samtools 1.3); (3) removal of
duplicate reads using Picard (MarkDuplicates); (4) application to read
group name correction to the bam files using Picard
(AddOrReplaceReadGroups); (5) realigning Indels using the GATK
‘GenomeAnalysis’ Toolkit [24].
Bi-allelic SNPs were identified using ‘HaplotypeCaller’ and genotyped
using ‘GenotypeGVCF’ (both tools in GATK). Data were filtered using GATK
SelectVariants using these ‘GATK best practices’ guidelines: QD
< 2.0 || MQ < 40.00
|| FS > 60.0 || SOR
> 4.0 || MQRankSum < -8.0
|| ReadPosRankSum < -8.0 and a minimum
coverage of 10 per sample. Genotyping of the pooled sample (19A1xT6 individuals) was performed with LoFreq [25], as GATK
failed with ploidy levels required (ploidy-38 for 19 diploid samples).
LoFreq was run with default settings. VCFs were then merged with
BCFtools [26]. The allele frequency of all variants in the
F3-tolerant pool versus the F0-sensitive pool was determined
using the Multipool program [27]. Several intervals with bi-polar
peaks were identified across the genome. Intervals where the allele
frequency between the two pools differed considerably and had a positive
LOD score were further explored for candidate genes.
The whole genome of 4 A1(c+) and 4
T6(c-) individuals was sequenced in a previous study
[28] and were merged with the 2 parental samples sequenced for the
BSA-seq experiment. To obtain a consensus sequence for each deme,
private SNPs of T6 and A1 (shared for the 6 samples) were selected using
GATK and Col-0 TAIR10 sequence as a reference. Once we obtained the
consensus sequence, we selected the sites that differ between demes
using GATK ‘concordance’ command obtaining one vcf file with T16
AF<0.1 and A1 AF > 0.9 variants and a second vcf
file with T16 AF>0.9 and A1 AF< 0.1 variants. We
merged the files using VCFtools [29] and the amino acid changes
between A1 and T6 were obtained and quantified using SNPeff [30].
Genes with 3 to 10 genetic modifiers were selected as candidates
(Dataset SD3).