2.3 | Mapping, variant calling and filtering
Genomic reads obtained from individuals and pools were mapped against the P. fijiensis reference genome (https://genome.jgi.doe.gov/Mycfi2/Mycfi2.home.html; Arango Isaza et al., 2016). Pool-sequencing data were processed using the same pipeline and filtering parameters as in Carlier et al. (2021b). The data already available for the 2011 samples were reanalysed together with the 2013 samples so that the same software versions were used for both. SNP calling was performed separately for the samples from the two years because some analyses were only possible using samples from 2011, for which some phenotypic data were available (see explanation below). After filtering (mapping quality > 30, minimum read count = 3, minimum allelic frequency = 0.03), 981 001 and 1 792 219 biallelic SNPs were detected in the six populations collected in 2011 and the eight populations collected in 2013, respectively.
For the sequencing of individuals, the genomic reads of the 63 isolates were mapped separately using bwa v0.7.15 (Li & Durbin, 2010) with the bwa mem command and default parameters. Duplicates were marked and removed using Picard Toolkit v2.7.0 (Picard Toolkit, 2019, Broad Institute, GitHub repository: http://broadinstitute.github.io/picard/) with the mark_duplicates command. Genome Analysis Toolkit (GATK) v4.1.4.0 (McKenna et al., 2010) was used for SNP calling with the HaplotypeCaller command, and all individuals were merged into a single variant call format (VCF) file with GenotypeGVCFs_merge. The VCF file was restricted to SNPs using GATK's SelectVariants command, and variants were filtered for quality using the VariantFiltration command with the same parameters as in Derbyshire et al. (2019). A second filter was then applied to each genotype in the VCF file using vcftools v0.1.14 (Danecek et al., 2011) with the following parameters: --maf 0.01, --minDP 4, --maxDP 100, --minGQ 20 and --max-missing 0.7. After filtering, 758 407 SNPs were retained among the 63 isolates.
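As an illustration of the second, genotype-level filter, the vcftools call could be assembled roughly as follows. This is a minimal sketch, not the command actually run in the study: the file names, the use of an uncompressed input (--vcf), and the output options (--recode, --recode-INFO-all, --out) are assumptions; only the five thresholds are taken from the text.

```python
import shlex
from typing import List


def build_vcftools_filter_command(vcf_in: str, out_prefix: str) -> List[str]:
    """Assemble a vcftools command line with the thresholds given in the text.

    vcf_in and out_prefix are hypothetical paths chosen by the caller.
    """
    return [
        "vcftools",
        "--vcf", vcf_in,          # assumption: uncompressed VCF input
        "--maf", "0.01",          # keep sites with minor allele frequency >= 0.01
        "--minDP", "4",           # genotypes with depth < 4 are set to missing
        "--maxDP", "100",         # genotypes with depth > 100 are set to missing
        "--minGQ", "20",          # genotypes with quality < 20 are set to missing
        "--max-missing", "0.7",   # keep sites with at least 70% called genotypes
        "--recode",               # assumption: write a filtered VCF
        "--recode-INFO-all",      # assumption: keep INFO fields in the output
        "--out", out_prefix,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_vcftools_filter_command("isolates.snps.vcf", "isolates.snps.filtered")
    print(shlex.join(cmd))
    # To actually run it (requires vcftools on the PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Note that --max-missing is a call rate rather than a missingness rate: a value of 0.7 keeps sites at which at least 70% of genotypes are called.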
The VCF file was then converted with a custom script into FASTA files containing all individuals, the format required by some of the analyses described below.
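The custom conversion script is not described here, so the following is only a hypothetical sketch of one way such a VCF-to-FASTA conversion can be done. It assumes that each isolate is represented by the concatenation of its alleles at the SNP sites, that only the first allele of each genotype is used (as for a haploid call), and that missing or non-ACGT alleles are written as N. The function names and the toy input are illustrative and do not come from the study.

```python
from typing import Dict, Iterable, List


def vcf_to_sequences(vcf_lines: Iterable[str]) -> Dict[str, str]:
    """Concatenate, per sample, the allele called at each SNP site."""
    samples: List[str] = []
    bases: Dict[str, List[str]] = {}
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            bases = {s: [] for s in samples}
            continue
        alleles = [fields[3]] + fields[4].split(",")  # REF, then ALT alleles
        if any(len(a) != 1 for a in alleles):
            continue  # skip indels and other non-SNP records
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index]
            first = gt.replace("|", "/").split("/")[0]  # first allele only
            base = alleles[int(first)] if first.isdigit() else "N"
            bases[sample].append(base if base in "ACGT" else "N")
    return {s: "".join(b) for s, b in bases.items()}


def format_fasta(sequences: Dict[str, str], width: int = 60) -> str:
    """Render sequences as FASTA text, wrapping lines at `width` characters."""
    out: List[str] = []
    for name, seq in sequences.items():
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    toy_vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tiso1\tiso2",
        "chr1\t10\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0:10\t1:8",
        "chr1\t20\t.\tC\tT\t50\tPASS\t.\tGT:DP\t.:0\t1:9",
    ]
    print(format_fasta(vcf_to_sequences(toy_vcf)), end="")
```

Because every isolate contributes exactly one character per retained site, the resulting sequences all have the same length and can be read as an alignment by downstream tools.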