2.3 Quality filtering of FASTQ raw data
The raw sequencing files were checked for quality using FastQC (Andrews, 2010) and MultiQC (Ewels et al., 2016), then trimmed using Trimmomatic (Bolger et al., 2014) to remove adapters and low-quality bases at the read ends (LEADING:30 TRAILING:30). Paired reads were used as input for HybPiper (Johnson et al., 2016), and the "mega353" target file (McLay et al., 2021) was used to recover sequences of the Angiosperms353 loci. Reads were mapped to the mega353 reference using BWA (Li & Durbin, 2009) and then assembled de novo using SPAdes (Bankevich et al., 2012). Exon, intron and supercontig sequences were recovered using Exonerate (Slater & Birney, 2005). We excluded genes flagged with paralog warnings by HybPiper and genes that were not recovered in at least 75% of samples.
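The two gene-level exclusion criteria can be illustrated with a minimal Python sketch. It is not the script used in this study: the function name, the input structures and the gene and sample names are hypothetical, and it assumes that per-gene recovery and the paralog warnings have already been tabulated from the HybPiper output.

```python
def filter_genes(recovered, paralog_flagged, n_samples, min_fraction=0.75):
    """Return the sorted names of genes that pass both exclusion criteria.

    recovered       -- dict: gene name -> set of samples with a recovered sequence
    paralog_flagged -- set of gene names that received a paralog warning
    n_samples       -- total number of samples in the data set
    min_fraction    -- minimum fraction of samples in which a gene must be recovered
    """
    kept = []
    for gene, samples in recovered.items():
        if gene in paralog_flagged:
            continue  # excluded: flagged with a paralog warning
        if len(samples) / n_samples >= min_fraction:
            kept.append(gene)  # recovered in at least min_fraction of samples
    return sorted(kept)


# Hypothetical toy data: four samples, four genes.
example_recovered = {
    "gene_A": {"s1", "s2", "s3", "s4"},
    "gene_B": {"s1", "s2", "s3"},        # 75% of samples: kept
    "gene_C": {"s1", "s2"},              # 50% of samples: excluded
    "gene_D": {"s1", "s2", "s3", "s4"},  # paralog warning: excluded
}
example_kept = filter_genes(example_recovered, {"gene_D"}, n_samples=4)
```

In this sketch a gene recovered in exactly 75% of samples is retained, which matches the "at least 75%" wording above.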
We extracted protein-coding and intergenic sequences from the complete plastid genome of Androsace mariae Kanitz (GenBank: MT732944) and removed duplicates and sequences shorter than 200 bp, resulting in a plastome reference of 125 plastid fragments. This reference was then used to recover plastid sequences with HybPiper, as described above.
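The clean-up of the plastid reference can be sketched in the same way. This is an illustration under stated assumptions rather than the procedure actually applied: it assumes that "duplicates" means fragments with identical sequences (for example, copies from the inverted repeats), and the function name and fragment names are hypothetical.

```python
def filter_reference(records, min_length=200):
    """Drop short and duplicated fragments from a list of (name, sequence) tuples.

    Assumption: two fragments are duplicates when their sequences are identical
    (case-insensitive); the first occurrence is kept.
    """
    seen = set()
    kept = []
    for name, seq in records:
        key = seq.upper()
        if len(key) < min_length:
            continue  # shorter than the minimum length
        if key in seen:
            continue  # identical to a fragment that was already kept
        seen.add(key)
        kept.append((name, seq))
    return kept


# Hypothetical toy fragments.
example_records = [
    ("frag_1", "A" * 250),
    ("frag_1_copy", "a" * 250),  # duplicate of frag_1: removed
    ("frag_2", "ACGT" * 10),     # 40 bp: removed
    ("frag_3", "ACGT" * 60),     # 240 bp: kept
]
example_reference = filter_reference(example_records)
```

Here a fragment of exactly 200 bp is retained, because only sequences shorter than 200 bp are removed.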