2.3 Quality filtering of FASTQ raw data
The raw sequencing files were checked for quality using FastQC (Andrews,
2010) and MultiQC (Ewels et al., 2016), then trimmed using
Trimmomatic (Bolger et al., 2014) to remove adapters and trim
low-quality bases from read ends (LEADING:30 TRAILING:30). Paired reads were used as
input to HybPiper (Johnson et al., 2016), and the "mega353" target
file (McLay et al., 2021) was used to recover Angiosperms353 loci
sequences. Reads were mapped to the mega353 reference using BWA (Li &
Durbin, 2009) and were then assembled de novo using SPAdes
(Bankevich et al., 2012). Exon, intron and supercontig sequences
were recovered using Exonerate (Slater & Birney, 2005). We excluded
genes flagged with paralog warnings by HybPiper and genes that were not
recovered in at least 75% of samples.
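
The two exclusion criteria above can be illustrated with a minimal Python sketch. It is not the script used in this study: the function name filter_loci, its arguments and the toy data are hypothetical, and it assumes that a locus counts as flagged if HybPiper issued a paralog warning for it in any sample, and that the 75% threshold is inclusive.

```python
from typing import Dict, List, Set


def filter_loci(
    recovered: Dict[str, Set[str]],
    paralog_flagged: Set[str],
    n_samples: int,
    min_fraction: float = 0.75,
) -> List[str]:
    """Return loci that pass both filters, in sorted order.

    recovered       -- hypothetical mapping: locus name -> samples with a recovered sequence
    paralog_flagged -- loci with a paralog warning (assumed: in any sample)
    n_samples       -- total number of samples in the data set
    min_fraction    -- minimum fraction of samples required (assumed inclusive)
    """
    kept = []
    for locus in sorted(recovered):
        if locus in paralog_flagged:
            continue  # drop loci with paralog warnings
        if len(recovered[locus]) / n_samples >= min_fraction:
            kept.append(locus)  # recovered in enough samples
    return kept


# Toy data with four samples and invented locus names.
toy_recovered = {
    "locus1": {"s1", "s2", "s3", "s4"},  # 100%, kept
    "locus2": {"s1", "s2", "s3"},        # 75%, kept
    "locus3": {"s1", "s2"},              # 50%, dropped
    "locus4": {"s1", "s2", "s3", "s4"},  # 100% but flagged, dropped
}
toy_kept = filter_loci(toy_recovered, {"locus4"}, n_samples=4)
```

In this toy example, toy_kept contains locus1 and locus2 only.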
We extracted protein-coding and intergenic sequences from the complete
plastid genome of Androsace mariae Kanitz (GenBank: MT732944) and
removed duplicates and sequences shorter than 200 bp, resulting in a
plastome reference of 125 plastid fragments. This reference was then
used to recover plastid sequences with HybPiper, as described above.
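
The duplicate and length filtering of the plastid fragments can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually applied: the function name build_reference and the toy records are hypothetical, and duplicates are assumed to mean fragments with identical sequence (case-insensitive), with the first occurrence retained.

```python
from typing import Iterable, List, Tuple


def build_reference(
    records: Iterable[Tuple[str, str]],
    min_length: int = 200,
) -> List[Tuple[str, str]]:
    """Filter (name, sequence) pairs extracted from a plastid genome.

    Drops sequences shorter than min_length and, as an assumption,
    treats sequences that are identical after upper-casing as duplicates.
    """
    seen = set()
    kept = []
    for name, seq in records:
        seq = seq.upper()
        if len(seq) < min_length:
            continue  # too short to serve as a target
        if seq in seen:
            continue  # identical sequence already retained
        seen.add(seq)
        kept.append((name, seq))
    return kept


# Toy records with invented names and artificial sequences.
toy_records = [
    ("frag1", "A" * 300),
    ("frag1_copy", "a" * 300),  # duplicate of frag1, dropped
    ("frag2", "ACGT" * 10),     # 40 bp, dropped
    ("frag3", "C" * 200),       # exactly 200 bp, kept
]
toy_reference = build_reference(toy_records)
```

Here toy_reference retains frag1 and frag3. A real workflow would read and write FASTA files instead of in-memory tuples.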