2.2 | Data for the target species and references
Raw reads (FASTQ files) of the target species, C. batrachus, T. bimaculatus, T. flavidus, and T. buxtoni, were downloaded from the ENA database (https://www.ebi.ac.uk/ena/browser/home; accessions SRR7440020, SRR8285222, SRR7881551, SRR6913452, SRR6913453, and SRR6913455). PCR duplicates were removed using Prinseq (Schmieder & Edwards, 2011). Adapters and low-quality bases were removed using Trim Galore (https://github.com/FelixKrueger/TrimGalore). Next, the reads were error-corrected with the k-mer-based tool BFC (Li, 2015). The multiplicity distribution of 23-mers was counted using Jellyfish2 (Marçais & Kingsford, 2011), and genome coverage was estimated using KrATER (https://github.com/mahajrod/KrATER). After processing, the final genome coverage for C. batrachus, T. bimaculatus, T. buxtoni, and the simulated ancient DNA clean reads exceeded 30× in every case (Table S2). The insert sizes of the paired-end reads were 180 bp, 300 bp, 250 bp, and 350 bp for C. batrachus, T. bimaculatus, T. flavidus, and T. buxtoni, respectively.
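To illustrate the idea behind estimating coverage from a k-mer multiplicity distribution, a minimal Python sketch is given here. It is not the Jellyfish2 or KrATER implementation: the function names (`kmer_spectrum`, `peak_multiplicity`, `base_coverage`), the choice of the most frequent multiplicity above an error cutoff as the peak, and the conversion from k-mer coverage to base coverage via read_length / (read_length - k + 1) are assumptions made for illustration only.

```python
from collections import Counter

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_spectrum(reads, k=23):
    """Map each multiplicity to the number of distinct canonical k-mers seen that often."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def peak_multiplicity(spectrum, min_multiplicity=2):
    """Return the most frequent multiplicity at or above a cutoff.

    The cutoff (hypothetical default of 2) excludes the low-multiplicity
    k-mers that mostly arise from sequencing errors.
    """
    candidates = {m: n for m, n in spectrum.items() if m >= min_multiplicity}
    if not candidates:
        raise ValueError("no k-mers at or above the multiplicity cutoff")
    return max(candidates, key=lambda m: (candidates[m], m))


def base_coverage(kmer_peak, read_length, k=23):
    """Convert k-mer coverage to base coverage for reads of a fixed length."""
    return kmer_peak * read_length / (read_length - k + 1)
```

For real data sets, a dedicated k-mer counter is needed because holding every 23-mer in a Python dictionary does not scale to whole-genome read sets; the sketch only shows the arithmetic that links the spectrum peak to a coverage estimate.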
Reference genome assemblies of C. macrocephalus, A. melas, T. rubripes, T. flavidus, T. nigroviridis, T. bimaculatus, M. mola, T. scriptus, T. strepsiceros, B. grunniens, and M. moschiferus were downloaded from the National Center for Biotechnology Information (NCBI; Tables S3-S5). Repeats in these genomes were masked using RepeatMasker (http://repeatmasker.org/).
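As a quick sanity check after repeat masking, the masked fraction of an assembly can be computed from the FASTA file. The sketch below is an illustration, not part of the described pipeline: the function name `masked_fraction` is hypothetical, and it assumes that masked bases appear either as lowercase letters (soft masking) or as N or X (hard masking). Because N also marks assembly gaps, the result is an upper bound on the repeat content in hard-masked files.

```python
def masked_fraction(fasta_lines):
    """Return the fraction of sequence characters that are masked.

    A base is treated as masked if it is lowercase (soft masking)
    or is N or X (hard masking; N may also represent assembly gaps).
    """
    masked = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        # Skip blank lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        total += len(line)
        masked += sum(1 for base in line if base.islower() or base in "NX")
    if total == 0:
        raise ValueError("no sequence data found")
    return masked / total
```

The function accepts any iterable of lines, so an open file handle can be passed directly, for example `with open(path) as handle: masked_fraction(handle)`, where `path` points to a masked assembly.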