2.4.1 Library preparation and data processing
For the discovery of single nucleotide polymorphisms (SNPs), multiplexed
ISSR genotyping by sequencing (MIG-seq) was conducted following the
procedure of Suyama & Matsuki (2015) with one modification: the
annealing temperature of the first PCR was lowered from 48°C to 38°C.
Both ends of each fragment were sequenced (paired-end reads 1 and 2),
but only read 1 was used in the subsequent analyses. Low-quality reads
were removed with the fastq_quality_filter program of FASTX-Toolkit
(http://hannonlab.cshl.edu/fastx_toolkit/) using q = 30 and p = 40,
i.e., reads were retained only if at least 40% of their bases had a
quality score of 30 or higher. To remove reads derived from extremely
short library inserts, reads containing the sequencing primer region
were identified and discarded with the fastx_clipper program of
FASTX-Toolkit.
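The two FASTX-Toolkit filters above can be sketched in Python as follows, assuming reads held in memory as (sequence, quality) pairs with Phred+33 encoded qualities. `PRIMER` is a hypothetical placeholder rather than the actual MIG-seq primer, and the exact-substring test is a simplification of fastx_clipper's adapter matching, which also tolerates partial matches.

```python
"""Toy re-implementation of the two read filters (illustration only).

Assumptions: reads are (sequence, quality_string) tuples with Phred+33
qualities; PRIMER is a hypothetical placeholder, not the MIG-seq primer.
"""

PRIMER = "ACGTACGTAC"  # hypothetical placeholder sequence


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into Phred scores."""
    return [ord(ch) - offset for ch in quality]


def passes_quality(quality: str, q: int = 30, p: int = 40) -> bool:
    """Keep a read only if at least p% of its bases have Phred score >= q."""
    scores = phred_scores(quality)
    if not scores:
        return False
    good = sum(1 for s in scores if s >= q)
    return 100 * good >= p * len(scores)


def lacks_primer(sequence: str, primer: str = PRIMER) -> bool:
    """Reads that contain the primer came from very short inserts."""
    return primer not in sequence


def filter_reads(reads):
    """Apply the quality filter, then discard primer-containing reads."""
    return [
        (seq, qual)
        for seq, qual in reads
        if passes_quality(qual) and lacks_primer(seq)
    ]
```

With 10-base reads, for example, a read with exactly 4 bases at Q40 ('I') and 6 at Q2 ('#') sits exactly on the 40% boundary and is kept.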
De novo assembly was performed using Stacks v. 2.53 (Catchen et al.,
2013). Because our samples were gametophytes, they were expected to be
haploid. However, some samples carried both female and male markers or
two haplotypes of the nuclear marker cetn-int2, suggesting diploidy.
Therefore, we first performed the assembly assuming that all samples
were diploid, with the following parameters: minimum number of identical
reads required to create a stack (m = 3), maximum number of nucleotide
mismatches allowed between stacks within a single individual (M = 2),
and maximum number of mismatches allowed between loci when building the
catalogue (n = 1); all other parameters were left at their default
values. SNP genotypes for each individual were exported using the
populations program of Stacks, and only the first SNP of each putative
locus was retained using the --write_single_snp flag. As expected, some
samples had heterozygous loci (all samples from p24 and p27, and one
sample from Ona). These samples were then excluded, and a second
assembly was performed assuming that all remaining samples were haploid,
with m = 3, M = 0, and n = 1. Furthermore, because using secondary reads
(reads that cannot be distinguished from sequencing error) produced
heterozygosity within individuals, this was disabled by setting -N to 0
and adding the -H flag.
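To make the roles of m and M concrete, the toy below contrasts the diploid (M = 2) and haploid (M = 0) settings for a single individual. It is a simplified illustration, not the Stacks algorithm: stacks supported by fewer than m reads are simply ignored, and primary stacks are merged by single-linkage on Hamming distance, assuming equal-length reads. The function names `build_loci` and `n_heterozygous` are hypothetical.

```python
"""Toy illustration of the Stacks parameters m and M (not Stacks itself).

Identical reads form a stack; stacks with fewer than m reads are ignored;
primary stacks within max_mismatch (M) differences are merged into one
locus by single-linkage on Hamming distance (equal-length reads assumed).
"""
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def build_loci(reads: list[str], m: int = 3,
               max_mismatch: int = 2) -> list[set[str]]:
    """Return loci as sets of distinct allele sequences."""
    stacks = Counter(reads)  # each distinct sequence is one stack
    primary = sorted(seq for seq, depth in stacks.items() if depth >= m)
    loci: list[set[str]] = []
    for seq in primary:
        # every existing locus within max_mismatch of this stack joins it
        hits = [locus for locus in loci
                if any(hamming(seq, other) <= max_mismatch
                       for other in locus)]
        merged = {seq}.union(*hits)
        loci = [locus for locus in loci if locus not in hits] + [merged]
    return loci


def n_heterozygous(loci: list[set[str]]) -> int:
    """Count loci carrying more than one allele (heterozygous-looking)."""
    return sum(len(locus) > 1 for locus in loci)
```

In this toy, the alleles AAAA and AAAT collapse into one heterozygous locus when M = 2 but stay separate when M = 0; with M = 0 no locus can ever carry two alleles, because only identical sequences would merge.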
SNP genotypes for each individual were then exported in the same way as
for the diploid dataset.
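Retaining only the first SNP of each locus at export is a common way to limit non-independence between tightly linked SNPs. A minimal sketch of that rule, assuming each locus is given as a list of aligned, equal-length allele sequences pooled across samples (a hypothetical input format; missing bases are not handled):

```python
"""Sketch of the 'first SNP per locus' rule (hypothetical input format)."""


def snp_positions(alleles: list[str]) -> list[int]:
    """Columns at which the pooled alleles of one locus disagree."""
    return [i for i, column in enumerate(zip(*alleles))
            if len(set(column)) > 1]


def first_snp_per_locus(loci: dict[str, list[str]]) -> dict[str, int]:
    """Keep only the first variable site of each polymorphic locus;
    monomorphic loci contribute no SNP."""
    kept = {}
    for locus_id, alleles in loci.items():
        positions = snp_positions(alleles)
        if positions:
            kept[locus_id] = positions[0]
    return kept
```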
Both diploid and haploid datasets were processed using PLINK v. 2.00
(Chang et al., 2015; www.cog-genomics.org/plink/2.0/).
SNPs with a minor allele frequency < 0.03, loci with a proportion of
missing individuals > 0.7, and individuals with a proportion of missing
loci > 0.7 were filtered out. The diploid dataset comprised 237 samples
from 36 populations and 818 SNPs (loci), with a mean genotyping rate of
50.2%. The haploid dataset comprised 212 samples from 34 populations and
865 SNPs (loci), with a mean genotyping rate of 48.8%. The PLINK output
files were converted with PGDSpider2 (Lischer & Excoffier, 2012) into
the formats required for subsequent analyses. All samples from
population p18 were removed because of their low genotyping rate.
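The filtering thresholds and the mean genotyping rate reported above can be sketched as below. This is not PLINK's implementation: the genotype encoding is a hypothetical choice, and the order in which the filters are applied (loci, then individuals, then MAF) is an assumption. In PLINK 2 the thresholds would presumably be set with --maf 0.03, --geno 0.7 and --mind 0.7, although the exact commands are not reported.

```python
"""Sketch of MAF/missingness filters and the mean genotyping rate.

Assumptions (hypothetical, not PLINK's code): matrix[i][j] is the genotype
of individual i at locus j, stored as a tuple of 0/1 alleles ((0, 1) for a
diploid heterozygote, (1,) for a haploid) or None when missing.
"""
from typing import Optional

Genotype = Optional[tuple[int, ...]]


def missing_rate(calls: list[Genotype]) -> float:
    return sum(c is None for c in calls) / len(calls) if calls else 1.0


def minor_allele_freq(calls: list[Genotype]) -> float:
    alleles = [a for c in calls if c is not None for a in c]
    if not alleles:
        return 0.0
    freq = sum(alleles) / len(alleles)
    return min(freq, 1 - freq)


def filter_matrix(matrix, maf=0.03, geno=0.7, mind=0.7):
    """Return (kept individual indices, kept locus indices)."""
    n_ind, n_loc = len(matrix), len(matrix[0])

    def column(j, rows):
        return [matrix[i][j] for i in rows]

    rows = range(n_ind)
    # assumed order: locus missingness, individual missingness, then MAF
    loci = [j for j in range(n_loc) if missing_rate(column(j, rows)) <= geno]
    inds = [i for i in rows
            if missing_rate([matrix[i][j] for j in loci]) <= mind]
    loci = [j for j in loci if minor_allele_freq(column(j, inds)) >= maf]
    return inds, loci


def genotyping_rate(matrix, inds, loci) -> float:
    """Fraction of non-missing calls over the retained data."""
    calls = [matrix[i][j] for i in inds for j in loci]
    return sum(c is not None for c in calls) / len(calls) if calls else 0.0
```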