High-throughput sequencing of 5S-IGS in oaks - exploring intragenomic
variation and algorithms to recognize target species in pure and mixed
samples.
Abstract
Measuring biological diversity is a crucial but difficult undertaking,
as exemplified in oaks where complex morphological, ecological,
biogeographic and genetic differentiation patterns collide with
traditional taxonomy that measures biodiversity as the number of species (or
higher taxa). In this pilot study, we generated High-Throughput
Sequencing (HTS) amplicon data of the intergenic spacer of the 5S
nuclear ribosomal DNA cistron (5S-IGS) in oaks, using six mock samples
that differ in geographic origin, species composition, and pool
complexity. The potential of the marker for automated geno-taxonomy
applications was assessed using a reference dataset of 1770 5S-IGS
cloned sequences, covering the entire taxonomic breadth and distribution
range of western Eurasian Quercus, and applying similarity (BLAST) and
evolutionary approaches (ML trees and EPA). Both methods performed
equally well, with correct identification of species in sections Ilex
and Cerris in the pure and mixed samples, and of the main genotypes shared by
species of sect. Quercus. Application of different cut-off thresholds
revealed that medium- to high-abundance sequences (abundance >10 or >25)
suffice for a clear species identification of samples containing one or
few individuals. Lower thresholds identify phylogenetic correspondence
with all target species in highly mixed samples (analogous to
environmental bulk samples) and include rare variants pointing towards
reticulation, incomplete lineage sorting, pseudogenic 5S units, and
in-situ (natural) contamination. Our pipeline is highly promising for
future assessments of intra-specific and inter-population diversity, and
of the genetic resources of natural ecosystems, which are fundamental to
empowering fast and solid biodiversity conservation programs worldwide.
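
The abundance cut-offs mentioned above can be illustrated with a minimal sketch. It assumes that reads are first collapsed into distinct sequence variants and that a variant is kept when its read count is strictly greater than the threshold; the function names, the toy reads, and the strict ">" comparison are illustrative assumptions, not the pipeline used in the study.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads and count how often each distinct sequence occurs."""
    return Counter(reads)


def filter_by_abundance(counts, min_abundance):
    """Keep variants whose read count is strictly greater than min_abundance.

    The strict comparison is an assumption made for this sketch.
    """
    return {seq: n for seq, n in counts.items() if n > min_abundance}


# Hypothetical toy reads: one dominant variant, one medium variant, one rare variant.
reads = ["ACGT"] * 30 + ["ACGA"] * 12 + ["TTGA"] * 3

counts = dereplicate(reads)
strict = filter_by_abundance(counts, 25)   # high-abundance variants only
medium = filter_by_abundance(counts, 10)   # medium- and high-abundance variants
relaxed = filter_by_abundance(counts, 0)   # every observed variant, including rare ones
```

In this toy case the higher cut-offs retain only the dominant variants, which would then be passed to the similarity or phylogenetic assignment step, while the relaxed cut-off also keeps the rare variant.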