DNA Barcoding and geographical scale effect: the problems of
undersampling genetic diversity hotspots
Abstract
DNA barcoding identification needs a good characterization of
intra-specific genetic divergence to establish the limits between
species. Yet the number of barcodes per species is often low and
geographically restricted. A poor coverage of the species distribution
range may hamper identification, especially when undersampled areas host
genetically distinct lineages. If so, the genetic distance between some
query sequences and reference barcodes may exceed the maximum
intra-specific threshold for unequivocal species assignment. Taking a
group of Quercus herbivores (moths) in Europe as a model system, we found
that the number of DNA barcodes from southern Europe is proportionally
very low in the Barcode of Life Data Systems (BOLD). This geographical
bias complicates the identification of southern query sequences, due to
their high intra-specific genetic distance with respect to barcodes from
higher latitudes. Pairwise intra-specific genetic divergence increased
along with spatial distance, but was higher when at least one of the
sampling sites was in southern Europe. Accordingly, the GMYC (Generalized
Mixed Yule Coalescent) single-threshold model retrieved clusters composed
exclusively of Iberian haplotypes, some of which could correspond to
cryptic species. The number of putative species it retrieved was more
reliable than that of the multiple-threshold GMYC and very similar to the
results from ABGD and jMOTU. Our results support GMYC as a key resource
for species delimitation within poorly inventoried biogeographic regions
in Europe, where historical factors (e.g. glaciations) have promoted
genetic diversity and singularity. Future European DNA barcoding
initiatives should be preferentially performed along latitudinal
gradients, with special focus on southern peninsulas.
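
The threshold problem described in the abstract can be illustrated with a minimal sketch. It assumes aligned sequences, an uncorrected p-distance, and a hypothetical 2% maximum intra-specific threshold; the function names, the toy sequences, and the threshold value are illustrative assumptions and are not taken from the study.

```python
from typing import List, Optional, Tuple


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: fraction of differing sites among compared sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences carry an unambiguous base.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def assign_species(query: str,
                   references: List[Tuple[str, str]],
                   threshold: float = 0.02) -> Tuple[Optional[str], float]:
    """Return (species, distance) for the nearest reference barcode.

    The species is None when the nearest distance exceeds the threshold
    (hypothetical 2% cut-off), i.e. the query cannot be assigned.
    """
    best_species, best_dist = None, float("inf")
    for species, seq in references:
        d = p_distance(query, seq)
        if d < best_dist:
            best_species, best_dist = species, d
    if best_dist > threshold:
        return None, best_dist
    return best_species, best_dist


def mutate(seq: str, positions: List[int]) -> str:
    """Toy helper: substitute the base at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    bases = list(seq)
    for i in positions:
        bases[i] = swap[bases[i]]
    return "".join(bases)


# Toy data: a single reference barcode, as if sampled only at higher latitudes.
reference = "ACGT" * 25                                # 100 aligned sites
northern_query = mutate(reference, [10])               # 1% divergent
southern_query = mutate(reference, [5, 20, 40, 60, 80])  # 5% divergent

library = [("sp_A", reference)]
print(assign_species(northern_query, library))  # assigned to sp_A
print(assign_species(southern_query, library))  # unassigned: exceeds threshold
```

With only the northern reference in the library, the southern query falls outside the threshold even if it belongs to the same species; adding a southern reference barcode to `library` would bring the nearest distance back under the cut-off.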