A targeted capture approach to generating reference sequence databases
for chloroplast gene regions
Abstract
Metabarcoding has improved the way we understand plants within our
environment, from their ecology and conservation to invasive species
management. The notion of identifying plant taxa within environmental
samples relies on the ability to match unknown sequences to known
reference libraries. Without comprehensive reference databases, species
can go undetected or be incorrectly assigned, leading to false negative
and false positive detections. To improve our ability to generate
reference sequence databases, we developed a targeted capture approach using the
OZBaits_CP V1.0 set, designed to capture chloroplast gene regions
across the entirety of flowering plant diversity. We focused on
generating a reference database for coastal temperate plant species
given the lack of reference sequences for these taxa. Our approach was
successful across all specimens, with a target gene recovery rate of 92%
achieved in a single assay (i.e., samples were pooled), making it much
faster and more efficient than standard barcoding. Further testing of
this database showed that 80% of all
samples could be discriminated to family level across all gene regions,
with some genes achieving greater resolution than others, although
resolution also depended on the taxon of interest. Thus, we demonstrate the
importance of generating reference sequences across multiple chloroplast
gene regions, as no single locus is sufficient to discriminate across all
plant groups. The targeted capture approach outlined in this study
provides a way forward to achieve this.