Pangenome construction from short-read sequences: benchmarking for
population and conservation genomics
Abstract
As the collection of all genetic variants in a gene pool, the
pangenome is a concept that will become fundamental to conservation
genomic studies. Unfortunately, most pangenomic approaches developed for
humans and model organisms are financially impractical for conservation
genomic studies of threatened or endangered species because of the high
costs of deeply sequencing multiple individuals on long-read
platforms. Here, by integrating metagenomic and iterative
map-then-assemble approaches, we (1) propose novel workflows to
construct graph pangenomes from multiple low-coverage short-read
datasets; (2) benchmark these short-read pangenomes (both linear and
graph) against a previously published long-read graph pangenome of the
barn swallow; and (3) evaluate the utility of our workflows in
population and conservation genomics. Our results indicate that
economical short-read graph pangenomes can recover the vast majority of
the variants identified through expensive long-read graph approaches,
and that these variants accurately detect important biological signals
(e.g., spatial structure and independent taxonomic delineations). These
results mean that researchers can use their limited,
conservation-oriented funding to characterize more fully the
variants in a particular gene pool for population-level analyses.