Pangenome construction from short-read sequences: benchmarking for population and conservation genomics
  • Jong Yoon Jeon
  • Natalie Allen
  • Andrew Black
  • Andrew DeWoody
Jong Yoon Jeon
Purdue University
Natalie Allen
Purdue University
Andrew Black
Purdue University
Andrew DeWoody
Purdue University

Corresponding Author: [email protected]

Abstract

As the collection of all genetic variants in a gene pool, the pangenome is a concept that will become fundamental to conservation genomic studies. Unfortunately, most pangenomic approaches developed for humans and model organisms are financially impractical for conservation genomic studies of threatened or endangered species because of the high cost of deeply sequencing multiple individuals on long-read platforms. Here, by integrating metagenomic and iterative map-then-assemble approaches, we (1) propose novel workflows to construct graph pangenomes from multiple low-coverage short-read datasets; (2) benchmark these short-read pangenomes (both linear and graph) against a previously published long-read graph pangenome of the barn swallow; and (3) evaluate the utility of our workflows in population and conservation genomics. Our results indicate that economical short-read graph pangenomes can recover the vast majority of the variants identified through expensive long-read graph approaches, and that these variants accurately detect important biological signals (e.g., spatial structure and independent taxonomic delineations). These results mean that researchers can use their limited, conservation-oriented funding to more fully characterize the variants in a particular gene pool for population-level analyses.