GeneMiner: a tool for extracting phylogenetic markers from next-generation sequencing data
Abstract
The advancement of next-generation sequencing (NGS) technologies has
revolutionized evolutionary biology. These technologies have produced
an abundance of genomes and transcriptomes from which researchers can
mine various types of molecular markers vital for phylogenetic,
evolutionary, and ecological studies. Numerous tools have been
developed to extract these molecular markers from NGS data.
However, due to an insufficient number of well-annotated reference
genomes for non-model organisms, it remains challenging to obtain these
markers accurately and efficiently. Here, we present GeneMiner, an
improved and expanded version of our previous tool, Easy353. GeneMiner
combines reference-guided de Bruijn graph assembly with seed
self-discovery and greedy extension. Additionally, it includes a
verification step using a parameter-bootstrap method to mitigate the
errors that can arise from using a relatively distant reference. Our
results on both experimental and simulated data show that GeneMiner
accurately acquires phylogenetic molecular markers for plants from
transcriptomic, genomic, and other NGS data. GeneMiner is designed to be
user-friendly, fast, and memory-efficient, and it runs on Linux,
Windows, and macOS. All source code is publicly available on GitHub
for accessibility and transparency
(https://github.com/yyscu/GeneMiner).
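To make the general idea of reference-guided, seed-based greedy extension concrete, here is a toy Python sketch. It is not GeneMiner's implementation, and every name in it (`kmers`, `count_kmers`, `pick_seed`, `extend`, `assemble`, `k`, `min_count`) is hypothetical. It assumes three simplifications. First, reference guidance is reduced to keeping only reads that share at least one k-mer with the reference. Second, the seed is the most abundant read k-mer that also occurs in the reference. Third, the contig is extended base by base in both directions, always taking the best-supported next k-mer and stopping at low coverage or a repeated k-mer. The parameter-bootstrap verification step is not modeled.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, reference, k):
    """Count k-mers only in reads sharing at least one k-mer with the
    reference (a crude stand-in for reference-guided read filtering)."""
    ref_kmers = set(kmers(reference, k))
    counts = Counter()
    for read in reads:
        read_kmers = list(kmers(read, k))
        if ref_kmers.intersection(read_kmers):
            counts.update(read_kmers)
    return counts, ref_kmers


def pick_seed(counts, ref_kmers):
    """Choose the most abundant read k-mer that also occurs in the reference."""
    shared = [km for km in counts if km in ref_kmers]
    return max(shared, key=lambda km: counts[km]) if shared else None


def extend(contig, counts, k, min_count, forward=True):
    """Greedily extend the contig one base at a time using the most abundant
    next k-mer; stop at low coverage or when a k-mer would repeat."""
    seen = set(kmers(contig, k))
    while True:
        if forward:
            stem = contig[-(k - 1):]
            options = {b: counts.get(stem + b, 0) for b in "ACGT"}
        else:
            stem = contig[:k - 1]
            options = {b: counts.get(b + stem, 0) for b in "ACGT"}
        base, best = max(options.items(), key=lambda item: item[1])
        nxt = stem + base if forward else base + stem
        if best < min_count or nxt in seen:
            return contig
        seen.add(nxt)
        contig = contig + base if forward else base + contig


def assemble(reads, reference, k=4, min_count=2):
    """Seed from shared k-mers, then extend right and left."""
    counts, ref_kmers = count_kmers(reads, reference, k)
    seed = pick_seed(counts, ref_kmers)
    if seed is None:
        return ""
    contig = extend(seed, counts, k, min_count, forward=True)
    return extend(contig, counts, k, min_count, forward=False)


if __name__ == "__main__":
    # Toy data: the reference differs from the sampled sequence at 3 sites.
    reference = "ACGTAGCATGTCTG"
    reads = ["ACGTTGCA", "TTGCATGT", "CATGTCAG"] * 2 + ["GGGGGGGG"]
    print(assemble(reads, reference))  # prints ACGTTGCATGTCAG
```

In this toy run the unrelated read `GGGGGGGG` is filtered out, and the assembled contig follows the read evidence rather than the divergent reference. This illustrates why a distant reference can still serve as a guide for recruiting reads and choosing a seed.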