GenoPop-Impute: Efficient and accurate whole-genome genotype imputation
in non-model species for evolutionary genomic research
- Marie Gurke
- Frieder Mayer
Museum für Naturkunde - Leibniz-Institut für Evolutions- und Biodiversitätsforschung
Abstract
Missing genotypes in DNA sequence data are a problem in many evolutionary
genomic studies, especially of non-model organisms. One way to address
this problem is genotype imputation. However, few algorithms can impute
genotypes without a reference panel of additional genotype data, which is
often unavailable for non-model taxa, while also scaling to large
whole-genome data sets. We therefore developed a new algorithm,
GenoPop-Impute, which imputes the whole genome in separate batches and
uses a random forest algorithm to impute correlated data sets. The
batch-wise approach exploits linkage disequilibrium to increase
imputation accuracy and allows the computation to be parallelized, which
improves efficiency. Tests on simulated data show that linkage
disequilibrium between SNPs improves imputation accuracy, because linked
SNPs are correlated through their shared evolutionary history. Compared
with two alternative algorithms, GenoPop-Impute is more accurate and is
the only one that is computationally feasible for whole-genome data sets.
We also found that GenoPop-Impute increases the accuracy of commonly
estimated population genomic metrics and mitigates biases caused by
missing data in demographic modeling experiments. We conclude that
genotype imputation can be a valuable tool for evolutionary genomic
studies of non-model taxa and that GenoPop-Impute is highly suitable for
this purpose.

15 Aug 2024 Submitted to Molecular Ecology Resources
22 Aug 2024 Submission Checks Completed
22 Aug 2024 Assigned to Editor
22 Aug 2024 Review(s) Completed, Editorial Evaluation Pending
02 Sep 2024 Reviewer(s) Assigned
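The abstract describes imputing the genome in separate batches of SNPs that can be processed in parallel. The Python sketch below illustrates only that batch-wise structure, under assumptions not taken from the paper: genotypes are stored as a SNP × individual matrix of alternative-allele counts (0, 1, 2, with None for missing calls), batches are fixed-size windows of adjacent SNPs, and a simple k-nearest-neighbour majority vote stands in for the random forest imputer that GenoPop-Impute actually uses. All function names (`split_into_batches`, `impute_batch`, `impute_genome`) are hypothetical and are not the GenoPop-Impute API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional

# Hypothetical layout: rows are SNPs in genomic order, columns are individuals.
# Genotypes are alternative-allele counts (0, 1, 2); None marks a missing call.
Matrix = List[List[Optional[int]]]


def split_into_batches(n_snps: int, batch_size: int) -> List[range]:
    """Split SNP indices into consecutive windows of adjacent, likely linked, SNPs."""
    return [range(start, min(start + batch_size, n_snps))
            for start in range(0, n_snps, batch_size)]


def distance(batch: Matrix, a: int, b: int) -> float:
    """Mean absolute genotype difference of individuals a and b over SNPs called in both."""
    diffs = [abs(row[a] - row[b]) for row in batch
             if row[a] is not None and row[b] is not None]
    return sum(diffs) / len(diffs) if diffs else float("inf")


def impute_batch(batch: Matrix, k: int = 3) -> Matrix:
    """Fill missing calls by majority vote of the k most similar individuals.

    This k-nearest-neighbour rule is a stand-in for the random forest imputer.
    """
    n_ind = len(batch[0])
    filled = [row[:] for row in batch]  # copy; distances use only observed calls
    for s, row in enumerate(batch):
        for i in range(n_ind):
            if row[i] is not None:
                continue
            donors = [j for j in range(n_ind) if j != i and row[j] is not None]
            if not donors:
                continue  # nobody else is called at this SNP: leave it missing
            donors.sort(key=lambda j: distance(batch, i, j))  # nearest first
            votes = Counter(row[j] for j in donors[:k])
            filled[s][i] = votes.most_common(1)[0][0]
    return filled


def impute_genome(matrix: Matrix, batch_size: int = 4, k: int = 3,
                  workers: int = 2) -> Matrix:
    """Impute each batch independently and concurrently, then reassemble in order."""
    batches = [[matrix[s] for s in r]
               for r in split_into_batches(len(matrix), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda b: impute_batch(b, k), batches))
    return [row for batch in results for row in batch]


if __name__ == "__main__":
    # Two groups of individuals (0-2 and 3-5) with one missing call in each batch.
    toy = [
        [0, 0, 0, 2, 2, 2],
        [0, 0, None, 2, 2, 2],
        [0, 0, 0, 2, None, 2],
        [0, 0, 0, 2, 2, 2],
    ]
    print(impute_genome(toy, batch_size=2))
```

Because each batch needs only its own SNPs, batches can be dispatched to separate workers; the thread pool keeps the sketch portable, whereas true CPU parallelism in CPython would need a process pool or separate jobs. Within a batch, individuals that share genotypes at neighbouring SNPs serve as donors, which is how linkage disequilibrium informs even this simplified imputer.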