Population assignment from genotype likelihoods for low-coverage whole-genome sequencing data
Matthew DeSaix
Colorado State University

Corresponding Author:[email protected]
Marina Rodriguez
Colorado State University
Kristen Ruegg
Colorado State University
Eric Anderson
Southwest Fisheries Science Center

Abstract

Low-coverage whole-genome sequencing (WGS) is increasingly used to study evolution and ecology in both model and non-model organisms; however, effective application of low-coverage WGS data requires probabilistic frameworks that account for the uncertainty in genotype likelihood data. Here, we present a probabilistic framework for using genotype likelihood data in standard population assignment applications. Additionally, we derive the Fisher information for allele frequency from genotype likelihood data and use it to define a novel metric, the effective sample size, which figures heavily in assignment accuracy. We make these developments available through WGSassign, an open-source software package that is computationally efficient for working with whole-genome data. Using simulated and empirical data sets, we demonstrate the behavior of our assignment method across a range of population structures, sample sizes, and read depths. Through these results, we show that WGSassign can provide highly accurate assignment, even for samples with low average read depths (< 0.01X) and among weakly differentiated populations. Our simulation results highlight the importance of equalizing the effective sample sizes among source populations in order to achieve accurate population assignment with low-coverage WGS data. We further provide study design recommendations for population-assignment studies and discuss the broad utility of effective sample size for studies using low-coverage WGS data.
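
As a rough illustration of what population assignment from genotype likelihoods can look like, the sketch below scores one individual against each source population. For every locus it weights the three genotype likelihoods by Hardy-Weinberg genotype probabilities computed from that population's allele frequency, then sums the logs across loci. This is a minimal sketch under stated assumptions (biallelic loci, independence across loci, Hardy-Weinberg proportions, equal prior weight on each population); the function names and array layout are hypothetical and are not taken from WGSassign.

```python
import numpy as np

def assignment_log_likelihoods(gl, freqs):
    """Log-likelihood of one individual under each source population.

    gl    : (L, 3) array of genotype likelihoods for 0, 1, 2 copies of the
            alternate allele at each of L loci (hypothetical layout).
    freqs : (K, L) array of alternate-allele frequencies for K populations.
    """
    gl = np.asarray(gl, dtype=float)
    # Keep frequencies away from 0 and 1 so no genotype has zero probability.
    freqs = np.clip(np.asarray(freqs, dtype=float), 1e-6, 1 - 1e-6)
    # Hardy-Weinberg genotype probabilities, shape (K, L, 3).
    priors = np.stack(
        [(1 - freqs) ** 2, 2 * freqs * (1 - freqs), freqs ** 2], axis=-1
    )
    # Marginalize over the unknown genotype at each locus, shape (K, L).
    per_locus = (priors * gl[None, :, :]).sum(axis=-1)
    # Loci are treated as independent, so log-likelihoods add.
    return np.log(per_locus).sum(axis=1)

def assign(gl, freqs):
    """Return the index of the best-supported population and the
    posterior probabilities under equal population priors."""
    loglik = assignment_log_likelihoods(gl, freqs)
    shifted = loglik - loglik.max()  # subtract the max for numerical stability
    post = np.exp(shifted) / np.exp(shifted).sum()
    return int(np.argmax(loglik)), post

if __name__ == "__main__":
    # Toy data: three loci, two populations with contrasting frequencies.
    gl = np.array([[0.0, 0.0, 1.0], [0.01, 0.09, 0.9], [1.0, 1.0, 1.0]])
    freqs = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]])
    best, post = assign(gl, freqs)
    print(best, post)
```

In this sketch, a locus with no reads can be represented by equal genotype likelihoods; because the Hardy-Weinberg probabilities sum to one, such a locus contributes essentially nothing to any population's log-likelihood, which is one way to see why very low read depths can still be informative when many loci are available.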