Population assignment from genotype likelihoods for low-coverage
whole-genome sequencing data
Abstract
Low-coverage whole-genome sequencing (WGS) is increasingly used for the
study of evolution and ecology in both model and non-model organisms;
however, effective application of low-coverage WGS data requires the
implementation of probabilistic frameworks to account for the
uncertainties in genotype likelihood data. Here, we present a
probabilistic framework for using genotype likelihood data for standard
population assignment applications. Additionally, we derive the Fisher
information for allele frequency from genotype likelihood data and use
it to define a novel metric, the effective sample size, which
strongly influences assignment accuracy. We make these developments
available for application through WGSassign, an open-source software
package that is computationally efficient for working with whole-genome
data. Using simulated and empirical data sets, we demonstrate the
behavior of our assignment method across a range of population
structures, sample sizes, and read depths. Through these results, we
show that WGSassign can provide highly accurate assignment, even for
samples with low average read depths (< 0.01X) and among
weakly differentiated populations. Our simulation results highlight the
importance of equalizing the effective sample sizes among source
populations in order to achieve accurate population assignment with
low-coverage WGS data. We further provide study design recommendations
for population-assignment studies and discuss the broad utility of
effective sample size for studies using low-coverage WGS data.
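
To make the idea of assignment from genotype likelihoods concrete, the
following is a minimal sketch, not the WGSassign implementation. It
assumes biallelic loci, Hardy-Weinberg proportions within each source
population, and independence among loci, and it scores an individual by
summing over the three possible genotypes at each locus, weighted by
that individual's genotype likelihoods. The function names, the
clipping constant eps, and the toy numbers are all hypothetical and
chosen only for illustration.

```python
import math


def hwe_genotype_probs(p):
    """Return P(g | p) for g = 0, 1, 2 allele copies under Hardy-Weinberg."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)


def assignment_log_likelihood(genotype_likelihoods, allele_freqs, eps=1e-6):
    """Log-likelihood of one individual originating from one population.

    genotype_likelihoods: one (L0, L1, L2) tuple per locus.
    allele_freqs: the population's allele frequency at each locus.
    eps: hypothetical clipping bound that keeps frequencies off 0 and 1.
    """
    total = 0.0
    for gl, p in zip(genotype_likelihoods, allele_freqs):
        p = min(max(p, eps), 1.0 - eps)
        probs = hwe_genotype_probs(p)
        # Marginalize over the unknown genotype at this locus.
        total += math.log(sum(l * pr for l, pr in zip(gl, probs)))
    return total


def assign(genotype_likelihoods, pop_freqs):
    """Return the highest-scoring population and all log-likelihoods."""
    scores = {
        name: assignment_log_likelihood(genotype_likelihoods, freqs)
        for name, freqs in pop_freqs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (made up): three loci, the last one with no reads, so its
# genotype likelihoods are flat.
gls = [(0.05, 0.25, 0.70), (0.80, 0.15, 0.05), (1 / 3, 1 / 3, 1 / 3)]
pop_freqs = {"A": [0.9, 0.1, 0.5], "B": [0.2, 0.7, 0.5]}

best, scores = assign(gls, pop_freqs)
print(best, scores)
```

In this sketch, a locus with flat genotype likelihoods adds the same
constant to every population's score, so it does not affect the ranking.
This is one way to see why individuals with very low read depth can
still be assigned when enough loci carry at least some information.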