Iliana Baums - 21DOCS Test Area

The software program STRUCTURE uses a Bayesian iterative clustering algorithm to group samples based on multi-locus genotype data and is one of the most cited tools for determining population structure. To infer the optimal number of clusters (K) from STRUCTURE output, the ΔK method is often applied. However, a recent study relying on modeled microsatellite data suggested that this method has a downward bias in its estimation of K and is sensitive to uneven sampling. If this finding holds for empirical microsatellite datasets, conclusions about the scale of gene flow may have to be revised for a large number of studies. Here, we apply recently described estimators of K to re-estimate gene flow in 41 empirical microsatellite datasets: 15 from a broad range of taxa and 26 focused on a complex study system, coral. These datasets included 35 species, spanning seven continents, from diverse biological systems across the Tree of Life. After comparing the alternative estimates of K (Puechmaille statistics) with the traditional estimates (ΔK and posterior probability), we conclude that ΔK alone is insufficient for determining the optimal number of clusters and that sampling evenness does not necessarily predict agreement between the alternative and traditional estimators. To better infer population structure, we suggest combining visual inspection of STRUCTURE plots with calculation of the alternative estimators at various thresholds, in addition to ΔK. Differences between estimators could reveal patterns with important biological implications, such as the potential for more population structure than previously estimated, as was the case for many studies reanalyzed here.
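
For readers unfamiliar with ΔK, the following is a minimal sketch of how the statistic is commonly computed from replicate STRUCTURE runs: the absolute second-order difference of the mean log-likelihood, divided by the standard deviation of the log-likelihood across replicates at that K. The function name `delta_k`, the input layout (a dictionary mapping K to a list of replicate log-likelihoods), and the numbers in `example_runs` are all hypothetical and chosen for illustration only; they are not taken from the datasets analyzed here.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    log_likelihoods: dict mapping K (int) to a list of log-likelihood values,
    one per replicate run at that K (hypothetical input layout).
    """
    means = {k: mean(values) for k, values in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        # Delta K needs both neighbours, so the smallest and largest
        # tested K cannot be evaluated.
        if k - 1 not in means or k + 1 not in means:
            continue
        spread = stdev(log_likelihoods[k])
        if spread == 0:
            # Identical replicates: the ratio is undefined, so skip this K.
            continue
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_difference / spread
    return result


# Hypothetical replicate log-likelihoods, for illustration only.
example_runs = {
    1: [-1000, -1002, -998],
    2: [-900, -901, -899],
    3: [-890, -892, -888],
    4: [-885, -889, -881],
}

dk = delta_k(example_runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because the statistic depends on both neighbouring values of K, it cannot be evaluated at the smallest or largest K tested, so it can never select K = 1. This is one reason to read ΔK alongside other estimators rather than on its own.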