Quantifying the magnitude of phenotypic plasticity for comparison among species, populations, cultivars, or genotypes is important for revealing the ecological and evolutionary significance of plastic responses to abiotic and biotic factors. Commonly used plasticity estimators have occasionally been found to rank species’ plasticity differently. However, we do not know how frequent this incongruence is or which factors influence its occurrence, nor do we know which plasticity estimator is more reliable. We first addressed these problems using a theoretical framework, which revealed inherent conflicts between the reaction norm slope and plasticity indices and identified the conditions affecting these conflicts. We then empirically tested the effects of the estimators on interspecific plasticity differences by reanalyzing 1248 sets of relevant data, which confirmed the predictions derived from our theoretical framework. Finally, we showed through theoretical analyses that the reaction norm slope is more reliable than plasticity indices for interspecific comparisons.
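The conflict between estimators can be illustrated with a minimal numerical sketch. The two species, their trait means, and the function names below are hypothetical and are not taken from the reanalyzed data. The index used here, (max − min)/max of the trait means, is only one common form of relative plasticity index and is assumed for illustration; the indices examined in the study may be defined differently.

```python
# Hypothetical trait means in two environments (low, high); not real data.
species = {"A": (10.0, 20.0), "B": (2.0, 6.0)}


def reaction_norm_slope(trait_low, trait_high, env_low=0.0, env_high=1.0):
    """Absolute change in trait value per unit change in the environment."""
    return (trait_high - trait_low) / (env_high - env_low)


def relative_plasticity_index(trait_low, trait_high):
    """Assumed index form: (max - min) / max of the two trait means."""
    largest = max(trait_low, trait_high)
    smallest = min(trait_low, trait_high)
    return (largest - smallest) / largest


slopes = {name: reaction_norm_slope(*means) for name, means in species.items()}
indices = {name: relative_plasticity_index(*means) for name, means in species.items()}

# Rank species from most to least plastic under each estimator.
rank_by_slope = sorted(species, key=slopes.get, reverse=True)
rank_by_index = sorted(species, key=indices.get, reverse=True)

print("slopes:", slopes, "ranking:", rank_by_slope)
print("indices:", indices, "ranking:", rank_by_index)
```

In this sketch, species A has the steeper slope (10 versus 4), whereas species B has the larger relative index (about 0.67 versus 0.50), so the two estimators rank the species in opposite order. The reversal arises because the index scales the trait change by the trait value itself, so a species with small trait values can appear more plastic despite a smaller absolute response.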