The frequent insignificance of a “significant” P-value

David McGiffin; Geoff Cumming; Paul Myles

doi:10.22541/au.163250082.20225291/v1


Abstract

Null hypothesis significance testing (NHST) and p-values are widespread in the cardiac surgical literature but are frequently misunderstood and misused. The purpose of the review is to discuss major disadvantages of p-values and suggest alternatives. We describe diagnostic tests, the prosecutor’s fallacy in the courtroom, and NHST, which involve inter-related conditional probabilities, to help clarify the meaning of p-values, and discuss the enormous sampling variability, or unreliability, of p-values. Finally, we use a cardiac surgical database and simulations to explore further issues involving p-values. In clinical studies, p-values provide a poor summary of the observed treatment effect, whereas the three-number summary provided by effect estimates and confidence intervals is more informative and minimises over-interpretation of a “significant” result. P-values are an unreliable measure of strength of evidence; if used at all they give only, at best, a very rough guide to decision making. Researchers should adopt Open Science practices to improve the trustworthiness of research and, where possible, use estimation (three-number summaries) or other better techniques.
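
The sampling variability of p-values mentioned in the abstract can be illustrated with a small simulation. The following is a minimal sketch and not the authors' own simulation: the true effect (0.5 SD), the group size (32 per group), the known SD of 1, the use of a two-sided z-test, and the function name `simulate_p_values` are all assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, mean


def simulate_p_values(effect=0.5, n=32, replications=1000, seed=1):
    """Return two-sided z-test p-values from repeated, identical experiments.

    Assumption: both groups are normal with known SD 1, so the difference
    in means has standard error sqrt(2 / n) and a z-test applies.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = (2 / n) ** 0.5  # standard error of the difference in means
    p_values = []
    for _ in range(replications):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(effect, 1.0) for _ in range(n)]
        z = (mean(treated) - mean(control)) / se
        p_values.append(2 * (1 - std_normal.cdf(abs(z))))
    return p_values


if __name__ == "__main__":
    ps = sorted(simulate_p_values())
    share_significant = sum(p < 0.05 for p in ps) / len(ps)
    print(f"smallest p: {ps[0]:.5f}")
    print(f"median p:   {ps[len(ps) // 2]:.5f}")
    print(f"largest p:  {ps[-1]:.5f}")
    print(f"share with p < 0.05: {share_significant:.2f}")
```

Every replication samples from the same populations with the same true effect, so any spread in the printed p-values comes from sampling variability alone.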

04 Aug 2021: Submitted to Journal of Cardiac Surgery

04 Aug 2021: Submission Checks Completed

04 Aug 2021: Assigned to Editor

04 Aug 2021: Editorial Decision: Accept

Nov 2021: Published in Journal of Cardiac Surgery, volume 36, issue 11, pages 4322-4331. doi:10.1111/jocs.15960

Peer review status: Published