Redemption of Statistical Significance
Conference
Regional Statistics Conference 2026
Format: CPS Abstract - Malta 2026
Keywords: significance
Session: CPS 29 Philosophy
Wednesday 3 June 4:30 p.m. - 5:30 p.m. (Europe/Malta)
Abstract
In the 20th century, leaders of three classical statistical theories claimed superiority for their methodologies. Bayesian theorists combined a subjective prior probability with the likelihood of the observed data to obtain a posterior probability. Fisher dismissed subjective probability and championed the null hypothesis as a probability model of the data-generating process. Neyman and Pearson subsequently elaborated on Fisher’s model by incorporating a second data-generating model, the alternative hypothesis. In the 21st century, classical statistical theories developed for small datasets have been superseded by computationally and mathematically intensive algorithms that analyze massive datasets with thousands of variables and millions of observations.
This paper does not speculate on the connections between classical statistical methods and modern data science. Its purpose is to demonstrate that Fisher’s statistical significance remains a viable tool in small samples for filtering out false effect sizes (effect-size errors) that would otherwise be misinterpreted as substantively significant. Some authors have called for a ban on statistical significance because of widespread misunderstanding and abuse. This paper does not address abuses such as p-hacking and HARKing, which are scientific misconduct. It proposes instead that misunderstandings, such as the notion that the p-value is the probability that the null hypothesis is true, can be corrected through an explicit, straightforward demonstration with empirical sampling distributions of mean differences and p-values. To that end, random variables were simulated under a true null hypothesis and analyzed with independent-samples t-tests.
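As a minimal sketch of the simulation design (illustrative only; the data set name, group size, and variable names are assumptions, not taken from the manuscript), the following SAS steps draw two independent groups from N(0,1), so the null hypothesis is true by construction, and compare them with an independent-samples t-test:

   /* Two independent groups drawn from N(0,1): the null hypothesis is true */
   data null_sim;
      call streaminit(0);              /* seed 0 uses the system clock time */
      do group = 1 to 2;
         do i = 1 to 30;               /* illustrative group size */
            y = rand('normal', 0, 1);  /* standard normal draw */
            output;
         end;
      end;
   run;

   /* Independent-samples t-test; p < 0.05 flags statistical significance */
   proc ttest data=null_sim;
      class group;
      var y;
   run;

Repeating these steps many times yields the empirical sampling distributions of mean differences and p-values described above.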
Statistical significance was identified by p-values < 0.05. Substantive significance was evaluated with Cohen’s index d, the standardized mean difference, where |d| < 0.20 is trivial, 0.20 ≤ |d| < 0.50 is small, 0.50 ≤ |d| < 0.80 is medium, and |d| ≥ 0.80 is a large effect size. It is shown that ignoring statistical significance yields numerous false effect sizes (effect-size errors) in small samples, which can be misrepresented as substantively significant under a true null hypothesis. With large sample sizes (e.g., n = 2,000), statistical significance detects only trivial effect sizes under a true null hypothesis and thereby ceases to be a helpful screening tool. In addition, the paper shows that the probability of rejecting the null hypothesis does not increase with sample size when the null hypothesis is true; it increases with sample size only when the null hypothesis is false.
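For reference, and assuming the standard pooled-standard-deviation form of the index (the abstract does not spell out the formula), Cohen’s d for two independent groups is

   d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}, \qquad
   s_p = \sqrt{\frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}}.

Under a true null hypothesis |d| fluctuates around zero, and its sampling variability shrinks as the group sizes grow, which is why large samples produce only trivial effect sizes.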
In summary, a statistically significant p-value should not be taken as definitive evidence of a substantive effect; it is a provisional conclusion that should be confirmed by replication or revised. The figures and tables in this manuscript can be reproduced by downloading the SAS data sets and analysis files from the public repository www.figshare.com. Replication with fresh data is straightforward because the computer’s internal clock time seeds the data streams of independent and identically distributed random variables drawn from the standard normal distribution, N(0,1).