It’s well understood that statistically significant results in scientific studies are not what they appear to be. Researchers commonly set the bar for statistical significance at the 95% confidence level, meaning a result counts as significant only if it would arise by chance less than 5% of the time when there is no real effect. But if researchers run 100 experiments on effects that do not actually exist, about 5 of them will still appear to produce statistically significant results for no reason other than chance. When the studies that do not achieve significance are then pruned away (so-called “survival bias,” which is effectively what selection by academic journals does), the surviving pool of apparently significant studies contains an even larger share of results that are significant only by chance. Because of this and other biases, reviews of academic and scientific studies (here, here, here and here, for example) find that upwards of 50% of published results cannot be replicated.
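To see that arithmetic in action, here is a minimal Python sketch of the pruning effect. Every number in it is an illustrative assumption (the share of tested effects that are real, the sample size, the effect size), not a figure from any of the studies linked above; the point is only that a 5% false-positive rate per test can turn into a much larger share of pure-chance findings among the significant results that survive.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions, not numbers from the studies cited above.
n_studies = 10_000   # hypothetical experiments
share_real = 0.2     # assume only 20% of tested effects actually exist
n_per_arm = 50       # observations per group in each experiment
true_effect = 0.3    # effect size (in SD units) when the effect is real

is_real = rng.random(n_studies) < share_real
mu = np.where(is_real, true_effect, 0.0)

# Two-sample z statistic for each study: difference of group means, unit variance.
diff = (rng.normal(mu, 1.0 / np.sqrt(n_per_arm))
        - rng.normal(0.0, 1.0 / np.sqrt(n_per_arm), n_studies))
z = diff / np.sqrt(2.0 / n_per_arm)
significant = np.abs(z) > 1.96   # the conventional 5% / "95%" threshold

# Among null effects, roughly 5% come out significant by chance alone...
print(f"significant among null effects: {significant[~is_real].mean():.1%}")
# ...but once non-significant studies are pruned away, the share of surviving
# results that are pure chance is much larger than 5%.
print(f"surviving results that are pure chance: {(~is_real[significant]).mean():.1%}")
```

With these assumed inputs, a large share of the surviving significant results are pure chance findings, even though each individual test has only a 5% false-positive rate.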
A new meta-study (and also here) of economic studies goes beyond “survival bias” and estimates that 10% to 20% of marginally significant results are fudged.
The probability that academic journals select a study for publication should rise as the statistical significance of its results rises. Selection alone should therefore produce a smooth distribution of published test statistics as a function of their statistical significance, with no abrupt gap just below the significance threshold.
But rather than such a smooth curve, the study finds, “the distribution of test statistics published in three of the most prestigious economic journals over the period 2005-2011 exhibits a sizable under-representation of marginally insignificant statistics relatively to significant statistics but also to (very) insignificant ones. In a nutshell, once tests are normalized…the distribution has a two-humped camel shape [that] … cannot be explained by [journal] selection alone. …10% to 20% of tests with [statistical significance] are misallocated: there are missing test statistics just [below] the [statistically significant] threshold that…can [be] retrieve[d] [above] the [threshold].”
The study notes, “The two-humped shape is an empirical regularity that can be observed consistently across journals, years and fields. … Similarly, the two-humped camel shape is less visible in articles with theoretical models, articles using data from randomized control trials or laboratory experiments and papers published by tenured and older researchers. More generally, we find a larger residual in cases in which we would expect higher incentives for researchers to respond to selection,” i.e., “…evidence that academic economists respond to publication incentives.”
The researchers conclude, “Our analysis suggests that the pattern of this misallocation is consistent with what we dubbed an inflation bias: researchers might be tempted to inflate the value of those almost-rejected tests by choosing a slightly more ‘significant’ specification. … Among the tests that are marginally significant, 10% to 20% are misreported. These figures are likely to be lower bounds of the true misallocation as we use conservative collecting and estimating processes.”
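To illustrate the mechanism the researchers describe, here is a rough Python simulation, again built entirely on made-up numbers: the mix of null and real effects, the journals’ acceptance rates, and the 30% share of marginal tests that get nudged are all assumptions, not estimates from the paper. It applies journal selection to a batch of test statistics and then inflates some marginally insignificant ones just past the 1.96 threshold; the resulting histogram shows the valley just below the threshold and the extra bump just above it, i.e., the two-humped camel shape.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative assumptions only, not the paper's data or estimates.
n = 200_000
effect = rng.choice([0.0, 2.0], n, p=[0.5, 0.5])   # mix of null and real effects
z = np.abs(rng.normal(effect, 1.0))                 # raw |z| statistics

# 1) Journal selection: significant results are more likely to be published.
publish_prob = np.where(z > 1.96, 0.8, 0.3)         # assumed acceptance rates
published = z[rng.random(n) < publish_prob]

# 2) Inflation bias: a share of marginally insignificant tests are nudged just
#    over the threshold by picking a slightly more "significant" specification.
marginal = (published > 1.5) & (published < 1.96)
nudged = marginal & (rng.random(published.size) < 0.3)   # assume 30% get inflated
published[nudged] = rng.uniform(1.96, 2.3, nudged.sum())

# Text histogram around the threshold: selection alone changes smoothly with |z|,
# but the nudging carves a valley just below 1.96 and piles mass just above it.
edges = np.arange(1.0, 3.01, 0.1)
counts, _ = np.histogram(published, edges)
for lo, c in zip(edges[:-1], counts):
    print(f"|z| {lo:3.1f}-{lo + 0.1:3.1f}: {'#' * int(c // 200)}")
```

Nothing about this toy setup matches the paper’s estimation strategy; it only shows that a modest amount of nudging near the threshold is enough to turn a smooth selection curve into the two-humped shape the authors document.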