The ongoing P-value debate

There are ever more good articles on what p-values are, how they are used and abused, and what the alternatives are.

Two I came across today: the first, in VOX, outlines the issue from a journalistic view and presents arguments for and against; the second, a journal article in JIBS, offers concrete best-practice suggestions.

VOX article:

What a nerdy debate about p-values shows about science – and how to fix it

There’s a huge debate going on in social science right now. The question is simple, and strikes near the heart of all research: What counts as solid evidence? The answer matters because many disciplines are currently in the midst of a “replication crisis” where even textbook studies aren’t holding up against rigorous retesting.

JIBS article:

What’s in a p? Reassessing best practices for conducting and reporting hypothesis-testing research

If one were to believe all results published in business journals, scholars would be able to predict the future (almost) perfectly. In the vast majority of the papers published, almost all theoretically derived hypotheses are empirically supported.

Ten recommendations from the above article:

Guideline 1:

At a basic level, all regression analyses should include, for each coefficient, standard errors (as well as mention the confidence intervals for the variable of interest) and, for each regression model, the number of observations as well as the R² statistics or equivalent
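As a rough illustration of what Guideline 1 asks for, here is a minimal sketch of a one-predictor least-squares fit that returns the standard errors, a confidence interval, the number of observations and R². The function name `simple_ols` and the data are my own made-up examples, not taken from the article, and the interval uses a normal quantile as a large-sample approximation (for small samples a t quantile would be the appropriate choice).

```python
import math
from statistics import NormalDist


def simple_ols(x, y, level=0.95):
    """Fit y = a + b*x by least squares and collect the quantities to report."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sigma2 = sse / (n - 2)  # residual variance with n - 2 degrees of freedom

    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1 / n + mean_x ** 2 / sxx))

    # Assumption: normal quantile as a large-sample stand-in for the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)

    return {
        "n": n,
        "r_squared": 1 - sse / sst,
        "intercept": intercept,
        "se_intercept": se_intercept,
        "slope": slope,
        "se_slope": se_slope,
        "slope_ci": (slope - z * se_slope, slope + z * se_slope),
    }


# Made-up illustrative data, not from either article.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_ols(x, y)
low, high = fit["slope_ci"]
print(f"b = {fit['slope']:.3f} (SE {fit['se_slope']:.3f}), "
      f"95% CI [{low:.3f}, {high:.3f}], "
      f"n = {fit['n']}, R2 = {fit['r_squared']:.3f}")
```

In practice a statistics package would produce these numbers; the point is only that every one of them belongs in the reported table.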

Guideline 2:

Authors should refer to the actual p-value rather than the threshold p-value when assessing the evidence for and against their hypothesis

Guideline 3:

Authors should not report asterisks to signal p-value thresholds
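Guidelines 2 and 3 together amount to printing the p-value itself instead of a star. A small sketch of what that could look like follows; the helper names `two_sided_p` and `format_p` are hypothetical, and the p-value is computed from a normal approximation to the test statistic, which is an assumption on my part rather than something the article prescribes.

```python
from statistics import NormalDist


def two_sided_p(estimate, std_error):
    """Two-sided p-value for estimate / std_error under a normal approximation."""
    z = estimate / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def format_p(p, digits=3):
    """Report the actual p-value; fall back to 'p < ...' only below the print precision."""
    floor = 10 ** -digits
    if p < floor:
        return f"p < {floor:.{digits}f}"
    return f"p = {p:.{digits}f}"


# Made-up coefficient and standard error, for illustration only.
p = two_sided_p(estimate=0.42, std_error=0.20)
print(format_p(p))
```

The reader then sees how strong the evidence is, rather than only which side of a threshold it fell on.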

Guideline 4:

Reflections on effect sizes are included, reporting and discussing whether the effects (the coefficients and, if appropriate, marginal effects) are substantive in terms of the research question at hand

Guideline 4a:

When discussing effect size, authors should take the confidence interval associated with the estimated coefficient into account as well as the minimum and maximum effect (not just one standard deviation above and below the mean), thus providing a range of the strength of a particular relationship. This may be done graphically for more complex models

Guideline 4b:

When discussing effect sizes, where possible and relevant, authors should compare the range of the effect size of the variable of interest with other variables included in the regression model
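One possible reading of Guidelines 4a and 4b, sketched below under the assumption of a plain linear model: multiply the point estimate and both confidence bounds by the full observed range of the predictor, so that the reader sees the smallest and largest plausible effect rather than a single number. The function `effect_range` and all the numbers are hypothetical.

```python
def effect_range(coef, ci_low, ci_high, x_min, x_max):
    """Implied change in the outcome when the predictor moves from x_min to x_max.

    Assumes a linear effect, so the change is simply coefficient * (x_max - x_min).
    """
    span = x_max - x_min
    bounds = sorted([ci_low * span, ci_high * span])
    return {"point": coef * span, "low": bounds[0], "high": bounds[1]}


# Made-up numbers: a coefficient of 2.0 with a 95% CI of [1.5, 2.5],
# and a predictor observed between 0 and 10.
focal = effect_range(coef=2.0, ci_low=1.5, ci_high=2.5, x_min=0, x_max=10)
print(f"min-to-max effect: {focal['point']:.1f} "
      f"[{focal['low']:.1f}, {focal['high']:.1f}]")
```

Running the same calculation for the other variables in the model gives the comparison that Guideline 4b asks for.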

Guideline 5:

Outlier observations are discussed carefully, especially when they have been eliminated from the sample (e.g., through technical practices such as ‘winsorizing’)
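For readers who have not met the term: winsorizing replaces the most extreme observations with the nearest less extreme value instead of dropping them. The sketch below is a minimal version of the idea, assuming a symmetric cut-off and a simple rank-based rule; the function name `winsorize` and the data are my own. It also returns how many observations were altered, since that count is exactly the sort of thing the guideline wants discussed.

```python
def winsorize(values, limit=0.05):
    """Clamp the lowest and highest `limit` share of values to the nearest kept value.

    Returns the adjusted list and the number of observations that were changed.
    Assumes 0 <= limit < 0.5.
    """
    n = len(values)
    k = int(n * limit)  # number of observations to clamp in each tail
    ordered = sorted(values)
    low, high = ordered[k], ordered[n - k - 1]
    clipped = [min(max(v, low), high) for v in values]
    changed = sum(1 for v, c in zip(values, clipped) if v != c)
    return clipped, changed


# Made-up data with one obvious outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
adjusted, n_changed = winsorize(data, limit=0.1)
print(adjusted, f"({n_changed} observations altered)")
```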

Guideline 6:

Null and negative findings are equally interesting as are positives, and hence are honestly reported, including a discussion of what this implies for theory

Guideline 7:

In the absence of a clear strategy designed explicitly to identify causes and effects, authors should be careful in using terminology suggesting causal relationships between variables of interest, and accordingly adjust their language in the wording of the hypotheses and in the discussion of the empirical results

Guideline 8:

To the extent feasible, authors should address issues of causality and endogeneity, either by offering technical solutions or by adopting an appropriate research design

Guideline 9:

Authors are expected to conduct a variety of robustness tests to show that the significant finding is not due to an idiosyncrasy of the selected empirical measures, model specifications and/or estimation strategy

Guideline 10:

HARKing is a research malpractice. Theory developed by interpreting empirical phenomena or results should be reported as such (for example, in the discussion section)

