Then Do Better


Social science, economics, replication crisis

There’s a crisis in the psychological and social sciences that has spread into medicine and now economics. It’s notable to me that this crisis isn’t happening in the physical sciences, which are better measured and tend not to involve humans. Perhaps humans are too messy.


The crisis is a replication crisis. Many significant results in the social sciences have failed to replicate in subsequent studies and analyses. Some results that have replicated show a much smaller magnitude of effect. (See links below)


Some of the replication research is itself not perfect, and the statistics alone would suggest that a portion of any results significant at the 5% (1 in 20) level will simply be random false positives. (See links below)
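To make the 1-in-20 point concrete: if you run many studies where no real effect exists, roughly 5% will still cross the conventional significance threshold. A minimal simulation sketch (the sample sizes and test here are my own illustrative choices, not taken from any of the papers below):

```python
import random

random.seed(0)

ALPHA_Z = 1.96      # two-sided z threshold for the 5% level
N_STUDIES = 10_000  # hypothetical studies where the null is TRUE

def null_study(n=30):
    """Compare two samples drawn from the SAME distribution,
    so any 'significant' difference is a false positive by construction."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    mean_diff = sum(a) / n - sum(b) / n
    se = (2 / n) ** 0.5  # both groups have known variance 1
    return abs(mean_diff / se) > ALPHA_Z

false_positives = sum(null_study() for _ in range(N_STUDIES))
print(f"{false_positives / N_STUDIES:.1%} significant despite no real effect")
```

Running this prints a rate near 5%: with enough studies in the literature, a steady trickle of "findings" is expected even if nothing real is going on.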


Some significant social science results that haven’t replicated are:

-the “ego depletion” theory of willpower  (Willpower is like a muscle, the argument goes. When it's tired, we're less focused; we give in to temptation and make shoddy decisions that hurt us later.)

-the “marshmallow test”  (children who can wait for 2 marshmallows later rather than taking 1 marshmallow now do better on certain other metrics)

-Social priming theories

-Power poses


Interestingly, it also seems scientists are good at predicting which results are most likely NOT to replicate. (https://www.nature.com/articles/s41562-018-0399-z)



A summary here:

“Psychology’s reliability crisis erupted in 2011 with a wave of successive shocks: the publication of a paper purporting to show pre-cognition; a fraud scandal; and a recognition of p-hacking, where researchers exercised too much liberty in how they chose to analyze data to make almost any result look real. Scientists began to wonder whether the publication record was bloated with unreliable findings.

The crisis is far from being limited to psychology; many of the problems plague fields from economics to biomedical research. But psychology has been a sustained and particularly loud voice in the conversation, with projects like the Center for Open Science aiming to understand the scope of the problem and trying to fix it.

In 2015, the Center published its first batch of results from a huge psychology replication project. Out of 100 attempted replications, only around a third were successful. At the time, the replicators were cautious about their conclusions, pointing out that a failed replication could mean that the original result was an unreliable false positive—but it could also mean that there were unnoticed differences in the experiments or that the failed replication was a false negative.

In fact, the bias toward publishing positive results makes false negatives a significant risk in replications.”  From https://arstechnica.com/science/2018/08/why-do-only-two-thirds-of-famous-social-science-results-replicate-its-complicated/

 



Interestingly, this has spread into economics and the anomaly literature.


...In 2016, government economists Andrew Chang and Phillip Li tried to reproduce the results of 65 econ papers published in good journals. They got the original data, and even contacted the authors for help in following their footsteps. Yet still they only managed to reproduce 49 percent of the published findings…. See  https://www.federalreserve.gov/econresdata/feds/2015/files/2015083pap.pdf


In the anomaly literature, it is notable that “value” and “momentum” results have mostly (though not all) replicated, but at lower magnitudes. This lower-magnitude result seems to be fairly consistent as well.  The paper also makes a good point that microcaps seem to skew the results of previous work.


https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3275496

Most anomalies fail to hold up to currently acceptable standards for empirical finance.



My view:  while there are complex root causes behind this (e.g. yes, there is fraud, but fraud doesn’t explain many of the results), I think the overall message is to take results with a certain amount of skepticism, and if a result seems far-fetched (the marshmallow test, possibly), to wait for a replication study before putting too much weight on it.


A paper suggests the base rate fallacy is part of the explanation:  https://academic.oup.com/bjps/advance-article-abstract/doi/10.1093/bjps/axy051/5071906
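The base rate argument can be made concrete with back-of-the-envelope arithmetic (the prior, power, and alpha values below are illustrative assumptions of mine, not figures from the paper): if only a small fraction of tested hypotheses are actually true, false positives can make up a large share of the “significant” findings.

```python
# Illustrative assumptions, not measured values:
prior_true = 0.10  # fraction of tested hypotheses that are actually true
power      = 0.80  # chance a real effect reaches significance
alpha      = 0.05  # false-positive rate when the null is true

n = 1000  # hypothetical studies
true_hits   = n * prior_true * power        # 80 true positives
false_hits  = n * (1 - prior_true) * alpha  # 45 false positives
share_false = false_hits / (true_hits + false_hits)

print(f"{share_false:.0%} of 'significant' results are false positives")
```

Under these assumptions, more than a third of published significant results would be false, even with no fraud and no p-hacking at all — which is why a low replication rate shouldn’t be entirely surprising.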