Sunday, October 26, 2014

Biased science

Most people think of bias in personal terms, but in science the most pernicious forms of bias are institutional, not personal. They are not, in other words, the result of rogue scientists fudging their findings to support their pet theories. Rather, they are the result of biased processes for deciding which perfectly legitimate scientific findings get published.

The key point to understand is this: For any scrupulously conducted scientific study or experiment, there is always some chance that its findings are wrong. Reporting bias and publication bias are effectively institutional preferences for selecting the results of just such studies and experiments for publication, while thousands of others that find no such results never see the light of day.  Both forms of bias are rampant in science, and their causes are many.

Social sciences suffer from major reporting bias, because most negative results are not reported. Franco et al. (2014) conclude that out of 221 survey-based experiments funded by the National Science Foundation from 2002 to 2012, two-thirds of those with results that did not support a tested hypothesis were not even submitted for publication. Strong results were 60% more likely to be submitted and 40% more likely to be published than null results. Only 20% of those with null results ever appeared in print. (See graphic here.)

It is not much better in clinical studies. Reporting bias leads to over-estimating or under-estimating the effect of a drug intervention, and it reduces the confidence with which evaluators can accept a test result or judge the significance of such results. For any medicines or medical devices regulated by the FDA, posting at least basic results to the ClinicalTrials.gov registry is mandatory within one year of study completion, but compliance is low. A 2013 estimate puts the failure to publish or post basic results at over 50%, and in one sample of 171 registered but unpublished studies completed before 2009, 78% reported no results at all.

Reporting bias infects the evidence evaluation process for randomized controlled trials (RCTs), the basic experimental design for testing scientific hypotheses. That RCTs have limits is well known. Each requires a large number of diverse participants to achieve statistical significance. Often the random assignment of participants, or sufficient blinding of subjects and investigators, is not feasible, and many hypotheses cannot be tested at all due to ethical concerns. For instance, sham or ineffective treatments given to seriously suffering patients harm those who might otherwise benefit, and we should not run an RCT on antisocial behavior in a simulated prison environment just to get accurate data. Even when RCTs are ethical and well-designed, the critical appraisal of experts is crucial, since a risk of bias is always present.

Peer-review assesses the value of RCTs, but the effectiveness of this process is compromised when relevant data are missing. Without effective peer-review, we consumers of science and its applications have no coherent reason to believe what scientists tell us about the value of medical interventions or the danger of environmental hazards.

Not sharing, publishing, or making accessible negative results has numerous bad consequences. Judgments based on incomplete and unreliable evidence harm us. We probably accept many inaccurate scientific conclusions. Ioannidis (2005), for example, contends that reporting bias results in most published research findings being false.
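Ioannidis's argument can be made concrete with a little arithmetic. The sketch below computes the probability that a statistically significant finding reflects a real effect, given a significance threshold, statistical power, and the prior probability that tested hypotheses are true. All the numbers here are illustrative assumptions, not data from any study:

```python
# Illustrative sketch of the Ioannidis-style calculation: even before any
# reporting bias, the share of "positive" findings that are true depends
# heavily on how plausible the tested hypotheses were to begin with.
# alpha, power, and the priors are assumptions chosen for illustration.

def positive_predictive_value(prior, alpha=0.05, power=0.8):
    """Probability that a statistically significant result is a true effect."""
    true_positives = power * prior          # real effects correctly detected
    false_positives = alpha * (1 - prior)   # null effects wrongly "detected"
    return true_positives / (true_positives + false_positives)

for prior in (0.5, 0.1, 0.01):
    print(f"prior {prior}: PPV = {positive_predictive_value(prior):.2f}")
# With only 1 in 100 tested hypotheses true, PPV falls to about 0.14:
# most significant findings would then be false positives.
```

When researchers chase long-shot hypotheses and journals print only the significant results, most of what appears in print can be wrong even though every individual study was conducted honestly.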

Reporting bias harms participants in studies who are exposed to unnecessary risks. Society fails to benefit from the inclusion of relevant RCTs with negative results in peer-review evaluations. Researchers waste time and money testing hypotheses that have already been shown to be false or dubious. Retesting drug treatments already observed to be ineffective, or no more effective than a placebo, squanders resources. Our scientific knowledge base lacks defeaters that would otherwise undercut flawed evidence and false beliefs about the value of a drug. RCTs and the peer-review process are designed to detect these but fail due to selective reporting.

RCT designs are based on prior research findings. When publishers, corporate sponsors, and scientists are unaware of previous negative results and prefer positive to negative results, many hypotheses with questionable results worthy of further testing are overlooked. Since not all trials have an equal chance of being reported, datasets skew (erroneously) positive, and this affects which hypotheses scientists choose to examine, accept, or reject.

Mostly positive results in the public record make the effect of a drug with small or false positive effects appear stronger than it actually is, which in turn misleads stakeholders (patients, physicians, researchers, regulators, sponsors) who must make decisions about resources and treatments on the basis of evidence that is neither complete nor the best available. Studies of studies (meta-analyses) reveal this phenomenon with popular, widely prescribed antiviral and antidepressant medications. Ben Goldacre tells a disturbing story about the CDC, pharmaceutical companies, and antivirals.
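The inflation effect is easy to demonstrate. Here is a minimal simulation, with invented parameters, of many small trials of a drug whose true effect is tiny, where only trials crossing a crude significance threshold reach the public record:

```python
# Minimal simulation (all parameters are assumptions, not real trial data)
# of how selective reporting inflates an apparent drug effect.
import random
import statistics

random.seed(1)
TRUE_EFFECT = 0.1   # the drug's real, modest standardized effect
N = 30              # participants per trial (small, underpowered trials)
TRIALS = 1000       # how many trials are run in total

all_results, published = [], []
for _ in range(TRIALS):
    # each trial estimates the effect with sampling noise (se = 1/sqrt(N))
    estimate = random.gauss(TRUE_EFFECT, 1 / N ** 0.5)
    all_results.append(estimate)
    # crude publication filter: only "significant" positive results appear
    if estimate > 1.96 / N ** 0.5:
        published.append(estimate)

print(f"mean effect, all trials:       {statistics.mean(all_results):.2f}")
print(f"mean effect, published trials: {statistics.mean(published):.2f}")
```

The published record reports an average effect several times larger than the true one, even though every simulated trial was conducted honestly; the bias lives entirely in which results see print.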

A meta-analysis uses statistical methods to summarize the results of multiple independent studies. RCTs must be statistically powerful enough to reject the null hypothesis, i.e., the one researchers try to disprove before accepting an alternative hypothesis. Combining RCTs into a meta-analysis increases the power of a statistical test and resolves controversies arising from conflicting claims about drug effects. In separate meta-analyses from 2008 of antidepressant medications (ADMs), Kirsch et al. and Turner et al. find only marginal benefits over placebo treatments. When unpublished trial data get added back to the dataset, the great benefit previously reported in the literature becomes clinically insignificant. This is disturbing news: For all but the most severely depressed patients ADMs don't work, and they may appear to work in the severely depressed only because the placebo response weakens, which magnifies the apparent effect of the ADM compared to placebo controls.
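To see in miniature how adding unpublished trials changes a meta-analytic summary, here is a sketch of a fixed-effect (inverse-variance) pooled estimate, a standard way of combining study results. The effect sizes and standard errors below are invented for illustration; they are not the Kirsch or Turner data:

```python
# Hedged sketch of a fixed-effect (inverse-variance) meta-analysis.
# Each study is an (effect, standard_error) pair; precise studies
# (small standard errors) get proportionally more weight.

def pooled_effect(studies):
    """Inverse-variance weighted mean effect across (effect, se) pairs."""
    weights = [1 / se ** 2 for _, se in studies]
    weighted_sum = sum(w * eff for (eff, _), w in zip(studies, weights))
    return weighted_sum / sum(weights)

published = [(0.50, 0.15), (0.45, 0.20), (0.60, 0.25)]    # positive trials
unpublished = [(0.05, 0.15), (-0.10, 0.20), (0.00, 0.25)]  # null trials

print(f"published trials only:      {pooled_effect(published):.2f}")
print(f"with unpublished included:  {pooled_effect(published + unpublished):.2f}")
```

In this toy dataset the pooled effect is cut in half once the null trials are restored, which is the shape of what the antidepressant meta-analyses found: the summary is only as honest as the set of trials it is allowed to see.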

Even when individual scientists behave well, the scientific establishment is guilty of misconduct when it fails to make all findings public. In order for science to be the self-correcting, truth-seeking process it claims to be, we need access to all the data.

Scott Merlino
Department of Philosophy
Sacramento State


  1. Scott, thanks for this interesting post. I have a couple of questions.

    (1) Although the actual suppression of negative findings seems corrupt to me, I wonder to what extent the institutional bias for publishing interesting findings may be seen as quite reasonable prior to the digital age, in which publication has gone from an expensive, labor-intensive process to, roughly, the pushing of a button. I ask this partly because, to me, it possibly illuminates the important point that what we find utterly reprehensible in one context is, at worst, a necessary evil in another.

    (2) As an extension of that point, I wonder if it is possible that, given the suddenness with which we have found it even thinkable to do much of what you say is necessary, that the reality is that we are not doing such a bad job of it. I don't feel I'm in a position to assess this sort of claim, but in the little reading I have done I'm struck that there is a serious movement in some quarters at least to bring this all about.

    1. Great post Scott! I can see how much of this is perpetuated by various levels of competition regarding publishing. For a science PhD student, disproving several interesting hypotheses and confirming a well-established one is usually very bad news. They want to refute a well-established theory or provide evidence that a new and interesting theory is true. This is the way it is because many PhD students want to be professors at research institutions. And to get those jobs, they usually need publications in the top journals. But existing professors also want to publish in the top journals because they are often read by more of their target audience than the lesser journals, which helps professors get promoted and get research grants to further their own research agendas. Journal editors also want their journals to be top journals, so they have to please their target audience. To please professors, journals have to publish articles that will interest them. Reports of failed experiments and unsupported hypotheses are much less likely to have broad appeal. Can you imagine a journalist wanting to run a piece on that research? Even for Scientific American, or a science blog? It's just not very interesting.

      So, we see pressure to publish, get a job, and to be published all push in the same direction. Now add intense competition at all levels and we can see why many academics don't bother trying to publish their negative results... and that when they do, editors and reviewers reject their submissions.

      The solution would be open access journals that welcome negative results. That way researchers could really incorporate the negative findings in their meta-analyses, etc. The current roadblock to this is probably mainly the attitude of scientists that such results are not very valuable. This attitude would make these negative-result-publishing journals quite unprestigious, making it harder to find editors and reviewers and not making it worth researchers' time to submit.

      What's the solution to that problem of attitude? More great articles like Scott's, I'd say.

    2. Randy, if I follow your first point, that suppression is sometimes or has been reasonable, I can agree. Filter or don't publish negative findings, but do post all study results to a registry or database anyway. I imagine negative results overwhelming publications and smothering interesting results. Better, as a necessary condition of future funding or FDA approval, one must share results publicly. About your second point: Yes, it is impressive and good that researchers are self-policing. It is not enough, yet, but I take this as evidence that science as a process is healthier, say, than politics, religion, or academe, where tradition, popular opinion, and over-confidence mislead so many.

  2. Scott, I like this last point a lot. There is a kind of meta-realization occurring, that it's not enough to publish only findings that are achieved in a scientifically rigorous manner, but to report or publish these findings in the same way. Cherry-picking publications is no different than cherry-picking data. One way to do this would be to randomly select from the pool of properly run experiments and studies those that will be published. Which is a pretty funny thing to think about. Work your butt off conducting a 10 year longitudinal study and we will enter it into a lottery for a chance at publication! In reality, though, that is what was happening, but the lottery was not conducted by humans. In the imagined scenario, scientists who get negative findings would have a chance at publication that they didn't before.

    But the alternative now available would seem better, publish everything. This puts us on the Big Data model for scientific inquiry, which sees random sampling as not essential to science at all, but a primitive way of making large data sets useful which we no longer require.

    Cool stuff.

  3. Thanks Dan. You raise important points. Perhaps the publishing-as-competition model is not serving the public or PhD aspirants well. I agree, negative findings do not fuel many careers. The thing is, I don't know how else to sort people seeking scarce research positions. There are plenty of people willing to speculate about data but few who know how to collect it in such a way as to test a precise hypothesis. We don't need a lot of principal investigators: Algorithms can do retrospective studies and systematic reviews. We already have too many PhDs, and those who earn them are not happier for it; we should actively discourage some PhD seekers. See this Nature article for eye-opening graphics:

    More open access journals seem to be the way to go, but then we lose the quality-control aspect we have now, since these sorts of publications do not undergo rigorous (if imperfect) peer-review. Such publications place everything on a par, which is an illusion. Perusers of such sites and journals can't really sort for quality. But closed access to results and data is the bigger problem. For instance, when non-scientist consumers of science want to know more about whether, say, drinking red wine is actually good for your heart, or whether a gluten-free diet alleviates irritable bowel syndrome, it is almost impossible to get good information from primary sources, because relevant published research resides behind paywalls. I'm inclined also towards positive results derived by rejecting a null hypothesis, but my skepticism is dialed higher now. One or two studies with small sample sizes (less than, say, 500) but positive results can't be more than merely suggestive of further, larger studies. It's too bad that prestige follows public displays of publication success, and that publication results are overwhelmingly and erroneously positive. The result is that we get lots of ensconced academics rewarded for career-building performance and not community-benefiting contributions to public knowledge.

  4. Scott, you’ve made an interesting and convincing case about the need to restructure the institution of scientific research. I have just one small point to offer. You say that because there is not enough publication of negative results, “Researchers waste time and money testing hypotheses that have already been shown to be false or dubious.” Yet perhaps it is less a waste than one might expect, if as you say so much scientific testing cannot be trusted. Perhaps, until the reforms you want do get made, we should not put so much trust in those who say, “This has already been shown to be false or dubious.”