Peer review is everybody’s favorite punching bag in science these days, and for good reason: As we and others have written, it’s secretive, susceptible to bias, and often appears to fail at keeping scientific publishing rigorous and honest.

But peer review is essential for the smooth operation of the scientific publishing apparatus. Without the imprimatur, however imperfect, of independent scholars, research papers would all in effect be titled “Trust us …”

The problem is, we have scant research into how peer review functions at its job of keeping out bad science. Journals don’t devote sufficient attention to studying the quality of their peer review systems, nor do they make those data available to outside scholars.


That could be changing.

A pair of scholars is calling for a modest moonshot to improve the system, which they call (rightly, we think) a “black box.” Writing in this week’s issue of Science, Carole Lee, a philosopher at the University of Washington, and David Moher, of the Ottawa Hospital in Ontario, Canada, argue that publishers should become much more transparent about their peer review practices.

“Though the vast majority of journals endorse peer review as an approach to ensure trust in the literature, few make their peer review data available to evaluate effectiveness toward achieving concrete measures of quality,” they write. Measures such as consistency in reviews, for example, would be helpful — did most papers get approved with mixed reviews or with flying colors?

“There is too little sound research on journal peer review; this creates a paradox whereby science journals do not apply the rigorous standards they employ in the evaluation of manuscripts to their own peer review practices.”

Lee and Moher propose that publishers spend 1 percent of their budgets on research into the effectiveness of their peer review systems — a number based on what the Human Genome Project spent to investigate the ethical, legal, and social implications of its efforts.

The obvious question here, of course, is: Why would publishers spend money that, at least so far, they haven’t felt the need to shell out? As the saying goes, why buy the cow when the milk’s free?

But Lee and Moher offer a few reasons that publishers ought to find compelling. The first involves fighting off incursions from predatory outfits that promise quality peer review on par with legitimate journals but rarely deliver. Being able to point to data showing superior reviewing would be a boon for non-predatory outfits. One attempt, PRE — or Peer Review Evaluation — has been around for a few years now. (Disclosure: One of us, I.O., was an advisor to PRE before it was acquired by the American Association for the Advancement of Science.)

Similarly, journals are gradually starting to look beyond impact factor as the most important signal of quality. Strong peer review could join emerging metrics like reproducibility and the willingness to share data as indicators that one journal is more reliable than another. We’ve even suggested a Transparency Index.

Until journals and publishers start taking a closer look at their own peer review processes, Lee and Moher write, “inadequately reported research will continue to waste time and resources invested by authors, reviewers, journals, academic institutions, funders, study participants, and readers — and limit the credibility and integrity of science.”

Fortunately, there have been some attempts to pry open the black box of peer review. In a baby step, a group of editors at the British Journal of Surgery created an online forum that allowed manuscripts to be peer-reviewed in the open. In a paper last month describing the experiment, published in PLOS ONE, the editors say the results were mixed. “Open online peer review is feasible in this setting,” they concluded, “but it attracts few reviews, of lower quality than conventional peer reviews.” (However, the comparison may not have been quite fair, as Richard Smith, former editor of the British Medical Journal, wrote in response.)

Still, it’s perfect timing to keep this discussion alive and well: Later this summer the world’s small band of scholars who study peer review will gather in Chicago for the Peer Review Congress, which is held every four years.

Scientists will present their studies on what is and isn’t working about peer review, and novel ways to fix it. But just think how much more they’d have to work with if journals pried open their black boxes and let some data out.


  • The following comments about deficiencies of peer review may be more germane to prior posts of the authors regarding the subject. But I post them on the authors’ most recent post on the subject.

    1. Essentially all peer-reviewed literature regarding differences in outcome rates suffers from a failure to recognize patterns by which measures of such differences tend to be affected by the prevalence of an outcome, including, but not limited to, the pattern whereby the rarer an outcome, the greater tends to be the relative difference in experiencing it and the smaller tends to be the relative difference in avoiding it. See references at bottom.

    2. Commonly peer-reviewed literature will reflect the view that reducing some adverse health outcome should reduce relative demographic differences in rates of experiencing it, as reflected in the many statements over several decades along the lines of “despite declining mortality relative differences in mortality increased [or persisted].” Exactly the opposite is the case. Reducing the prevalence of an outcome, which generally involves restricting it to those most susceptible to it, tends to increase relative differences in rates of experiencing the outcome, while reducing relative differences in rates of avoiding the outcome.

    3. Virtually no peer-reviewed literature recognizes that it is even possible for the relative difference in a favorable outcome and the relative difference in the corresponding adverse outcome to change in opposite directions as the prevalence of an outcome changes, even though the National Center for Health Statistics recognized more than a decade ago that this would tend to occur systematically. See references 1-3.

    4. Commonly peer-reviewed literature, especially that involving racial/ethnic and socioeconomic differences in cancer outcomes, will discuss relative differences in survival and relative differences in mortality interchangeably (often stating that the research is analyzing the former when in fact it analyzes the latter). Invariably, such analyses fail to recognize that the two relative differences tend to change in opposite directions over time, or, for example, that relative differences in mortality will almost always be greater among the young than the old, while relative differences in survival will almost always be greater among the old than the young. See especially reference 4 (Section A) and reference 5.

    5. Commonly peer-reviewed literature on subgroup effects/interaction/reporting heterogeneity will be premised on the expectation that, absent a subgroup effect, a factor that affects an outcome rate will cause equal proportionate changes in different baseline rates for the outcome. Invariably, such literature fails to recognize (a) the reasons to expect that a factor that affects an outcome rate will tend to cause a larger proportionate change in the outcome for the group with the lower baseline rate for the outcome, while causing a larger proportionate change in the opposite outcome rate for the other group; (b) that if a factor causes equal proportionate changes in different baseline rates for an outcome, it will necessarily cause different proportionate changes in the rates for the opposite outcome. See especially reference 2 at 41-43.

    6. To my knowledge, no peer-reviewed literature offering explanations for why a factor caused different proportionate changes in different baseline rates for an outcome has shown an awareness of the possibility that the factor would show the opposite pattern in the comparative size of its effects on the opposite outcome, much less discussed the fact that this usually tends to occur or in fact occurred in the particular situation examined. See especially reference 1 at 339-341.

    7. Frequently peer-reviewed literature will discuss changes in a particular measure of difference between outcome rates, without any mention that a different measure will yield an opposite conclusion. That occurs even when the measure that would yield an opposite conclusion is one more commonly used in the circumstances.

    8. Some peer-reviewed literature has discussed that a relative difference and the absolute difference between the rates at which two groups experience an outcome can or did change in opposite directions over time. But no such literature has ever recognized that anytime that happens, the relative difference in the opposite outcome will necessarily have changed in the opposite direction of the first relative difference and the same direction as the absolute difference. See reference 1 at 335-336 and 2 at 14 note 26.

    9. Commonly peer-reviewed literature will use “%” or “percent” when it means percentage points. Failure to distinguish between percent changes and percentage point changes has even led to the situation where observers read studies with essentially the same findings as reaching opposite conclusions. See references 6 and 7.

    10. Substantially more than half the time, when peer-reviewed literature attempts to explain that one rate is X times as high as another rate, it will state that the first rate is X times “higher” than the other. The New England Journal of Medicine is a notable exception. One hopes there are others. See reference 8.

    11. The above points apply only to literature that survived peer review. But if the issues addressed above were commonly understood among peer reviewers, one would expect that understanding eventually to be reflected in peer-reviewed literature in a way that so far is not in evidence.

    12. Peer reviewers commonly deal with issues much more complex than those discussed above. But peer reviewers’ handling of these relatively simple issues provides little basis for belief in the soundness of peer review regarding more complex issues.

    13. All statements about peer reviewers apply equally to journal statistical editors/consultants.

    1. “Race and Mortality Revisited,” Society (July/Aug. 2014)

    2. Comments of J. Scanlan for the Commission on Evidence-Based Policymaking (Nov. 14, 2016)

    3. “The Mismeasure of Health Disparities,” Journal of Public Health Management and Practice (July/Aug. 2016)

    4. Comments of J. Scanlan for the Commission on Evidence-Based Policymaking (Nov. 28, 2016)

    5. Mortality and Survival Page of

    6. Percentage Points subpage of the Vignettes page of

    7. Spurious Contradictions subpage of the Measuring Health Disparities page of

    8. Times Higher subpage of the Vignettes page of
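Several of the comment's points (1, 2, 5(b), and 9) are arithmetic claims that can be checked directly. The sketch below is a minimal illustration under stated assumptions, not the commenter's own method: it assumes, purely for demonstration, that each group's underlying risk is normally distributed with the same spread, and that the higher-risk group sits half a standard deviation above the lower-risk group (the hypothetical `GAP`). An adverse outcome occurs above a cutoff, so raising the cutoff makes the outcome rarer. The helper names (`outcome_rates`, `relative_differences`, `equal_proportionate_cut`) are invented for illustration.

```python
from statistics import NormalDist

# ASSUMPTION (illustrative model, not taken from the comment): each group's
# underlying risk is normally distributed with SD 1, and the higher-risk
# group's mean sits GAP SDs above the lower-risk group's. An adverse outcome
# occurs when risk exceeds a cutoff; a higher cutoff means a rarer outcome.
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)
GAP = 0.5  # hypothetical mean difference between groups, in SD units


def outcome_rates(cutoff, gap=GAP):
    """Return (adverse_low, adverse_high): adverse rates for the two groups."""
    adverse_low = 1 - STD_NORMAL.cdf(cutoff)         # lower-risk group
    adverse_high = 1 - STD_NORMAL.cdf(cutoff - gap)  # higher-risk group
    return adverse_low, adverse_high


def relative_differences(cutoff, gap=GAP):
    """Ratio of adverse rates (high/low) and of favorable rates (low/high)."""
    adverse_low, adverse_high = outcome_rates(cutoff, gap)
    adverse_ratio = adverse_high / adverse_low
    favorable_ratio = (1 - adverse_low) / (1 - adverse_high)
    return adverse_ratio, favorable_ratio


def equal_proportionate_cut(baseline_a, baseline_b, cut):
    """Apply the same proportionate reduction to two adverse-outcome rates
    and return the resulting proportionate increases in the favorable rates."""
    increases = []
    for base in (baseline_a, baseline_b):
        new_adverse = base * (1 - cut)
        fav_before, fav_after = 1 - base, 1 - new_adverse
        increases.append((fav_after - fav_before) / fav_before)
    return tuple(increases)


if __name__ == "__main__":
    # Points 1-2: as the outcome becomes rarer (higher cutoff), the relative
    # difference in experiencing it grows while the relative difference in
    # avoiding it shrinks.
    for cutoff in (0.0, 1.0, 2.0):
        adv, fav = relative_differences(cutoff)
        print(f"cutoff={cutoff:.1f}  adverse ratio={adv:.2f}  "
              f"favorable ratio={fav:.2f}")

    # Point 5(b): halving adverse rates of 20% and 10% raises the favorable
    # rates (80% and 90%) by different proportions (12.5% vs. about 5.6%).
    inc_a, inc_b = equal_proportionate_cut(0.20, 0.10, 0.5)
    print(f"favorable-rate increases: {inc_a:.1%} vs {inc_b:.1%}")

    # Point 9: a rise from 10% to 12% is 2 percentage points,
    # but a 20 percent relative increase.
    before, after = 0.10, 0.12
    print(f"{(after - before) * 100:.0f} percentage points, "
          f"{(after - before) / before:.0%} relative increase")
```

Under this assumed model, running the script should show the adverse-outcome ratio rising and the favorable-outcome ratio falling as the cutoff moves up. A different distributional assumption would change the magnitudes; as the comment itself frames it, the pattern is a tendency rather than a law.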
