
Sharing is one of the first interpersonal skills we develop as kids. The sandbox toys are for everyone. Bobby gets a turn on the swings, then Belinda does. You don’t offer treats to classmates unless you have enough for everyone.

But by that measure, the Journal of Experimental Psychology: Learning, Memory, and Cognition may need to go back to preschool. The journal recently found itself arguing the unarguable when it tried to expel a member of its editorial board who refused to review a manuscript until the authors — gasp! — showed their data.

The journal, which flies under the flag of the American Psychological Association (APA), backed down after the reviewer, Gert Storms, of the University of Leuven in Belgium, refused to quit, as did two other reviewers for the publication. But the parent organization affirmed to Nature that its policy was to let authors decide what data, if any, they’d like to provide.


“While we support open sharing of data when it can be ethically shared, we leave the decision of whether to do so to the author,” APA publisher Rose Sokol-Chang told the magazine.

And there’s the rub: Authors often don’t want to share. Sometimes, they claim that the data are proprietary, which may well be the case. But there are solutions to that. Data can be shared with reviewers on a confidential basis, as Simine Vazire, a psychologist at the University of California, Davis, suggested to Nature.


Or perhaps the data involve identifying patient information. Fine. Strip away that information, and consider using a trusted middleman like YODA, the Yale University Open Data Access Project. Even pharmaceutical companies — who may have billions riding on proprietary data — are willing to share through YODA.

But sometimes, the reasons for not sharing reflect the hypercompetitive culture of science and its heavy emphasis on publishing in prestigious journals — outlets like the New England Journal of Medicine. NEJM has, to its credit, embarked on one campaign designed to increase data-sharing. In the SPRINT Data Analysis Challenge, the journal made sure that data from a recent trial were available to researchers, and encouraged them to come up with new questions to try to answer.

But not everyone is happy. Dr. Jackson Wright, a clinical trial researcher at Case Western Reserve University, told Nature that “the incentives to do these trials will be dramatically lessened if this is going to be the expectation going forward. It’s a huge time commitment.”

We’ve criticized NEJM before for not doing enough to encourage — some might say force — researchers to do the right thing and share data. They have a rather large club to wield, and they should do it more. But they deserve kudos for their most recent effort. (And perhaps we need better incentives for data-sharing, so that Wright and his colleagues get credit for doing what we and many others argue is the right thing.)

The APA, however, deserves quite the opposite. Their caveat emptor attitude smacks of the same kind of indifference to integrity that journals displayed for decades while allowing fraudsters and other errant authors to write their own retraction notices. Yes, that happened (still does, to a much lesser extent), and it’s one of the reasons that we didn’t know until only recently that most papers are retracted for misconduct rather than honest error: “Oh, Dr. Jones, you say you accidentally spliced images from four different experiments into that figure, but the results of your study haven’t changed? Sounds good to us!”

What’s more, the stance is foolishly shortsighted. Given what many say is a sorry state of affairs in psychology research, where published studies are more likely than not to be non-replicable, you’d think the APA would demand that authors share their data.

As Vazire told Nature, trying to evaluate a study without being able to see the underlying results “is like buying a used car without being able to look under the hood.”

Does the APA really want to be the publishing equivalent of a shady used car dealer? We hope not.

  • There’s a much simpler reason for this — groups that share data are at a disadvantage relative to those in the same field that don’t. The latter can use data from the former to increase statistical power and the likelihood of validating a result, but the former cannot reciprocate. As long as data sharing is not mandated, it will continue to be thus. This doesn’t even take into account other motives (such as corner-cutting or fraud). Assuming everyone is rigorous and honest, there’s no escaping the math — more samples and more validation sets will always be better than fewer of each.

    I say this as someone who has published on both sides. It feels icky to be in the latter group, but the incentives currently in place encourage it. Fix the incentives at the funding and reviewing level or expect this to forever remain the case.

  • This is all great in theory. But releasing sensitive data, especially data containing patient information, poses a huge liability risk. The authors of this article are rather dismissive of de-identification procedures (“fine,” “strip away that information”). However, note the experience of AOL releasing search information, the follow-up article in the NY Times identifying supposedly anonymous subjects, and the liability nightmare that release caused the company.

    Further, medical/health scientists are often explicitly forbidden from sharing data, even in supposedly anonymous form, by their IRB. It’s easy and fun to talk about data sharing, but far more difficult to implement in practice.

  • “As Vazire told Nature, trying to evaluate a study without being able to see the underlying results ‘is like buying a used car without being able to look under the hood.’”

    Perhaps it would be more like buying a used car without access to Carfax or similar services that provide details about a vehicle’s history using its VIN?

  • This is a continuing problem. I am currently working on a secondary analysis of a dataset. The original paper was published 5 years ago (I won’t mention who did the paper). I am working with someone who has a signed data-sharing arrangement with the original author. The original author will not send us the actual data, but sends us only modified datasets. We have all the original forms with all the original variable names. The author insists that we have all the variables, but this is clearly not true. The author sent us a Stata script which compiles the analysis dataset from the original form-based datasets, but many of the variables it references are missing from what we received. In addition, in some cases I have tried to replicate tables from the original paper and get different numbers. Either the PI is being misled by his statistician, or the statistician is incompetent, or both.
