As the British politician Benjamin Disraeli famously observed, there are lies, damn lies, and statistics. But liars who use statistics may damn themselves to detection.
Using an emerging approach to data policing, a group of scientists in Scotland and New Zealand has found compelling evidence that Yoshihiro Sato, a Japanese bone specialist, fabricated data in many, if not all, of 33 randomized controlled trials they analyzed.
Sato’s work looked at the effects of various substances, from vitamin D to prescription drugs like alendronate, on the risk of hip fracture. His publishing record hasn’t been blemish-free — a dozen of Sato’s papers have been retracted so far.
In the analysis published this week in Neurology, Mark Bolland, of the University of Auckland, and his colleagues find that the misconduct is much more far-reaching: Sato — and quite possibly his colleagues — appears to have been fiddling with his data in a big way.
Bolland’s team looked at how much the results Sato reported deviated from what would normally be expected for the patients he was purportedly studying. The logic is this: when scientists make up data, they tend to produce numbers that are too smooth, failing to reflect the natural world’s tendency to be messy. Such excess homogeneity is hard to catch with the naked eye, but statistical analysis can find it pretty easily.
So how off were the results? It turns out that some of the data Sato reported had a probability of roughly 5.2 × 10⁻⁸² of occurring naturally. That’s basically zero.
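To see how such a check works in principle, here is a minimal sketch with hypothetical numbers (not Sato’s actual data; the published analyses by Carlisle and by Bolland’s team use more sophisticated methods). Under genuine randomization, the p-values for baseline differences between a trial’s two arms should be uniformly distributed across trials, so a run of trials whose arms are all suspiciously similar can be flagged by combining those p-values:

```python
import math

def baseline_p_value(mean_a, mean_b, sd, n_per_arm):
    """Two-sided p-value for a difference in baseline means between
    two randomized arms, using a normal approximation."""
    se = sd * math.sqrt(2.0 / n_per_arm)
    z = abs(mean_a - mean_b) / se
    return math.erfc(z / math.sqrt(2.0))

def chi2_sf_even_df(x, df):
    """Survival function of a chi-square distribution with an even
    number of degrees of freedom (exact closed form)."""
    k = df // 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= (x / 2.0) / i
        total += term
    return math.exp(-x / 2.0) * total

# Hypothetical baselines for 20 trials (mean_a, mean_b, sd, n per arm):
# the two arms' means are implausibly close, trial after trial.
trials = [(75.1, 75.2, 6.0, 100)] * 20

# Under genuine randomization these p-values would be uniform on (0, 1).
p_values = [baseline_p_value(*t) for t in trials]

# Fabricated, over-balanced data pushes the p-values toward 1, so apply
# Fisher's method to (1 - p): a large statistic and a tiny combined
# p-value mean "too similar to be real".
fisher_stat = -2.0 * sum(math.log(1.0 - p) for p in p_values)
combined_p = chi2_sf_even_df(fisher_stat, 2 * len(p_values))
```

With these made-up inputs, every trial’s baseline p-value lands near 1 and the combined p-value is vanishingly small: individually each trial looks fine, but collectively the pattern is far too tidy.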
Sato’s studies also had improbably positive results and much less mortality over time considering that the patients in his research tended to be older people with significant health problems. “Taken together with the implausible productivity of the group, internal inconsistencies for outcome data in their work, duplication of data, numerous misleading statements and errors, and concerns regarding ethical oversight,” the authors wrote, “our analysis suggests that the results of at least some of these trials are not reliable.”
If all of this sounds a shade familiar, it should. The approach mirrors that taken in the case of Yoshitaka Fujii, the record holder for retractions (at 183). Using the same statistical jujitsu as Bolland and friends, John Carlisle, an anesthetist in the United Kingdom, demonstrated in 2012 that Fujii’s published data were overwhelmingly unlikely to have resulted from actual experiments. And Uri Simonsohn has done the same for the work of several psychology researchers. Psychology papers have also come under more granular scrutiny, though not for fraud, from statcheck, an algorithm that rechecks reported statistical results for internal consistency.
The bad news, of course, is that analyses such as those Bolland, Carlisle, and Simonsohn have conducted come after fraudsters publish their bogus data. Ideally, peer reviewers and journal editors would find a way to catch misconduct before it reaches print. Many editors point out that they and their reviewers do in fact sometimes find evidence of fraud. But what tends to happen is that the paper is rejected, and ends up in a lower-tier journal, sometimes with the evidence scrubbed. Editors are reluctant — thanks to the fear of lawsuits, among other concerns — to share that information with other editors, so fraudsters often get away with it.
Sato’s case also illustrates other red flags. For starters, Sato was a remarkably productive scientist, conducting at least 33 randomized controlled trials (considered the gold standard for clinical research) between 1997 and 2012. That in and of itself is enough to ring alarms, according to Bolland and colleagues.
What’s more, several of his papers misstated — and later corrected — the number of patients he and his colleagues reportedly included in trials, while others duplicated text and data — even though they were ostensibly looking at different things. The group also found evidence that the ethics oversight of Sato’s research may not always have been sufficient.
Any one of those red flags might not be enough to sink a paper, but taken together they should be enough to prompt a deeper investigation. And maybe it’s time for journals to start calling in their own “CSI: Data” teams to do some statistical vetting prior to publication. That will require some time and effort — and, we’ll admit, resources — but isn’t the integrity of science worth it?