
Every medical researcher dreams of conducting studies or clinical trials that generate results so compelling they change how diseases are treated or how health policy is written. In reality, we are lucky if the results are even modestly positive, and we often end up with “null” results, meaning that the effect of the policy, drug, or clinical intervention we tested is no different from that of some alternative.
“Null” comes from the null hypothesis, the bedrock of the scientific method. Say I want to test whether the switch to daylight saving time affects the outcomes of surgery because surgeons may be slightly more fatigued in the days following the transition due to lost sleep. I set up a null hypothesis — surgery-related deaths are no different in the days immediately before the switch to daylight saving time compared to the days immediately after it — and then try to nullify, or disprove, it to show that there was indeed a difference. (Read on to see the answer, though you can probably guess from the headline what it is.) Disproving the null hypothesis is standard operating procedure in science.
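To make the mechanics concrete, here is a minimal sketch of the kind of two-proportion test this design implies, written in Python with entirely made-up counts (the real analysis used insurance claims data; statsmodels’ proportions_ztest is just one common tool for a comparison like this):

```python
# A sketch of the daylight-saving-time hypothesis test described above.
# All numbers are hypothetical, not the study's actual data.
from statsmodels.stats.proportion import proportions_ztest

deaths = [52, 55]            # deaths in the days before vs. after the switch
operations = [10000, 10000]  # operations performed in each window

# Null hypothesis: the mortality proportion is the same in both windows.
z_stat, p_value = proportions_ztest(count=deaths, nobs=operations)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")

# A large p-value means we fail to reject the null hypothesis --
# the very definition of a "null" result.
```

With counts like these, the p-value is large, so we cannot rule out that any difference between the two windows is due to chance alone.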
Null results are exceedingly common. Yet they aren’t nearly as likely to get published as “positive” results, even though they should be. In an analysis of nearly 30,000 presentations made at scientific conferences, fewer than half were ultimately published in peer-reviewed journals, and negative or null results were far less likely to be published than positive results. Clinical trials with positive findings are published more often and sooner than negative or null trials.
That’s a shame, because publishing null results is an important endeavor. Some null results represent potentially important discoveries, such as the finding that paying hospitals for performance based on the quality of their outcomes does nothing to actually improve quality. Most research questions, though, don’t fall into this category. Leaving null results unpublished can also lead other researchers to unknowingly repeat the same study, wasting time and resources.
Some unpublished null findings are on important topics, like whether public reporting of physicians’ outcomes leads physicians to “game the system” and alter the care they provide patients. Others come from explorations of quirkier topics.
Here are a few of each from my own unpublished research.
Daughters and life expectancy. Daughters are more likely than sons to provide care to their ailing parents. Does that mean being blessed with a daughter translates into greater life expectancy? Using data from the U.S. Health and Retirement Study, I compared mortality rates among adults with one daughter versus those with one son. There was no difference. Ditto for families with two daughters versus two sons.
Daylight saving time and surgical mortality. The switch to daylight saving time in the spring has been linked to increased driving accidents immediately after the transition, attributed to fatigue from the hour of lost sleep. I investigated whether this time switch affects the care provided by surgeons by studying operative mortality in the days after the transition. U.S. health insurance claims data from 2002 to 2012 showed no increase in operation-related deaths in the days after the transition to daylight saving time compared to the days just before it.
Tubal ligations and son preference. A preference for sons has been documented in developing countries such as China and India as well as in the United States. When I was a medical student rotating in obstetrics, I heard a patient ask her obstetrician, “Please tie my tubes,” because she had finally had a son. Years later, I investigated whether that observation could be systematically true using health insurance claims data from the U.S. Among women who had recently given birth, there was no difference in later tubal ligation rates between those giving birth to sons versus daughters.
Gaming the reporting of heart surgery deaths. One strategy for improving the delivery of health care is public reporting of doctors’ outcomes. Some evidence suggests that doctors may game the system by choosing healthier patients who are less likely to experience poor outcomes. One important metric is 30-day mortality after coronary artery bypass graft surgery or placement of an artery-opening stent. I wanted to know whether heart surgeons were avoiding bad scores on this metric by ordering intensive interventions that kept patients who had experienced complications alive beyond the 30-day mark, so their deaths would not count in the publicly reported statistics. I hypothesized that in states with public reporting, such as New York, deaths would be higher on post-procedure days 31 to 35 than on days 25 to 29 if doctors were keeping patients alive by extreme measures. The data didn’t back that up: there was no evidence that cardiac surgeons or cardiologists attempt to game public reporting in this way.
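For readers curious what such a test might look like, here is a hedged sketch with hypothetical counts (not the study’s data), using a chi-square test of independence to ask whether the share of deaths falling on days 31 to 35 differs between reporting and non-reporting states:

```python
# Hypothetical death counts by post-procedure window and state type.
# If surgeons in public-reporting states pushed deaths past day 30,
# the days-31-to-35 column should be disproportionately large in row one.
from scipy.stats import chi2_contingency

table = [
    [140, 150],  # public-reporting states: deaths on days 25-29, days 31-35
    [300, 310],  # non-reporting states:    deaths on days 25-29, days 31-35
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
# A non-significant result, as in the study, offers no evidence of gaming.
```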
Halloween and hospitalization for high blood sugar. Children consume massive amounts of candy on and immediately after Halloween. Does this onslaught of candy consumption increase the number of episodes of seriously high blood sugar among children with type 1 or type 2 diabetes? I looked at emergency department use and hospitalization for hyperglycemia (high blood sugar) among children between the ages of 5 and 18 years in the eight weeks before Halloween versus the eight weeks after, using as a control group adults aged 35 and older to account for any seasonal trends in hospitalizations. There was no increase in emergency visits for hyperglycemia or hospitalizations for it among either adults or children in the days following Halloween.
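What makes this design work is the control group: any seasonal rise in hyperglycemia visits should show up among adults too, so the quantity of interest is the difference between the two groups’ before/after changes. Here is a minimal difference-in-differences sketch, with hypothetical rates rather than the study’s data:

```python
# Difference-in-differences sketch of the Halloween design.
# All rates are hypothetical ER visits for hyperglycemia per 100,000.

children_before, children_after = 12.0, 12.5  # ages 5-18 (the "treated" group)
adults_before, adults_after = 8.0, 8.4        # ages 35+ (seasonal control)

child_change = children_after - children_before  # 0.5
adult_change = adults_after - adults_before      # 0.4, the shared seasonal trend

# Subtracting the adults' change removes the seasonal trend common to both.
did_estimate = child_change - adult_change
print(f"Difference-in-differences estimate: {did_estimate:+.1f} per 100,000")
# An estimate near zero is consistent with the study's null finding.
```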
The 2008 stock market crash and surgeons’ quality of care. During a three-week period in 2008, the Dow Jones Industrial Average fell 3,000 points, losing nearly 25 percent of its value. The sharp decline in wealth for many Americans, particularly those with enough money to be heavily invested in stocks, had the potential to create immediate and significant stress. Was this acute financial stress large enough to throw surgeons off their game? Using U.S. health insurance claims data for 2007 and 2008 that included patient deaths, I analyzed whether weekly 30-day postoperative mortality rates rose in the month following the crash, using 2007 as a control for seasonal trends. The weekly 30-day mortality rates were nearly identical in 2007 and 2008, suggesting that the stock market crash, while stressful, did not distract surgeons from their work.
The bottom line
Not reporting null research findings likely reflects the competing priorities of scientific journals and researchers. With limited resources and space, journals prefer to publish positive findings and select only the most important null findings. Many researchers aren’t keen to publish null findings because the effort required to do so may never be rewarded with acceptance by a scientific journal.
There are a few opportunities for researchers to publish null findings. For example, the Journal of Articles in Support of the Null Hypothesis has been publishing twice a year since 2002, and the Public Library of Science occasionally publishes negative and null results in its Missing Pieces collection. Perhaps a newly announced prize for publishing negative scientific results will spur researchers to pay more attention to this kind of work. The €10,000 prize, initially aimed at neuroscience, is sponsored by the European College of Neuropsychopharmacology’s Preclinical Data Forum.
For many researchers, though, publishing in these forums may not be worth the lift: writing up a null study takes just as much effort as writing up a positive one, but the payoff is far smaller.
The scientific community could benefit from more reporting of null findings, even if the reports were briefer and had less detail than would be needed for peer review. I’m not sure how we could accomplish that, but would welcome any ideas.
Anupam B. Jena, MD, is an economist, physician, and associate professor of health care policy and medicine at Harvard Medical School. He has received consulting fees from Pfizer, Hill Rom Services, Bristol Myers Squibb, Novartis Pharmaceuticals, Vertex Pharmaceuticals, and Precision Health Economics, a company providing consulting services to the life sciences industry.
Null findings are valuable, but they do not by themselves prove the null hypothesis. A null finding might mean the positive causal connection we’re looking for does not exist, but it might also mean the experimental design was not adequate to unearth a positive reality that remains hidden. Even a very well-designed study carried out meticulously could prove inadequate. For example, a study that compared anyone who had ever had one drink of alcohol to lifelong teetotalers might find no significant difference in rates of cirrhosis, whereas a study that looked only at heavy drinkers would show the link, and would also tell us that the first study failed because of its design. The examples of null findings in this article are all taken as evidence that the null hypothesis is true. However, these null findings might also mean the study design was inadequate and missed an important truth. The problem is that if a later study produces a positive finding, it renders the earlier null findings meaningless, or at least suspect. This might be another reason researchers are less motivated to publish null findings.