
Waste and inefficiency in drug development are big problems. They can be hard to spot, especially when you are in the midst of the process. A new way of visualizing clinical trials might help.
Some experts believe that as much as 85% of biomedical research may be wasteful due to biases in study design, lack of publication, unnecessary duplication, or investigating questions of little importance. It is also estimated that only about one (or maybe two) of every 10 drugs that enter clinical testing will turn out to be effective.
This rather depressing picture of productivity in pharmaceutical research has led to much head-scratching among scientists, ethicists, and policy researchers who are interested in making the enterprise more efficient. The medical journal the Lancet once ran an entire series of articles devoted to discussions of how to reduce waste and improve the value of research. This series included suggestions on how to improve prioritization, study design, regulation, access, and the quality of scientific reporting.
While these statistics about waste are illuminating, and the suggestions offered for improvement are reasonable, they present a very high-level picture of the research and development enterprise — what might be called the 30,000-foot view. But there may be important patterns or properties of the research enterprise that can be seen only by zooming in and adopting something more like a bird’s-eye view.
Here’s a proof of concept for the bird’s-eye view that uses ClinicalTrials.gov data and the AERO graphing method to analyze all of the registered clinical trials from 10 large pharmaceutical companies — AbbVie, Bayer, Gilead, GSK, Johnson & Johnson, Merck, Novartis, Pfizer, Roche, and Sanofi — over the past 20 years or so.
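For readers who want to poke at the underlying registry data themselves, ClinicalTrials.gov is queryable programmatically. Below is a minimal Python sketch of the data-gathering step only (not the AERO pipeline itself), written under the assumption that the registry's v2 REST API and its query.lead sponsor filter behave as documented; the field paths follow the v2 JSON response schema.

```python
# Minimal sketch of the data-gathering step (not the AERO pipeline itself).
# Assumes ClinicalTrials.gov's v2 REST API and its query.lead sponsor filter;
# field paths follow the v2 JSON response schema.
import requests

API = "https://clinicaltrials.gov/api/v2/studies"

def fetch_sponsor_trials(sponsor: str, page_size: int = 100) -> list[dict]:
    """Collect basic fields for every study whose lead sponsor matches `sponsor`."""
    records, token = [], None
    while True:
        params = {"query.lead": sponsor, "pageSize": page_size, "format": "json"}
        if token:
            params["pageToken"] = token
        payload = requests.get(API, params=params, timeout=30).json()
        for study in payload.get("studies", []):
            proto = study.get("protocolSection", {})
            status = proto.get("statusModule", {})
            records.append({
                "nct_id": proto.get("identificationModule", {}).get("nctId"),
                "sponsor": sponsor,
                "status": status.get("overallStatus"),
                "start": status.get("startDateStruct", {}).get("date"),
                "enrollment": proto.get("designModule", {})
                                   .get("enrollmentInfo", {}).get("count"),
                "conditions": proto.get("conditionsModule", {}).get("conditions", []),
            })
        token = payload.get("nextPageToken")
        if not token:
            return records

# Example: build the full record set for all 10 companies.
companies = ["AbbVie", "Bayer", "Gilead", "GSK", "Johnson & Johnson",
             "Merck", "Novartis", "Pfizer", "Roche", "Sanofi"]
trials = [t for c in companies for t in fetch_sponsor_trials(c)]
```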
The full result is a giant graph showing 13,749 trials that included more than 6 million patients. It is so big that it doesn’t fit neatly on STAT’s website. But you can see and interact with it in the frame below, explore the figure on its own dedicated website, or see a video of the entire graph.
Here’s a guide to the graph: Every node corresponds to a registered trial from one of the 10 companies; clicking on any node opens the trial’s registration page. Trials are then organized from oldest to newest on the x-axis and by patient population or disease under study on the y-axis. The disease areas are clustered according to the National Library of Medicine’s Medical Subject Heading (MeSH) tree. The color of a node indicates the company (Pfizer is dark blue, Johnson & Johnson is orange, Merck is dark green, and so on). The shape indicates the trial’s status — completed studies are circles, recruiting studies are triangles, terminated studies have an “x” through them — and the size represents the number of patients enrolled.
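To make that visual grammar concrete, here is a toy Python (matplotlib) sketch that maps the same five dimensions onto a scatter plot. This illustrates the encoding, not the AERO method: the condition_rows mapping is a hypothetical stand-in for the MeSH-based clustering, the example condition names are guesses at registry labels, and the uppercase status strings assume the v2 API's enum values.

```python
# Toy illustration of the article's five-way encoding (not the AERO method):
# x = start year, y = disease row, color = company, shape = status, size = enrollment.
import matplotlib.pyplot as plt

# Colors named in the article; the remaining companies would get their own hues.
COMPANY_COLORS = {"Pfizer": "darkblue", "Johnson & Johnson": "orange",
                  "Merck": "darkgreen"}
# Completed = circle, recruiting = triangle, terminated = x (per the guide above).
STATUS_MARKERS = {"COMPLETED": "o", "RECRUITING": "^", "TERMINATED": "x"}

def plot_landscape(trials: list[dict], condition_rows: dict[str, int]) -> None:
    """`condition_rows` is a stand-in for MeSH clustering: condition -> y position."""
    _, ax = plt.subplots(figsize=(14, 6))
    for t in trials:
        if not (t["start"] and t["enrollment"] and t["conditions"]):
            continue  # skip records missing any plotted field
        row = condition_rows.get(t["conditions"][0])
        if row is None:
            continue  # condition not among the rows being drawn
        ax.scatter(int(t["start"][:4]), row,              # dates arrive as "YYYY-MM..."
                   s=max(10, t["enrollment"] / 100),      # marker area ~ enrollment
                   c=COMPANY_COLORS.get(t["sponsor"], "gray"),
                   marker=STATUS_MARKERS.get(t["status"], "o"))
    ax.set_xlabel("Trial start year")
    ax.set_yticks(list(condition_rows.values()), list(condition_rows.keys()))
    plt.show()

# Example, using the records gathered above and four of the rows discussed below
# (the condition labels are hypothetical):
plot_landscape(trials, {"Herpes Zoster": 0, "Influenza": 1,
                        "Rotavirus Infections": 2, "Malaria": 3})
```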
What can this bird’s-eye perspective show us that the 30,000-foot view does not? At first glance, it reveals the incredible volume of research that has been undertaken. These trials represent an enormous social and scientific investment, likely costing several hundred billion dollars — and these are the activities of just 10 companies, a small piece of the total clinical research activity.
It also raises questions about what we have learned and what we should test next, questions that even those of us who study clinical trials and research policy would not typically think to ask.

For example, the image above shows just a few rows of the graph that cover viral diseases: herpes zoster, influenza, rotavirus, and malaria. It’s immediately apparent that these are some of the largest trials across the entire landscape. The two trials in the early 2000s by Merck in green and GSK in pink each enrolled more than 60,000 participants, and GSK’s two active malaria trials are each enrolling more than 50,000. It’s also apparent that one company, GSK, is by far the most active; Sanofi in light blue and Merck in green are the only other companies that have done comparable trials. This raises a question: Why are so few of these companies in this space?
This contrasts with the figure below, which shows four rows in the neurological space: Parkinson’s, Alzheimer’s, migraine, and meningitis. The trials here tend to be smaller, with the largest typically enrolling only a few thousand participants. More of the 10 companies are conducting trials in this space, although meningitis is clearly a less active area of investigation than the other three.

It is also interesting to note the gaps in activity. What happened in migraine research after 2010? Why have these 10 companies initiated so few migraine trials since then? By contrast, the activity in Alzheimer’s research looks steadier — although it has more terminated trials.
But the Alzheimer’s terminations are dwarfed by those in the cardiovascular space (eight rows of which are shown below). It’s a rainbow of large trials, many of which have been terminated. Even though trial termination is not necessarily a bad thing — some trials are terminated early because the experimental drug is found to be highly effective — trials are not designed with the intention of being stopped early. So, a cluster of terminations in a particular area demands an explanation.

One of the many great things about ClinicalTrials.gov is that some trialists have already provided those explanations. In the interactive map, you can click on the large dark blue Pfizer trial from 2013 and see that it was terminated because “the emerging clinical profile and the evolving treatment and market landscape for lipid-lowering agents” indicated “that [the study drug] was not likely to provide value to patients, physicians, or shareholders.” The light blue Sanofi trial in 2006 was terminated “due to demands by certain health authorities,” whereas the light green Novartis trial in heart failure in 2006 was terminated because of a clear signal that the drug worked.
Part of what is powerful about this bird’s-eye perspective is that after seeing and interacting with the data on this scale, questions about the stories, patterns, and outcomes that characterize the research enterprise as a whole become more concrete and accessible. You can immediately see a pattern of large, terminated trials in the cardiovascular space. With a few clicks of the mouse, you can start to understand why. This could be invaluable for those who want to improve the efficiency of the research enterprise, since it allows them to generate insights about what has happened and then specify questions for further investigation, such as: “Are there opportunities for rethinking how we plan and execute trials for cardiovascular diseases?”

I think it is also interesting to see how the landscape of cancer research differs from that of other disease areas. The figure above is a representative excerpt of the cancer trials — activity is spread across many rows because, unlike Alzheimer’s or Parkinson’s, cancer is not one disease. The average cancer trial appears to be getting smaller over time, which could be explained by the shift toward targeted or precision medicines. And all 10 companies are somewhere in this space, although some cancer types have only one or two active companies.
Now view the entire graph through an ethics lens. Every single study — every node in this figure — should be ethically justified by the expectation that it will offset its social costs and the burdens it imposes on research subjects by producing valuable gains in scientific knowledge. That leads to this question: For every trial in this graph, why does (or did) it make sense given the surrounding R&D landscape? For trials recruiting now (the triangles), are these the right studies to be doing? In a truly efficient and ethical research enterprise, the sponsors and investigators who run trials should always have compelling answers to those questions. And for any answer to be compelling, it must take into account the surrounding R&D context — a context this visualization makes more explicit.
The bird’s-eye view also helps sharpen questions about who has the power and responsibility to intervene and alter the patterns in this space. Generally speaking, researchers and clinical investigators — who are often the target of interventions in the literature on research waste — may not have much power to significantly change the landscape. Patient power is probably even more limited, although both patients and investigators could organize to advocate for big changes in research policy.
Small biotechnology companies probably also lack the ability to pivot easily from one disease or product to another, even if they see, for example, that activity in their area of interest is becoming oversaturated.
But government regulators, big pharmaceutical companies, investors, and large research funding organizations do have some ability to survey, intervene, and thereby reshape the research landscape. Regulators, for example, could step in to require changes to trial designs if they see frequent trial terminations in a particular disease area. The ability to identify patterns and better predict what is coming next (for one’s own company as well as competitors) seems obviously of interest to big pharmaceutical companies and investors. For large research funding organizations, being able to identify gaps where work isn’t being done could be a valuable insight for maximizing the impact of the organization’s scientific investments.
These observations just scratch the surface of what new ways of visualizing the clinical trial enterprise can reveal and the actions that can emerge from the insights they offer. Although I have focused on pharmaceutical companies, this same method could be used to compare and contrast the portfolios of activities of large national research funders or academic research institutions. It could also be used to gain insights about equity in health research by looking to see how trial activity aligns with the burdens of disease.
This proof of concept is based on ClinicalTrials.gov, but future iterations could incorporate more data on trial outcomes, patents, regulatory approvals, and more — drawing on the Cochrane Library or DrugBank.ca, for example. Such a resource could be a boon to experts who conduct systematic reviews or produce evidence-based practice guidelines by giving them a fast and intuitive tool to identify and analyze the relevant populations of trials.
One final thought: The science and technology that emerged, and is emerging, from these trials touches all of us in some way. I believe this bird’s-eye perspective, presented as an interactive visualization, can empower all of us to better understand trials by revealing the geography, the forests, and the trees of clinical testing, and by helping us start to imagine new possibilities for how we might shape the landscape to better serve all of our interests.
Spencer Phillips Hey, Ph.D., is a faculty member and co-director of research ethics at the Harvard Center for Bioethics and a research scientist in the Program on Regulation, Therapeutics, and Law at Brigham and Women’s Hospital in Boston.
While this may be an interesting exercise in data visualization across five dimensions (timeline, disease area, company, trial size, trial status), the chart falls short of generating any actionable insight, since:
– the data are not curated for a specific purpose, and
– the chart lacks many other key dimensions required for uncovering any sound insight about the larger clinical development landscape.
Key shortcomings any trialist will immediately point to are (see the stratification sketch after this list):
– Lack of distinguishing therapeutic vs. prophylactic treatments (which will affect trial size as pointed out earlier, as well as pharma presence in that disease area)
– Lack of distinguishing clinical trial phases (small Ph1 trials will dominate this chart, Ph3 trials will obviously be bigger, etc.)
– Lack of distinguishing pre-approval vs. post-marketing trials (many Ph4 trials are conducted in response to FDA PMRs, and do not necessarily reflect company clinical development strategies at that stage)
– Lack of distinguishing interventional vs. observational trials (it is not clear whether retrospective observational trials in electronic databases are filtered out here, which again would skew any interpretation on trial size)
– Lack of consistent medical ontology on Y-axis (disease areas & disease symptoms are mixed in some instances)
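A minimal sketch of the stratification this list calls for, assuming the registry records have been flattened into a pandas DataFrame: the underlying Phase and StudyType fields do exist on ClinicalTrials.gov, but the column names and enum values below are illustrative, not the registry's own.

```python
# Sketch of the stratification the comment calls for. Assumes a flattened
# pandas DataFrame with hypothetical columns "phase" (e.g., "PHASE1") and
# "study_type" ("INTERVENTIONAL" / "OBSERVATIONAL"); the registry exposes
# these fields, but the column names and values here are our own.
import pandas as pd

def stratify(df: pd.DataFrame) -> dict[str, pd.DataFrame]:
    """Split one flat trial table into subsets a trialist would compare separately."""
    interventional = df[df["study_type"] == "INTERVENTIONAL"]
    return {
        "observational": df[df["study_type"] == "OBSERVATIONAL"],
        "early_phase": interventional[interventional["phase"].isin(["PHASE1", "PHASE2"])],
        "pivotal": interventional[interventional["phase"] == "PHASE3"],
        "post_marketing": interventional[interventional["phase"] == "PHASE4"],
    }

# Comparing median enrollment across strata would make the size-confounding point:
# {name: sub["enrollment"].median() for name, sub in stratify(trials_df).items()}
```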
Additionally, it would help to open with a brief literature survey, since visualizing clinical trials is not a novel endeavor: many R packages are available on GitHub, and there are previous publications (http://courses.ischool.berkeley.edu/i247/s14/reports/ClinicalTrialsBrowser_Gerber_Odisho_Ost.pdf).
The author really needs to separate drug studies from vaccine studies. The viral targets highlighted in the article make those trials look unnecessarily large compared with the trials for other diseases that follow. These large trials are for vaccines, which, by their very nature of being preventive rather than therapeutic, need to enroll larger numbers of subjects in order to effectively “prove the negative”: that no new cases of the infection are occurring, or that no rare safety events happen in the healthy participants who receive the experimental vaccine.
This is very different from a drug trial, which can enroll patients whose symptoms are known, give them the drug, and measure whether those symptoms are relieved.
To compare apples to apples, these types of trials should be separated.
These are independent companies, publicly held and responsible to shareholders. While your observations enable a “press release” or a small article, not much more can be derived from this assembly of data. Where to invest resources by therapeutic area is a decision made by individual companies, unless the government wants to entice ($) companies to do otherwise. It’s like tracking the individual decisions of all human beings over a year: while it’s “cool, big data,” nothing actionable will arise. Good luck; just sit back and watch…
I think this is unfair, and it may also miss the point. Yes, these activities are not formally connected, but is that not even more interesting? What can you infer from studying independent activities? You seem to be implying “nothing.” I disagree. These companies employ some of the smartest people in their respective areas and are investing hugely (so we can infer they must have concluded there is potentially a 10x-plus return on the investment). If you re-read the article asking what patterns arise from all of that activity, you might see something more than “cool, big data.”
“… the image above shows just a few rows of the graph that cover viral diseases: herpes zoster, influenza, rotavirus, and malaria. ”
Correction needed: malaria is not a viral disease.
I work for a CRO, and I can assure you that small- and mid-size biopharma companies make very poor and irrational decisions that waste an immense amount of time and money. It’s a race against time, but a thorough analysis of what they have learned up until now could provide a much more fruitful and cost-effective future for these companies. This is not a fast process, yet they cut every corner they can to try to make it one, including skimping on the hiring of competent individuals to manage their projects. I’m by no means a veteran in this field, but I’m falling more in love with data science, as it makes obvious who is acting on their experience rather than on what they have learned from the study thus far.