By now, the story is cliché.

You notice an association between a set of baseline observations and a critical outcome. Or you suspect that the inclusion criteria on your next trial, a pivotal Phase 3, will disqualify too many patients. In any case, you need to test your hypothesis. Over months of data sourcing and weeks of integration, you build a fit-for-purpose dataset. But more time goes by as I.T. provisions the space needed to host your data. On a dreary Tuesday afternoon, you get the email: “all loaded.” At last, you can start coding. 

That’s when you discover that a crucial data element is 95% empty. 

Scenarios like these impede every stage of the journey from drug discovery to delivery. Real-world data (RWD) has proven its utility beyond economic and outcomes research. Continuously refreshed and clinically rich RWD holds the power to improve trial design, bolster new drug applications and label expansion studies—even change the minds of payers and providers. But practice and theory are still too far apart. The process of identifying, acquiring, and evaluating the right data, not to mention converting that data into evidence, is rife with failure and delay.

Consider the breaking points. Any researcher who sets out to test their hypothesis wrestles with five questions. Without a prompt and affirmative answer to each, the trek from idea to insight stalls.

Does the data I need exist?

Data aggregators in our space can usually answer this one quickly. But beware the response: “Yes, and it’s perfect!” Every data source has gaps, and a “large n” is no replacement for harmonized, fine-grained observations and lab values. A true partner in your search for the right data will spend only a fraction of their time comparing cohort sizes, and far more of it interrogating how well the underlying data can answer your question. Ideally, they will give you the tools to define, count, and interrogate alongside them.
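To make that concrete, here is a minimal sketch of what “count and interrogate” can look like before any modeling begins. It assumes a flat extract loaded with pandas; the file name and the hba1c_value column are hypothetical stand-ins, not any vendor’s schema.

```python
import pandas as pd

# Hypothetical extract from an RWD source; file and column names are illustrative.
labs = pd.read_csv("lab_results.csv")

# Completeness per column: the share of rows with a non-null value.
completeness = labs.notna().mean().sort_values()
print(completeness)

# A large cohort is no help if the field you need is 95% empty.
if completeness.get("hba1c_value", 0.0) < 0.05:
    print("hba1c_value is effectively unusable for this question.")
```

Five minutes of this kind of interrogation, before licensing or loading anything, is worth more than any cohort-size comparison.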

Can I access it before it’s outdated?

At least two questions are nestled inside this one: how recent is the data, and how long does licensing take? Claims data alone, for all its breadth and continuity, won’t support close active surveillance of a new treatment’s uptake; there’s simply too much delay in the billing cycle. Even EHR and lab data, for all their immediacy and richness, stay fresh only for so long. Linked data offers the best of both worlds, but add complicated licensing paperwork to the process, and your data ages before it’s even in your hands.

Do I have the tools I need to analyze it?

Maybe. But like the one above, this question isn’t as straightforward as it appears. SQL has long been the standard for querying, but Python and R bring statistical and machine learning capabilities it can’t match. Can your SQL, Python, and R coders work together in the same notebook? Can they generate visualizations for epidemiologists and ops directors who don’t share their programming background? And what about storage and computational power? Can your team load and analyze terabytes of data without taxing an already overworked I.T. staff? A “no” to any of these may not be fatal, but it means delay at best, and mistranslation or data leakage at worst.
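For illustration, here is a short sketch of SQL and Python working side by side in one notebook, with a query result handed straight to a chart a non-programmer can read. The database file and patients table are hypothetical stand-ins for a real extract.

```python
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative only: a SQL query and Python plotting in the same session.
# The database file and table are hypothetical, not a real platform's schema.
conn = sqlite3.connect("rwd_extract.db")
cohort = pd.read_sql_query(
    "SELECT age_band, COUNT(*) AS n FROM patients GROUP BY age_band ORDER BY age_band",
    conn,
)
conn.close()

# Hand the tabular result to a chart an epidemiologist can read at a glance.
cohort.plot.bar(x="age_band", y="n", legend=False, title="Cohort size by age band")
plt.tight_layout()
plt.show()
```

The point isn’t the libraries; it’s that the query, the analysis, and the visualization live in one place, with no copy-paste handoff between teams.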

Can I share my findings?

Results trapped in an environment only coders can navigate never become actionable insights. But cutting and pasting charts and graphs, or worse, entering summarized data into a spreadsheet, poses dangers no business should accept. Until “sharing” means “inviting collaborators from your organization into the source document” instead of “pushing out a copy,” we are all working at one remove from the truth.

Can I extend my learning?

Good research answers a question. Great research does more: it lets others repeat or build on the findings. Many analyses, like those focused on drug safety, demand new results as new data becomes available. Others inspire related questions. Is the effect size uniform, or does it vary between sub-cohorts? How well does the model generalize? Answering these shouldn’t mean starting over; it should mean building on the foundation already laid. Saving the code for those who follow you. Leaving a clear trail of annotations. Importing new data into the original analysis environment with a click. These were once “nice-to-haves.” But as the volume and velocity of RWD grow, they are becoming non-negotiable.
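As a sketch of that pattern, the snippet below wraps one question in a reusable, annotated function, so a data refresh means re-running a call rather than rebuilding the analysis. Every name here is hypothetical.

```python
import pandas as pd

# A minimal pattern for analysis that others can repeat or extend.
# Function, file, and column names are hypothetical; the workflow is the point.
def event_rate_by_subcohort(df: pd.DataFrame, group_col: str, outcome_col: str) -> pd.DataFrame:
    """Event counts and rates per sub-cohort, so 'is the effect uniform?' is one call away."""
    return df.groupby(group_col)[outcome_col].agg(n="count", rate="mean")

# When the data refreshes, point the same annotated code at the new extract
# instead of starting over.
refreshed = pd.read_csv("extract_latest.csv")
print(event_rate_by_subcohort(refreshed, "site", "adverse_event_flag"))
```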

Figure: The legacy process of real-world evidence generation and its multiple failure points.

Clinician-researchers and drug developers deserve better. Our technology needs to foster autonomy, collaboration, and rapid iteration just as fervently as it ensures data privacy. Progress in secure cloud computing, open source, and the democratization of machine learning has made this possible. What was once speculation is now within reach: a single, self-service trusted research environment that unites continuously refreshed RWD with a flexible coding environment.

The right technology can help us break the speed limit on evidence generation.

From months to minutes: that’s the cycle time reduction eClinical innovators need to deliver. Forgive the alliteration. It’s a lot better than the cliché. 

Click here to learn more about TriNetX, LLC and register for their upcoming webinar, Real-World Evidence Generation: A Model Process.

The TriNetX platform hosts only de-identified patient data and is HIPAA and GDPR compliant. For more information, visit our Trust Center.