Just as an Apple Watch monitors your fitness and ChatGPT can customize a healthy meal plan for your specific needs, biomedical researchers may soon be able to harness artificial intelligence for research and patient treatment. Scientists might, for example, ask a conversational interface whether a hypothesis is feasible or how best to adjust an experimental design. A clinician could use an AI assistant to predict how effective a specific treatment course is likely to be, or which mutations a particular tumor might carry.
These are just some of the proposed applications of “virtual cell models” and, more specifically, of an AI-powered platform for biology being built by the Chan Zuckerberg Initiative (CZI). The platform could make these predictive models accessible for scientific research through what experts are calling the “platformization of digital biology.”
Speaking at a fireside chat at the 2024 STAT Breakthrough West Summit in San Francisco, panelists discussed what virtual cells are, why building a digital platform for biology will lead to new discoveries about health and disease, and how this all connects to CZI’s mission to help scientists cure, prevent, or manage all diseases by the end of this century.
Here’s what they discussed.
What are virtual cell models?
By CZI’s definition, virtual cell models will eventually be trained on data from across cell biology, with the goal of predicting the behavior of healthy and diseased cells.
In theory, these simulations could predict hypothetical outcomes: how cells behave on their own and in relation to one another, how they react to stimuli such as new drugs or environmental changes, and how they exist within a dynamic universe of genomes, transcriptomes, and proteomes. Virtual cells would allow a virtually unlimited number of test scenarios at lower risk and cost than in vivo or in vitro work.
Researchers, clinicians, and patients all care about what these cells could tell us, but for different reasons. That’s why CZI is working to make this “virtual cell platform” openly accessible to the wider scientific community, effectively building a digital platform for biology that researchers worldwide can use for in silico experimentation and discovery.
Three-part throughline: Data, models, computing
Since 2016, CZI has built a robust in-house technology team drawing on the expertise of computational biologists, software engineers, product managers, and researchers. Working hand in hand with its extensive network of grantees, the team is now applying that capacity to new AI challenges. Building this foundation has included soliciting external advisors, hiring key talent, creating infrastructure, and partnering with in-house AI residents. Moving forward, CZI is focusing on three main areas, all powered by collaboration: leveraging existing biological data and generating new data, building advanced models, and establishing a computing infrastructure.
The data work has been in motion for years: CZI has curated and aggregated data from almost 100 million cells through Chan Zuckerberg CELL by GENE, an open-source platform that allows scientists to access, analyze, and annotate high-dimensional single-cell data. Virtual cell models will also be trained on resources created by the Chan Zuckerberg Biohub Network, such as the cell atlas Tabula Sapiens and the protein atlas OpenCell. CZI intends to capitalize on this momentum through collaborations that build on that existing data stockpile, said Theofanis Karaletsos, head of AI for science at CZI.
“When we want to build large-scale machine learning models of complex things, the gas for this will be data,” he said at the STAT fireside chat. “And no individual entity will be able to collect the right or sufficient data to do this … In collaboration with the broader community, we can bring about the data sets that will be shared that will be both diverse and representative.”
Advanced modeling requires tapping into CZI’s in-house science technology expertise and collaborating with academic and AI experts to train models that bring together the different data modalities used to measure and describe cells. In the earliest days of this work, CZI will focus on three key modalities (sequencing data, imaging data, and the scientific literature) to help expand our understanding of biological data.
There’s also computing. In September 2023, CZI announced its commitment to building a high-performance computing cluster of more than 1,000 GPUs. When complete, it will be one of the largest such clusters dedicated to nonprofit life science research.
“In many cases, researchers today are limited … because of a lack of access to necessary compute,” Patricia Brennan, vice president of science technology at CZI, said at the fireside chat. “We want to reduce those barriers to access.”
If all three pieces fall into place, the virtual cell platform could become a workable reality. But even once it is built, the work won’t be finished: CZI will continue to validate and iterate on the tool with external experts and partner organizations.
Dedication to open science
Across CZI and the Chan Zuckerberg Biohub Network, grantees, partners, and investigators are encouraged to work with collaborators across disciplines and to publish results on preprint servers such as bioRxiv and medRxiv, an effort CZI has funded for years. This commitment to open science, as a way to build on prior research more quickly, is a distinctive approach in the era of AI.
“We wouldn’t have had the breakthroughs we’ve had in machine learning in recent years without open science,” Karaletsos said in a STAT Brand Studio multimedia lounge interview during the Breakthrough West event. “So for us, it’s a very natural way to work that typically leads to faster duration times and faster breakthroughs.”
For this virtual cell project, CZI aims to embrace openness in several ways: by sharing data, by making the predictive models of cells openly accessible to the community, and by collecting user feedback on the platform. And that’s just what’s on the roadmap now, Brennan said. Even as plans evolve, CZI’s commitment to sharing its findings with the world will remain.
“It’s really important to us … that whatever we fund, whatever we build, that it’s open and accessible to the broader community because that is the way that new discoveries will be built on science,” she said.
Thinking hard about making it easy
If all this virtual cell work sounds complex, with its data analysis, computing, and modeling, that’s because it is. Even so, an AI-powered platform should be intuitive for researchers to use, Karaletsos said.
“When we have these complex systems that will abstract complex behaviors and cells and tissues and we want them to be used for biomedical applications, we need to make it very natural for researchers to interact with them, to feel like research assistants really,” noted Karaletsos.
In its efforts to accelerate scientific discovery, CZI looks ahead to a future where doctors and researchers have universal access to this tool and can use it to predict, diagnose, and treat disease.
“We [intend to] aggregate knowledge and integrate models so that not just computational biologists, but all researchers [and clinicians] can really drive insights faster and do analysis at a greater pace,” Brennan said.
To learn more, explore the inner workings of CZI’s virtual cells.