By Xiaoren Chen, PhD, Medical NLP Informaticist, Optum

The complexity of cancer biology makes the disease an extremely formidable foe for anyone who takes it on, from bio researchers and drug developers, to payers and providers. Traditionally, the treatment options have been surgery, radiation therapy, and chemotherapy. However, thanks to recent cancer treatment breakthroughs such as molecular targeted therapy and immunotherapy, we now know that cancer is more treatable than originally believed. The expanded and innovative use of tumor biomarkers to generate clinical evidence has been a key to gaining this understanding and developing more efficient, cost-effective treatments.

Vital signs

Tumor biomarkers include a spectrum of biological substances that change in response to cancer conditions. They can be proteins, DNA/RNAs or patterns of gene expression, and are often relevant to more than one cancer. Some tumor biomarkers are made by normal cells as well as tumor cells, but they are produced at much higher levels in a tumor. They can provide valuable information about a cancer, such as how aggressive it is, whether it can be treated with a targeted therapy, or whether it is responding to treatment.1 Clinically, tumor biomarkers are very useful to help detect, diagnose and manage cancers. They can also help researchers build patient cohorts for studies.

  • For prevention, tumor biomarkers are used for screening of high-risk populations. One example is BRCA1/BRCA2 mutation screening of women with family history of breast/ovarian cancer.
  • For diagnosis, tumor biomarkers can aid in staging and prognosis evaluation. For example, HER2/neu-positive breast cancer is more aggressive than the HER2-negative type. In addition, the American Joint Committee on Cancer (AJCC) Cancer Staging Manual, Eighth Edition, has added hormone receptor (ER/PR) and HER2/neu status to breast cancer staging.2

Tumor biomarkers can provide the clinical evidence to help doctors choose efficient and cost-effective treatment. For treatment, some tumor biomarkers help identify patients most likely to respond to targeted therapy or immunotherapy. For example, imatinib targets leukemia in patients who are BCR-ABL positive, and BCR-ABL is also used to monitor treatment effectiveness and tumor recurrence.3

Biomarkers with diagnostic, prognostic and therapeutic values include proteins or genes that are key drivers in the tumor signaling pathways, and proteins (or genes) that are important for DNA repair and cell cycle control.

Real complements

Because tumor biomarkers can provide so much relevant and meaningful information, researchers are trying to find creative ways to identify and develop them. One powerful approach is mapping cancer biomarkers from — and incorporating them into — de-identified, real-world data (RWD) that includes electronic health record (EHR) and claims information.

On its own, RWD can provide researchers with remarkable insights into a treatment’s effectiveness and a patient’s experience. When that data is enhanced with tumor biomarkers, so are the insights. At the same time, the RWD can fill gaps in information where biomarkers fall short, presenting a much more complete picture of a cancer’s progression and how it reacts to certain therapies.

The goal of this mapping process is to create standardized tumor biomarker data for research use by bringing unmapped cancer biomarkers into the RWD as structured fields. After a biomarker is chosen to be mapped, the first step is to do the research and investigation using the criteria below.

Criteria to research for a given biomarker:

    • i. No-expression/re-expression
    • ii. Overexpression/down-expression
    • iii. Mutation/translocation/rearrangement
    • iv. Epigenetic regulation, such as methylation/histone deacetylation
    • v. Relevance to the cancer

Once all the relevant information is obtained, it’s time to create the concept and map the structured lab data to it.

Criteria to consider when creating a new concept:

  • The abnormality being tested at protein level or gene level or both
  • Available methods to detect the abnormality
  • Specificity and sensitivity of the methods
  • Available data types/units/specimen sources
  • Distribution of the data 
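
The criteria above can be captured as a small record before any mapping is done. The sketch below is illustrative only: the names `BiomarkerConcept`, `Level` and `unit_distribution` are hypothetical and do not come from any Optum schema, and the HER2/neu entry uses example values rather than a complete definition.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Tuple


class Level(Enum):
    """Whether the abnormality is tested at the protein level, gene level, or both."""
    PROTEIN = "protein"
    GENE = "gene"
    BOTH = "both"


@dataclass(frozen=True)
class BiomarkerConcept:
    """Hypothetical concept record covering the criteria listed above."""
    name: str
    level: Level
    detection_methods: Tuple[str, ...]      # available methods to detect the abnormality
    data_types: Tuple[str, ...]             # e.g. "categorical", "numeric"
    units: Tuple[str, ...] = ()
    specimen_sources: Tuple[str, ...] = ()


def unit_distribution(local_units: Iterable[str]) -> Counter:
    """Count how often each local unit string occurs, ignoring case and padding."""
    return Counter(u.strip().lower() for u in local_units)


# Example entry (illustrative values only).
her2 = BiomarkerConcept(
    name="HER2/neu",
    level=Level.BOTH,
    detection_methods=("IHC", "FISH"),
    data_types=("categorical",),
    specimen_sources=("tissue",),
)
```

Counting the local unit strings is one simple way to look at the distribution of the data before choosing mapped units. Per-method sensitivity and specificity are left out of this sketch and could be added as further fields.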

“To really improve the success rate and improve the efficiency of drug development, we need a whole new generation of biomarkers that are more informative and that can tell developers earlier whether or not their drug may have toxicity or it really may not work at all, and to get that early read on is what’s going to be successful.”4

– Janet Woodcock, M.D., Director of the Center for Drug Evaluation and Research (CDER), FDA

At the outset, the raw data comes from EHR and claims data. The most important entities for lab data include local name, local code, local result, local units, specimen source, normal range, and collection and report dates. Once the concept is mapped, additional fields are added to the standardized entities in the monthly build, including:

  • Mapped code
  • Mapped name
  • Mapped units
  • Normalized value

The standardized tumor biomarker data is then transferred into an internally kept de-identified data repository and, finally, to the commercial data products.
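
As a rough illustration of the step from raw lab fields to the standardized fields listed above, the sketch below maps one raw record at a time. Everything in it is an assumption made for the example: the class names, the lookup tables, the `EX-0001` code and the result synonyms are hypothetical, and a real mapping would rely on curated, concept-specific rules.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class RawLab:
    """Subset of the raw lab fields named in the text."""
    local_name: str
    local_code: str
    local_result: str
    local_units: str
    specimen_source: str


@dataclass
class MappedLab:
    """Standardized fields named in the text."""
    mapped_code: str
    mapped_name: str
    mapped_units: str
    normalized_value: Optional[str]


# Hypothetical lookup tables; real mappings would be curated per concept.
CONCEPT_BY_LOCAL_NAME: Dict[str, Tuple[str, str, str]] = {
    "her2 ihc": ("EX-0001", "HER2/neu", ""),
    "her-2/neu": ("EX-0001", "HER2/neu", ""),
}
RESULT_SYNONYMS: Dict[str, str] = {
    "pos": "positive", "positive": "positive",
    "neg": "negative", "negative": "negative",
}


def map_lab(raw: RawLab) -> Optional[MappedLab]:
    """Return standardized fields for a raw record, or None if the name is unmapped."""
    concept = CONCEPT_BY_LOCAL_NAME.get(raw.local_name.strip().lower())
    if concept is None:
        return None
    code, name, units = concept
    # Unrecognized result strings are left as None rather than guessed.
    value = RESULT_SYNONYMS.get(raw.local_result.strip().lower())
    return MappedLab(code, name, units, value)
```

Returning None for unmapped names and unrecognized results keeps unknown records visible for review instead of forcing them into a concept.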

Moving forward

As we have seen, tumor biomarkers are valuable assets in cancer care and management. Still, evaluating the impact of a cancer treatment is much more complicated than simply testing the tumor biomarker status. No universal tumor biomarker presently exists, and the identification of a single biomarker that can accurately diagnose and predict disease progression may be unlikely. Additionally, tumor biomarkers have not been found for every type of cancer, and not every patient with a certain cancer will have the biomarker associated with that cancer. Some tumor biomarkers can be elevated in benign conditions as well. Finally, ethical challenges always exist regarding cancer treatment, and those need to be considered when using tumor biomarkers in clinical practice. All of this underscores the importance of continuing to develop new biomarkers and using them in conjunction with other information, such as RWD.

Because oncology biomarker results are often a key component of cohort development for drug research studies, mapping additional patient biomarkers increases the number of patients eligible for those cohorts. This, in turn, can improve the validity, accuracy and precision of study outcomes. In general, tumor biomarkers are personalized data that must be evaluated in the context of a patient’s history, symptoms and other test results. When they are, they can play a critical role in the efficient development and best use of effective cancer treatments.
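
The effect of mapping on cohort building can be sketched with a simple filter. The patient IDs and statuses below are invented for illustration, and `eligible` is a hypothetical helper, not an actual query interface.

```python
from typing import Dict, List

# patient id -> {mapped biomarker name: normalized value}
Patients = Dict[str, Dict[str, str]]


def eligible(patients: Patients, required: Dict[str, str]) -> List[str]:
    """Return sorted IDs of patients whose mapped biomarkers match every requirement."""
    return sorted(
        pid for pid, markers in patients.items()
        if all(markers.get(name) == value for name, value in required.items())
    )


# Invented data: before the BCR-ABL results for p2 and p3 are mapped.
before: Patients = {
    "p1": {"BCR-ABL": "positive"},
    "p2": {},
    "p3": {},
}
# After mapping, previously unstructured results become queryable.
after: Patients = {
    "p1": {"BCR-ABL": "positive"},
    "p2": {"BCR-ABL": "positive"},
    "p3": {"BCR-ABL": "negative"},
}

criteria = {"BCR-ABL": "positive"}
print(eligible(before, criteria))  # ['p1']
print(eligible(after, criteria))   # ['p1', 'p2']
```

In this toy case the same criteria select one patient before mapping and two afterward, because a result that was not previously available as a structured field can now be matched.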

Using the approach outlined in this paper, Optum has created a list of more than 100 oncology-related biomarker concepts standardized across its de-identified data set. To learn more about this data set, visit

1. National Cancer Institute. Tumor Markers. Last reviewed May 6, 2019. Accessed May 13, 2020.
2. American Joint Committee on Cancer. Updated Breast Chapter for Eighth Edition. Cancer Staging Manual. Last updated August 13, 2018. Accessed May 14, 2020.
3. National Cancer Institute. How Imatinib Transformed Leukemia Treatment and Cancer Research. research/progress/discovery/gleevec. Updated April 11, 2018. Accessed May 14, 2020.
4. What Are Biomarkers And Why Are They Important? March 16, 2017. Accessed May 13, 2020.