
A little knowledge may be a dangerous thing — but what happens when the problem is too much knowledge?

There are currently more than 27 million citations on PubMed®, with thousands of new citations added daily. All of these publications add to the overall knowledge base in science and medicine. But the pace of publication has a dark side: data overload.

This volume of publication makes it difficult, if not impossible, for researchers and practitioners to keep up with new discoveries, even within a narrowly defined field. These human limitations add to the lag time between published research and changes in clinical practice. They also make it hard to find hidden connections within the knowledge base that could lead to valuable new insights and guide the direction of future research.

This is where artificial intelligence can help. Advanced algorithms utilizing machine learning and natural language processing can tame the tsunami of scientific research and make large knowledge databases useful and usable for both researchers and clinicians.

Machine learning is a branch of artificial intelligence in which algorithms, such as classifiers, learn to detect patterns in large data sets. Natural language processing enables the program to “understand” and extract knowledge from unstructured sources and documents written in natural (human) language, such as scientific papers. Together, these algorithms can automate the process of collecting, filtering, comparing, and synthesizing knowledge from large corpora of scientific publications.
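
To make the filtering step concrete, here is a minimal sketch of a text classifier that sorts citation titles into "relevant" and "other". It is an illustration only: the naive Bayes method, the class name NaiveBayesFilter, and the made-up training titles are all assumptions chosen for brevity, not a description of any particular product.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesFilter:
    """Hypothetical multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocabulary = set()      # every token seen during training

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocabulary.update(tokens)

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior for the label plus the log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            denominator = sum(counts.values()) + len(self.vocabulary)
            for token in tokens:
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up example titles, used only to show the mechanics.
model = NaiveBayesFilter()
model.train("antibiotic resistance genes in bacterial isolates", "relevant")
model.train("genotype predicts antibiotic resistance in bacteria", "relevant")
model.train("knee surgery outcomes in older adults", "other")
model.train("dietary fiber intake and heart disease", "other")

label_a = model.predict("bacterial genotype and antibiotic resistance")
label_b = model.predict("heart surgery outcomes")
```

A real system would train on many thousands of labeled documents and use richer language features, but the principle is the same: the program learns which word patterns signal a relevant paper and applies that pattern to new ones.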

Battelle developed a program called Sematrix™ to enable natural language query of large document sets such as MEDLINE. The program uses an additional inference layer (similar to the logic statement “If A=B and B=C, then A=C”) to make connections between knowledge from different sources. For example, researchers applied the program to predict that certain bacteria would be resistant to an antibiotic based on their genotypes, even though no single document in the knowledge base made the claim directly.
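
The "If A=B and B=C, then A=C" idea can be sketched in a few lines of code. The example below is a generic illustration of chaining facts from separate documents, not the Sematrix™ implementation: the function name infer_transitive, the fact format, and the placeholder facts and document IDs are all hypothetical.

```python
from collections import defaultdict


def infer_transitive(facts):
    """Return (subject, object) links implied by chaining known links.

    `facts` holds (subject, object, source) tuples, each read as
    "subject leads to object, according to source".
    """
    graph = defaultdict(set)
    for subject, obj, _source in facts:
        graph[subject].add(obj)

    known = {(subject, obj) for subject, obj, _source in facts}
    inferred = set()
    for start in list(graph):
        # Depth-first walk to collect everything reachable from `start`.
        stack = list(graph[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph.get(node, ()))
        # Keep only links that no single source stated directly.
        for node in seen:
            if node != start and (start, node) not in known:
                inferred.add((start, node))
    return inferred


# Placeholder facts, each attributed to a different made-up document.
facts = [
    ("strain X", "carries gene G", "doc-1"),
    ("carries gene G", "produces enzyme E", "doc-2"),
    ("produces enzyme E", "resistant to antibiotic Z", "doc-3"),
]
new_links = infer_transitive(facts)
```

Here no single document says that strain X is resistant to antibiotic Z, yet the link appears in new_links because the three statements chain together. A production system would also need to weigh how reliable each source is and whether each step in the chain truly holds.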

Machine learning and natural language processing significantly reduce the time and human energy required to manually search through large corpora, making the knowledge contained in them more accessible. As these technologies become more widely used, they will help users find relevant research to drive policy and practice, identify promising connections to guide research priorities, and point to new answers to complex medical and healthcare challenges.

Visit battelle.org/health-analytics to learn more about how sophisticated analytical methods can drive improvements in healthcare quality.