- January 22, 2021
- Posted in Clinical
By Olalekan Aboubakar, Vice President of Data Science and Research, and Bill Bowman, Clinical Solutions Architect
Prognos isn’t a company of products; it’s a company of experts who are passionate about pushing boundaries to improve patient outcomes through analytics. Our cutting-edge data scientists work alongside a team of clinical experts who embed clinical knowledge into the data science, and together they find what we call the Clinical Truths™.
Working together, our data scientists and clinical experts are leveraging natural language processing (NLP) and machine learning (ML) to inform predictive models and uncover the Clinical Truths that are evident from analyzing vast amounts of harmonized data. The developments these teams have made over recent months and years have increased the speed to insight and provided scalability for life sciences companies seeking insights from lab and other clinical data.
The problem with lab data
To truly improve healthcare and make the most informed decisions, it is important to factor in all of the data born from the patient journey. Lab results are the most influential dataset, but they pose significant challenges.
Lab data is different from other healthcare data. By nature it is incredibly unstructured, difficult to manage, and nearly impossible to analyze at scale. Lab test results are intended to be read by a human for analysis of a single patient.
In order to extract value from these data sets, data from different sources must be normalized or harmonized so that clear patterns, or Clinical Truths, can be uncovered across data sets. Because there is no mandated industry standard for data that labs can follow, different forms and formats are frequently used across labs and even within a lab. While LOINC codes are intended to support this effort at normalization, often labs use proprietary codes that must be mapped back to LOINC in order to standardize across lab tests. In addition, labs frequently miscode results or, in some cases, there may not be a code for a lab test. Even once the codes are standardized, the challenge of interpreting the results remains significant.
Lab results and their representation can be complex. Even for simple data, the same test can be reported in different units, and different acronyms or representations can be used for the same result. All of this must be standardized in order to extract value from the data. Healthcare companies do not have the infrastructure to do this at scale, but it’s central to what Prognos does.
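To make the harmonization step concrete, here is a minimal Python sketch of the two tasks described above: mapping a lab's proprietary code to LOINC and converting a result into one canonical unit. It is an illustration under stated assumptions, not a description of the Prognos pipeline. The lab names, local codes, lookup tables, and the `harmonize` function are all hypothetical, and the LOINC code is included only as an example that should be verified against the LOINC database.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from (lab, proprietary code) to a LOINC code.
# Lab names and local codes are invented for illustration.
LOCAL_TO_LOINC = {
    ("lab_a", "GLU"): "2345-7",   # glucose, serum/plasma (illustrative)
    ("lab_b", "1032"): "2345-7",  # same test, different proprietary code
}

# One canonical unit per LOINC code, plus factors to convert into it.
CANONICAL_UNIT = {"2345-7": "mg/dL"}
UNIT_FACTORS = {
    ("2345-7", "mg/dl"): 1.0,
    ("2345-7", "mmol/l"): 18.016,  # glucose: mmol/L to mg/dL
}


@dataclass
class HarmonizedResult:
    loinc: str
    value: float
    unit: str


def harmonize(lab: str, local_code: str, value: float, unit: str) -> Optional[HarmonizedResult]:
    """Return a standardized result, or None if the code or unit is unmapped."""
    loinc = LOCAL_TO_LOINC.get((lab, local_code))
    if loinc is None:
        return None  # unknown or miscoded test: route to manual review
    factor = UNIT_FACTORS.get((loinc, unit.strip().lower()))
    if factor is None:
        return None  # unit we do not know how to convert
    return HarmonizedResult(loinc, round(value * factor, 1), CANONICAL_UNIT[loinc])


if __name__ == "__main__":
    print(harmonize("lab_a", "GLU", 99, "MG/DL"))
    print(harmonize("lab_b", "1032", 5.5, "mmol/L"))
    print(harmonize("lab_c", "XYZ", 1.0, "mg/dL"))
```

Returning `None` for anything unmapped, rather than guessing, reflects the point made above: miscoded or uncoded tests still need a separate path, such as manual review.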
What the data reveals
We sometimes think of the data that comes in from different sources as being different dialects of the same language. It is our job to act as the interpreter for these varying dialects. While some of this work is done manually, we have built models that can perform this work by identifying conceptual links between different data sets.
Once the data is harmonized, the value can be extracted. Over time, the data starts to reveal certain patterns – showing that a patient who has disease X with a specific comorbidity, and who undergoes treatment Y, tends to follow a similar patient journey. These patterns, or Clinical Truths, are incredibly valuable in understanding the patient journey and improving patient care.
To extract these Clinical Truths, historically a clinician would look through the data for a given test and map the results for a specific condition. With this approach the outcomes were only as good as the mapping, took several weeks to complete, and required clinicians to iterate and add to the data by hand. Additionally, the results only provided insights for one specific condition and set of circumstances. This process made scaling nearly impossible.
Machine learning meets disease-specific clinical knowledge
To solve this challenge, our data scientists are leveraging machine learning to identify these patterns and reveal the Clinical Truths. The use of ML increases the speed to insight and allows for massive scalability.
Initially this process required a clinician to map the disease – taking approximately four to five weeks – before feeding it into the model, which learned the patterns and was able to generalize to different stages of a single disease or similar diseases. Now that the modeling has advanced, the machine can generate the initial interpretations and mapping, and a clinician can review them for accuracy – shortening the process to one week. While the clinician remains a critical piece of the process, the model makes the process faster and produces results that are easier to evaluate – at scale.
These historically informed, self-refining algorithms and models have allowed us to scale in two ways. First, they allow us to use all of the data we currently have in our data lakes without requiring a clinician to constantly oversee and tweak the model. Second, for data we have not seen before – data from a new lab test or brought by a client from their own data platform – our generalized solutions are able to use our existing modeling techniques to determine the expected ontological mapping of the disease or test. By scaling our efforts, we have been able to generalize the types of interests a client may have and build modeling that can interpret at that level. For instance, we can go deeper than just the presence of a genetic mutation to look at what it is, where it is on a gene, and more.
A unique piece of the development of this model was the use of natural language processing to add clinical specificity to the results. Using NLP, our data scientists and clinicians taught the machines to read pathology notes, including learning healthcare acronyms, reading typos, and even learning new words. The ML needed to be smart enough to understand when different acronyms mean the same thing, or when one acronym has different meanings depending on the context. And, because new terms are being introduced all the time, we are currently working on a patent for a model that can deconstruct a new word from pathology notes to determine and interpret its meaning. The ability to interpret and utilize pathology notes with the lab data allows for greater specificity when using Cohort Designer.
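The acronym problem described above, where one acronym has different meanings depending on context, can be illustrated with a toy example. The Python sketch below is a deliberately simple keyword-overlap heuristic, not the NLP model discussed in this post. The sense inventory, cue words, and the `expand_acronym` function are hypothetical and exist only to show the shape of the problem.

```python
import re
from typing import Optional

# Hypothetical sense inventory: acronym -> {expansion: context cue words}.
# A production system would learn these associations instead of listing them.
SENSES = {
    "ms": {
        "multiple sclerosis": {"lesion", "demyelinating", "mri", "neurology"},
        "mitral stenosis": {"valve", "echocardiogram", "murmur", "cardiac"},
    },
}


def tokens(text: str) -> set:
    """Lowercase the note and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def expand_acronym(acronym: str, note: str) -> Optional[str]:
    """Pick the expansion whose cue words overlap most with the note.

    Returns None when the acronym is unknown or no cue word is present,
    so that ambiguous cases can be flagged instead of guessed.
    """
    senses = SENSES.get(acronym.lower())
    if not senses:
        return None
    words = tokens(note)
    scores = {expansion: len(cues & words) for expansion, cues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(expand_acronym("MS", "Echocardiogram shows thickened valve leaflets, consistent with MS."))
    print(expand_acronym("MS", "MRI brain: demyelinating plaques, MS suspected."))
    print(expand_acronym("MS", "History of MS."))
```

A fixed cue list like this cannot handle typos or newly coined terms, which is why the approach described above relies on learned models and clinician input instead of static rules.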
Through the collaboration of data scientists, clinicians and machine learning, the Clinical Truths that are revealed empower clients with the actionable insights on patient journeys they need to make their work faster and more impactful. And, the clinical profiles identified through this process can be purchased through Prognos to fast track the insights a company needs related to a specific disease or patient journey.
A complex example: PD-L1
Evaluating the PD-L1 positivity status of a patient’s cancer in concert with clinical guidelines can be complex. The PD-L1 protein is expressed on the surface of normal cells and some types of cancer cells, and is also a therapeutic target for many tumor types. It is a particularly complex biomarker to evaluate for several reasons: First, there are different tests based on different molecular clone types. In addition, the rules for interpretation of results vary based on the tumor in question and the way in which results are measured because different markers can be used for reporting and the condition-specific levels of those different markers may vary.
To explain further, PD-L1 can be expressed as tumor cells (TCs), a tumor proportion score (TPS), immune cells (ICs) or a combined positive score (CPS, which combines TCs and ICs). The application of a particular method is often a function of the underlying cancer and PD-L1 test results may be reported qualitatively (e.g., “positive”) or quantitatively (e.g., a TPS of 50). Further, the assessment of PD-L1 positivity varies depending on the tumor in question, and the results for a given cancer type may be measured multiple ways. PD-L1 testing can also employ several different molecular clones, each of which may measure results differently. The number of commercially available PD-L1 tests that are in widespread use – many of which are designed for specific tumor types – plus in-house test kits developed by laboratories, means that the tests deployed can also vary widely. Plus, PD-L1 testing for a given tumor type is oftentimes conducted with a test specifically designed for an entirely different form of cancer.
As an example, PD-L1 testing in non-small cell lung cancer (NSCLC) can measure results as TCs, TPS, or ICs and the interpretation will vary according to the specific test and clone used. The PD-L1 test in use may also be designed for an entirely different application, such as bladder cancer. In this instance, the result may be measured and reported as a CPS, which is typically not used in NSCLC, and if the result is reported as “positive,” it becomes extremely difficult to understand what this means clinically with respect to treatment.
As a result, to properly evaluate results for a complex disease such as cancer, it is essential to first identify the patient’s diagnosis, which can be ascertained from lab data (e.g., diagnosis codes, specimen type, etc.). The test result can only be properly interpreted in the context of that condition. In this example involving PD-L1 testing, it is essential to consider the overall context and all of the various nuances in order to extract the relevant data and evaluate the results that are important to the diagnosis in question. For instance, it is essential not only to interpret the results of a PD-L1 test based on what appears in the results section of the report, but also to review the pathology notes and other relevant information in order to have the full context.
This complex example is played out over and over with lab data and underscores the importance of collaboration between data scientists and clinical experts to ensure the models created are interpreting testing and results in the proper context.
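The PD-L1 reasoning above can be sketched as a context-dependent lookup: a result is interpreted only when the diagnosis, the score type, and a matching rule are all known. The Python example below is a hedged illustration, not the Prognos model and not clinical guidance. The cutoff values are placeholders chosen for the example, and the `Pdl1Result` class, `POSITIVITY_RULES` table, and `interpret` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder cutoffs keyed by (diagnosis, score type).
# These numbers are for illustration only and are NOT clinical guidance.
POSITIVITY_RULES = {
    ("nsclc", "TPS"): 1.0,
    ("bladder", "CPS"): 10.0,
}


@dataclass
class Pdl1Result:
    diagnosis: Optional[str]          # e.g. derived from diagnosis codes or specimen type
    score_type: Optional[str]         # "TPS", "CPS", "TC", "IC", or None if not reported
    value: Optional[float]            # quantitative score, if reported
    qualitative: Optional[str] = None  # e.g. "positive", if that is all the report says


def interpret(result: Pdl1Result) -> str:
    """Interpret a PD-L1 result only when the full context supports it."""
    if result.diagnosis is None:
        return "needs review: diagnosis unknown"
    if result.score_type is None or result.value is None:
        # A bare "positive" cannot be tied to a scoring method or cutoff.
        return "needs review: no quantitative score"
    cutoff = POSITIVITY_RULES.get((result.diagnosis.lower(), result.score_type.upper()))
    if cutoff is None:
        return "needs review: no rule for this score type and diagnosis"
    return "positive" if result.value >= cutoff else "negative"


if __name__ == "__main__":
    print(interpret(Pdl1Result("NSCLC", "TPS", 50.0)))
    print(interpret(Pdl1Result("NSCLC", "CPS", 12.0)))
    print(interpret(Pdl1Result("NSCLC", None, None, qualitative="positive")))
```

The second and third calls mirror the NSCLC scenario described above: a CPS reported for a lung cancer specimen, or a bare "positive," is sent for review instead of being forced into an interpretation.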
Clinically significant insights at scale
Using decades of experience with harmonizing lab data, we have made it easy to extract clinically significant insights from healthcare data and have made these insights available at scale thanks to a combination of data science, clinical expertise, and technology.
These advancements are part of an evolution in the industry that increasingly demands data insights with faster speeds, greater specificity, and improved accuracy. As the gold standard for lab data in the life sciences market, Prognos is able to rapidly and efficiently integrate lab data within minutes and provide an additional layer of knowledge and understanding for a variety of different use cases. As the capabilities and technology continue to advance, we expect to see the market continue to increase its efficiency and effectiveness at leveraging these Clinical Truths to improve patient outcomes.