Factor This: The Art and Science of Harmonizing Lab-based Clinical Insights to Make Them Analytics Ready
- February 25, 2020
- Posted in Lab Testing Insights
This post was last updated on March 6, 2020 at 10:56 am

An Interview with Rob Lorusso, Director of Quality Assurance, Prognos Health.
Q: Many stakeholders in healthcare are familiar with and study sources of Real World Data (RWD) like prescription and medical claims, but fewer leverage laboratory records. Why?
A: Claims are created for financial purposes; they are submitted by providers, such as physicians, hospitals, and pharmacies, to payers for reimbursement. They use standard forms and standard coding practices. This standardization makes it easy for stakeholders who acquire this data to mine it and run different types of analyses.
Laboratory data is different. Its purpose is not financial, but clinical (informing a physician of their patients’ laboratory test results). There is no industry standard for labs to follow for these records. Even within a lab system, different forms and formats are used. For many stakeholders, the inconsistencies make working with raw lab data too difficult; they can’t easily draw out the value.
Q: Can you provide more detail about the variances you see in the laboratory data?
A: The data each lab produces is regulated in terms of the precision and accuracy of the test procedures and the test results. Such regulations and industry protocols ensure that the test results are accurate — but they do not impose any form of standardization in terms of how the information is captured or communicated.
Because every lab does things differently, we have a humorous saying here: “Once you know how one lab does things…you know how one lab does things.”
Rob Lorusso, Director of Quality Assurance, Prognos Health
And the inherent variation is made worse by the fact that many of the largest, nationwide lab-testing companies have grown by acquiring smaller regional labs, each of which had its own way of capturing and communicating lab results.
Because there are no rules about “making the data consistent,” we see tremendous variability in the completeness and consistency of the data fields that are, or are not, populated in clinical lab-testing data. Errors within individual data fields mean that keyword searches miss many important clinical details. Examples include misspelled clinical terms (such as “cardiac” or “hemoglobin A1C”), variations in how units are expressed (for instance, scientific versus logarithmic notation), different reference ranges used to interpret the test results, and cryptic notes (for instance, “MI” in a record might mean “myocardial infarction” or “Michigan”).
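To make the field-level problems described above concrete, here is a minimal Python sketch of two of them: a misspelled test name and an ambiguous abbreviation. It is an illustration only, not Prognos's method. The vocabulary `CANONICAL_TERMS`, the field names `patient_state` and `clinical_note`, and the 0.8 similarity cutoff are all assumptions made for the example.

```python
import difflib

# Hypothetical canonical vocabulary; a production list would be far larger.
CANONICAL_TERMS = ["hemoglobin a1c", "cardiac troponin", "glucose"]


def normalize_term(raw, cutoff=0.8):
    """Return the closest canonical term, or None if nothing is close enough."""
    # Lowercase and collapse runs of whitespace before comparing.
    cleaned = " ".join(raw.lower().split())
    matches = difflib.get_close_matches(cleaned, CANONICAL_TERMS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def expand_mi(field_name):
    """Resolve the ambiguous token 'MI' using the field it appeared in.

    The field names are hypothetical; the point is that the surrounding
    context, not the token itself, decides the meaning.
    """
    if field_name == "patient_state":
        return "Michigan"
    if field_name == "clinical_note":
        return "myocardial infarction"
    return None  # Unknown context: leave the token unresolved.


print(normalize_term("Hemoglobn  A1C"))  # hemoglobin a1c
print(expand_mi("patient_state"))        # Michigan
```

A plain keyword search for "hemoglobin a1c" would miss the misspelled record; fuzzy matching against a controlled vocabulary is one simple way to recover it.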
Complicating matters further, not all lab-testing results are captured as numeric values. Consider, for instance, what happens in anatomic pathology labs. When a biopsy or tissue sample is sent to a pathologist for examination, the findings are typically dictated or typed using human language. Such non-numeric data is generated “straight from the physician’s mouth” and does not necessarily follow any consistent or standardized language rules or naming conventions.
Q: So, what types of transformations does Prognos Health do to make the data usable for analysis?
A: Every day we receive clinical lab-testing results from national, regional, academic, and other lab-testing company partners, which adds up to millions upon millions of data points. Over the years, as we’ve built the largest registry of lab records, we’ve created different methods for ensuring that records look the same and are complete. We typically describe the work we do as harmonizing, cleaning, standardizing, enriching, and interpreting the lab data.
Examples of how Prognos transforms raw lab data:
| Step | What it involves |
| --- | --- |
| Harmonize | Create one standard format for the lab data received from our many lab partners |
| Clean | Combine or delete duplicate records, remove extraneous or incorrect information |
| Standardize | Ensure common reporting of ICD diagnosis codes, test LOINC codes, result units, and test names |
| Enrich | Fill incomplete data, such as physician information |
| Interpret | Understand the meaning of results. Indicate if results are normal or abnormal or convert numeric results to High, Medium, Low, etc. |
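The five steps in the table can be sketched as a small Python pipeline. This is a toy illustration under stated assumptions, not Prognos's implementation: the field aliases, the physician directory, the unit spellings, and the glucose reference range are all hypothetical values chosen for the example, and the range is not clinical guidance.

```python
# All lookup tables below are hypothetical and exist only for illustration.
FIELD_ALIASES = {"TestName": "test_name", "test": "test_name",
                 "Result": "value", "val": "value",
                 "Units": "unit", "uom": "unit", "npi": "physician_id"}
UNIT_SPELLINGS = {"mg/dl": "mg/dL", "mgdl": "mg/dL"}
PHYSICIANS = {"123": "Dr. Example"}
REFERENCE_RANGES = {("glucose", "mg/dL"): (70.0, 99.0)}  # illustrative only


def harmonize(raw):
    """Map each lab's own field names onto one shared schema."""
    return {FIELD_ALIASES.get(key, key): value for key, value in raw.items()}


def standardize(rec):
    """Use one spelling for test names and units, and numeric values."""
    out = dict(rec)
    out["test_name"] = " ".join(str(rec["test_name"]).lower().split())
    unit_key = str(rec["unit"]).lower().replace(" ", "")
    out["unit"] = UNIT_SPELLINGS.get(unit_key, rec["unit"])
    out["value"] = float(rec["value"])
    return out


def clean(records):
    """Drop exact duplicate records while preserving order."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def enrich(rec):
    """Fill in the physician name from a directory when it is missing."""
    out = dict(rec)
    out.setdefault("physician_name", PHYSICIANS.get(rec.get("physician_id")))
    return out


def interpret(rec):
    """Flag the value against a reference range, if one is known."""
    out = dict(rec)
    bounds = REFERENCE_RANGES.get((rec["test_name"], rec["unit"]))
    if bounds is None:
        out["flag"] = "unknown"
    else:
        low, high = bounds
        value = rec["value"]
        out["flag"] = "low" if value < low else "high" if value > high else "normal"
    return out


def run_pipeline(raw_records):
    standardized = [standardize(harmonize(r)) for r in raw_records]
    return [interpret(enrich(r)) for r in clean(standardized)]


feeds = [
    {"TestName": "Glucose", "Result": "105", "Units": "MG/DL", "npi": "123"},
    {"test": " glucose ", "val": "105.0", "uom": "mg/dl", "npi": "123"},
]
results = run_pipeline(feeds)
```

One design choice in this sketch: deduplication runs after standardization, so two records that differ only in formatting (as the two feeds above do) collapse into a single record. The table lists cleaning before standardizing, and a real pipeline may well order or interleave the steps differently.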
Much of the work we do can’t be done in a vacuum, meaning that a single cell of data often can’t be interpreted on its own. To understand the results and make sure the transformations are appropriate, you have to consider their nuance and context by reviewing the metadata, the descriptive data that accompanies the information.
Q: Can you give me an example of why nuance or context is so important?

A: Context is particularly important when it comes to lab-testing results. Consider a patient for whom a biomarker test has been ordered to confirm the presence or absence of the BRAF biomarker. You absolutely need the metadata to know whether that test was ordered for a tissue sample that was taken from the patient’s lung (in which case the results would help to direct the selection of a biomarker-directed oncology therapy for non-small-cell lung cancer) versus the patient’s ear (whose results would then direct the selection of a biomarker-directed therapy for melanoma). In this example, you would need additional data analytics to extract information from the physician’s notes to help connect the dots.
Similarly, think about the potential confusion if you see two sets of disparate test results for the same patient, but the date is not clear. If one presumes that the tests were performed on the same date, it could lead one to simply dismiss one value as an outlier data point or error.
By contrast, knowing that the tests with very different values were performed, say, six months apart would provide an important indication that the patient was failing on a particular therapy and may consequently need a different medication or a second- or third-line therapy.
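The role of the collection date can be shown with a short Python sketch. It compares two results for the same patient and test and decides whether a large difference looks like a conflict to review or a change over time. The function name, the 25% change threshold, and the 30-day gap are hypothetical values picked for illustration; real thresholds would depend on the test.

```python
from datetime import date


def compare_results(first, second, rel_change=0.25, min_gap_days=30):
    """Classify a pair of (collection_date, value) results for one test.

    Thresholds are illustrative assumptions, not clinical rules.
    """
    d1, v1 = first
    d2, v2 = second
    if d1 is None or d2 is None:
        # Without dates, a difference cannot be read as error or trend.
        return "undated: cannot interpret"
    if abs(v2 - v1) <= rel_change * abs(v1):
        return "stable"
    if abs((d2 - d1).days) < min_gap_days:
        # Very different values close together suggest an error or outlier.
        return "conflict: review for error"
    # Very different values far apart suggest a real clinical change.
    return "change over time"


january = (date(2019, 1, 10), 6.5)
july = (date(2019, 7, 10), 9.1)
print(compare_results(january, july))            # change over time
print(compare_results(january, (None, 9.1)))     # undated: cannot interpret
```

The same pair of values yields a different reading depending solely on the dates, which is why the date metadata has to travel with the result.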
Q: Tell me about the technologies Prognos utilizes to make the clinical insights reliable.
A: As noted, lab-testing data from the majority of the leading and academic testing labs is inherently messy and non-uniform. There can be literally 4 million different ways to represent a combination of the essential attributes — the name of the test ordered, the name of the test result, the reference unit and so on.
Our platform employs proprietary algorithms, modeling, and data-analytics techniques (based on artificial intelligence, natural language processing, and machine learning) to impose “complexity reduction” across all of the data as it enters our HIPAA-compliant system, which reduces the initial variation to 7,000 possible standard representations. These iterative data-conditioning efforts take a mix of brute force and finesse to standardize the relevant data fields, detect anomalies, and transform aggregated raw data from our lab partners into refined, highly usable insights.
Specifically, we’ve developed more than 65 automated functions (built from roughly 160,000 different rules, developed over the past ten years or so) to clean up and maximize the utility of the 115 data feeds we receive every day from our diverse network of lab-testing partners. And because we use the latest big data and analytics tech stack, this all happens in near real time, so there is no meaningful lag between when the data reaches us and when it is available for our clients, or Prognos, to leverage. The patient information is de-identified in compliance with HIPAA and, in some cases, HITRUST.
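As a rough illustration of what “complexity reduction” means, the Python sketch below collapses several raw spellings of the same (ordered test, result name, unit) combination into one canonical key and counts the distinct representations before and after. It uses simple text rules only; the proprietary algorithms described in the interview are not public, so the `canonical_key` function and the sample rows are assumptions for the example.

```python
import re


def canonical_key(order_name, result_name, unit):
    """Reduce one raw (order, result, unit) triple to a canonical form."""
    def norm(text):
        text = text.lower()
        # Treat punctuation and repeated whitespace as a single space.
        text = re.sub(r"[^a-z0-9%/]+", " ", text)
        return " ".join(text.split())

    # Units are compared without any internal spaces.
    return (norm(order_name), norm(result_name), norm(unit).replace(" ", ""))


# Hypothetical raw rows, as three labs might report the same test.
raw_rows = [
    ("HbA1c", "Hemoglobin A1c", "%"),
    ("HBA1C ", "hemoglobin  a1c", " % "),
    ("hba1c", "Hemoglobin-A1C", "%"),
]

distinct_raw = len(set(raw_rows))
distinct_canonical = len({canonical_key(*row) for row in raw_rows})
print(distinct_raw, distinct_canonical)  # 3 1
```

Rules like these handle only surface variation such as case, spacing, and punctuation. Synonyms and misspellings need richer techniques, which is where the interview's mention of machine learning and natural language processing comes in.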
We also leverage Natural Language Processing (NLP) to capture rich information buried in transcribed text, such as a pathologist’s findings, within the lab-testing results. That information would otherwise be lost unless humans spent countless hours reviewing it, which is neither practical nor cost-effective.
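To give a feel for extracting structure from free text, here is a deliberately simple Python sketch that looks for biomarker mentions in a report and checks each sentence for negation words. It is a keyword-and-regex stand-in, far cruder than a production NLP system, and it is not a description of Prognos's approach. The marker list, the negation words, and the sample report are invented for the example.

```python
import re

# Hypothetical marker list and negation cues, for illustration only.
BIOMARKERS = ("BRAF", "EGFR", "KRAS")
NEGATION = re.compile(r"\b(no|not|negative|absent|without)\b", re.IGNORECASE)


def extract_biomarker_mentions(report):
    """Return {marker: 'affirmed' | 'negated'} for markers named in the text."""
    findings = {}
    # Split into rough sentences at whitespace that follows '.' or ';'.
    for sentence in re.split(r"(?<=[.;])\s+", report):
        for marker in BIOMARKERS:
            if re.search(rf"\b{marker}\b", sentence, re.IGNORECASE):
                negated = NEGATION.search(sentence) is not None
                findings[marker] = "negated" if negated else "affirmed"
    return findings


# Invented sample text, not a real pathology report.
sample = ("Lung, right upper lobe, biopsy: adenocarcinoma. "
          "BRAF V600E mutation detected. No EGFR mutation identified.")
print(extract_biomarker_mentions(sample))
# {'BRAF': 'affirmed', 'EGFR': 'negated'}
```

Sentence-level negation matching like this breaks down quickly on real dictated text (for example, "cannot rule out" or a negation that applies to only one of two markers in a sentence), which is part of why manual review does not scale and more capable language processing is needed.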
Bottom line, we’re trying to leverage our expertise and technological advancements to ensure that laboratory tests and results are used with the industry’s increasing amount of de-identified patient data to best understand patient journeys and improve outcomes.