3terra DataHub

Streamline medical coding and drive advanced clinical analysis

3terra DataHub gives your hospital new insight from integrated patient data coupled with AI-interpreted clinical text.

3terra DataHub is an integrated data repository of structured hospital data and AI-interpreted metadata derived from clinical documentation. It is also an analytical query engine that efficiently retrieves clinical data from this repository.

Clinical Natural Language Processing (NLP) engines recently released by healthcare AI leaders like Microsoft have now made it feasible to generate highly useful data from raw clinical text. The practical applications of this data are virtually limitless and include:

To understand how Microsoft’s AI and 3terra DataHub can help you accomplish these goals, we must first provide additional context.

Data analysis in healthcare

Until recently, much of healthcare data analysis and research has been performed against structured data, most often in the form of relational database tables with labeled fields and defined data types. Some of this data is generated directly by clinicians or hospital information systems (HIS), while some is created manually by medical coding specialists who translate clinical data into standardized medical vocabularies like ICD-10. These are resource-intensive processes that can only realistically be applied to a small subset of clinical knowledge.

Unstructured data, such as freeform clinical text, has also been used significantly in big data applications, particularly in the development of applied machine learning algorithms that focus on targeted goals. However, there have always been limits on the ability of algorithms to properly detect context, syntax, and medical terminology. This has been a serious impediment to utilizing clinical notes in a broader capacity. An intermediate translation layer is needed to make full use of this data.

What is AI-interpreted clinical metadata?

Since 2020, several of the largest technology companies in the world have made commercial Application Programming Interfaces (APIs) available to translate clinical text into semi-structured formats. This is an enormously difficult feat of software engineering and data science that has been the product of many years and billions of dollars of research and development.

Clinical text is parsed and represented in a flexible data structure (typically JSON) that includes a wide variety of metadata. In the process, these algorithms need to accomplish several very difficult tasks in order to produce true value. These include:

  1. Clinical concept identification – The AI must be able to assign medical concept labels to portions of the text in a format that allows identification of a wide variety of medical terminology such as Diagnoses, Procedures, Symptoms, Anatomy, and Medication, along with additional contextual information. This necessitates the use of standardized vocabularies, which is addressed by tools such as the Unified Medical Language System (see next section).
  2. Contextual modifiers – There are syntactic elements in the English language that change the meaning of a clinical snippet. For instance, the AI must understand conditional statements such as “family history may include a history of” and negations such as “patient refused medication”. Within the generated metadata, these details are represented by confidence scores and negative assertions assigned to the blocks of text.
  3. Relationship extraction – When sections of medical text are contextually related, the AI must be able to detect those associations. For instance, the sentence “patient was intubated at 10:35am” must relate the concept of a procedure (intubation) with the concept of time (10:35am).
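
To make these three tasks concrete, here is a minimal Python sketch of what such metadata might look like and how it could be read. The JSON layout, the field names (`entities`, `relations`, `negated`, `confidence`) and the relation type `TimeOfProcedure` are assumptions made for illustration only; they do not reflect the schema of any particular vendor's API or of DataHub itself.

```python
import json

# Hypothetical metadata for two sentences from the examples above.
# All field names and scores are illustrative, not a real vendor schema.
SAMPLE = """
{
  "text": "Patient was intubated at 10:35am. Patient refused medication.",
  "entities": [
    {"id": 0, "text": "intubated", "category": "Procedure",
     "confidence": 0.97, "negated": false},
    {"id": 1, "text": "10:35am", "category": "Time",
     "confidence": 0.99, "negated": false},
    {"id": 2, "text": "medication", "category": "Medication",
     "confidence": 0.88, "negated": true}
  ],
  "relations": [
    {"type": "TimeOfProcedure", "source": 0, "target": 1}
  ]
}
"""


def summarize(raw):
    """Split entities into affirmed and negated, and resolve relations to text."""
    doc = json.loads(raw)
    by_id = {entity["id"]: entity for entity in doc["entities"]}
    affirmed = [e["text"] for e in doc["entities"] if not e["negated"]]
    negated = [e["text"] for e in doc["entities"] if e["negated"]]
    links = [
        (by_id[r["source"]]["text"], r["type"], by_id[r["target"]]["text"])
        for r in doc["relations"]
    ]
    return {"affirmed": affirmed, "negated": negated, "links": links}


summary = summarize(SAMPLE)
print(summary)
```

In this sketch the concept labels are the `category` values, the contextual modifier is the `negated` flag together with the confidence score, and the relationship is the link between the procedure and its time.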

The Unified Medical Language System (UMLS)

From the UMLS website, “The UMLS, or Unified Medical Language System, is a set of files and software that brings together many health and biomedical vocabularies and standards to enable interoperability between computer systems.”

When parsing freeform clinical text, we need a vocabulary to use in the labeling process. Where possible, the AI used by DataHub will associate a snippet of text with a UMLS code. This UMLS code can then be used to cross-reference the medical term against dozens of medical vocabularies such as ICD-10 and SNOMED CT. This layer ensures that clinical concepts from different systems can be merged together in a unified environment. There are certain licensing considerations to be aware of, but the use of this database is generally free with few restrictions.
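
The cross-referencing step can be sketched as a simple lookup from a concept identifier to the codes it carries in each vocabulary. The Python example below is only an illustration: the rows are an in-memory stand-in for a licensed UMLS extract, and every identifier in it (`CUI_A`, `ICD_CODE_1`, and so on) is a made-up placeholder rather than a real code.

```python
from collections import defaultdict

# Placeholder rows of (concept_id, vocabulary, code).
# These are NOT real UMLS, ICD-10 or SNOMED CT codes.
ROWS = [
    ("CUI_A", "ICD10", "ICD_CODE_1"),
    ("CUI_A", "SNOMEDCT", "SCT_CODE_1"),
    ("CUI_B", "ICD10", "ICD_CODE_2"),
]


def build_crosswalk(rows):
    """Index rows as concept_id -> vocabulary -> list of codes."""
    index = defaultdict(lambda: defaultdict(list))
    for concept_id, vocabulary, code in rows:
        index[concept_id][vocabulary].append(code)
    return index


def translate(index, concept_id, vocabulary):
    """Return the codes for a concept in one vocabulary, or an empty list."""
    return list(index.get(concept_id, {}).get(vocabulary, []))


crosswalk = build_crosswalk(ROWS)
print(translate(crosswalk, "CUI_A", "SNOMEDCT"))
print(translate(crosswalk, "CUI_B", "SNOMEDCT"))
```

A concept with no entry in the requested vocabulary returns an empty list, which reflects the "where possible" caveat: not every concept has a counterpart in every vocabulary.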

Once this clinical data is parsed, interpreted, and presented in a format that allows interoperability with other medical vocabularies, it opens up the possibility to extract significant new insight. However, this is semi-structured data that is not nearly as easy to utilize as the tabular data sources that healthcare organizations are used to working with.

3terra DataHub

Once we see the implications of utilizing AI generated clinical metadata, it quickly becomes clear that these translated clinical documents must be considered an organizational asset that must be effectively managed. Hospitals now need to focus on the infrastructure to support this new way of thinking about data.

Clinical analytics will no longer just involve querying fields from structured data sources. There are contextual relationships, labeled medical concepts, and semantic elements that need to be handled in the process.
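
As a rough illustration of what this means in practice, the Python sketch below searches a handful of notes for affirmed, high-confidence mentions of a concept and returns any related values. It is not the DataHub query engine or its query syntax; the document layout, the `find_mentions` helper, the placeholder concept identifier `CUI_INTUBATION` and the 0.8 threshold are all assumptions made for this example.

```python
# Hypothetical metadata for three notes; layout and values are illustrative.
DOCS = [
    {
        "doc_id": "note-1",
        "entities": [
            {"id": 0, "concept": "CUI_INTUBATION", "text": "intubated",
             "confidence": 0.97, "negated": False},
            {"id": 1, "concept": None, "text": "10:35am",
             "confidence": 0.99, "negated": False},
        ],
        "relations": [{"type": "TimeOfProcedure", "source": 0, "target": 1}],
    },
    {
        "doc_id": "note-2",
        "entities": [
            {"id": 0, "concept": "CUI_INTUBATION", "text": "intubation",
             "confidence": 0.93, "negated": True},
        ],
        "relations": [],
    },
    {
        "doc_id": "note-3",
        "entities": [
            {"id": 0, "concept": "CUI_INTUBATION", "text": "intubated",
             "confidence": 0.41, "negated": False},
        ],
        "relations": [],
    },
]


def find_mentions(docs, concept, min_confidence=0.8):
    """Return (doc_id, related_texts) for affirmed, confident mentions."""
    hits = []
    for doc in docs:
        by_id = {entity["id"]: entity for entity in doc["entities"]}
        for entity in doc["entities"]:
            if entity["concept"] != concept:
                continue  # a different concept, or an unlabeled entity
            if entity["negated"] or entity["confidence"] < min_confidence:
                continue  # skip negated or low-confidence mentions
            related = [
                by_id[rel["target"]]["text"]
                for rel in doc["relations"]
                if rel["source"] == entity["id"]
            ]
            hits.append((doc["doc_id"], related))
    return hits


print(find_mentions(DOCS, "CUI_INTUBATION"))
```

Only the first note matches with the default threshold: the second mention is negated and the third falls below the confidence cut-off. A plain keyword search on "intubat" would have returned all three.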

We provide you with the underlying infrastructure to make use of this new clinical metadata. DataHub provides:

The infrastructure to manage the processing of your clinical notes. This can be done on-premises on your server or in the cloud using Microsoft Azure.
An indexed data repository of the raw clinical text along with the AI-generated metadata.
A powerful query engine that allows you to extract value from the semi-structured clinical metadata.
A user interface for medical professionals to overlay the metadata on top of the raw text to streamline documentation review and reduce cognitive burden.
Integration into our leading coding assistance platform, Data Quality Assist, to streamline the medical coding process.

Any NLP AI is prone to making mistakes, and the generated metadata will not have the same level of data quality as other sources, such as clinical abstracts that are created by professional medical coders. This does not negate the value of the data, but there are subtleties in its integration and use, a process that we will help you navigate.

As part of our service, we set up the integration and preprocessing required to get the most out of the engine. We can also advise on how to modify your processes and workflows to maximize the use of DataHub.

In March 2021, we began piloting a suite of new analytical tools that sit on top of the 3terra DataHub engine. This new platform is called Aperture DS and it will be made generally available in early 2022.

Healthcare is now embarking on a new phase of clinical analytics that includes new dimensions and complexities. Since 2008, 3terra has guided Canadian hospitals through many evolutions of utilizing data to drive better decisions and improve operational efficiency. We offer a platform that currently processes over 45% of all medical abstracts in Ontario, along with a mature service delivery infrastructure that ensures our client hospitals’ success.

Please contact us if you have any questions or would like a demo of our platform.