
  Work Package 2

Common Data Models


Objectives

  • Define a set of flexible Common Data Models related to the INDICATE use cases (WP6) in support of data interoperability and data federation.
      • This objective will leverage the benefits of an architecture based on the Data Spaces Blueprint to define common data spaces in coordination with the other INDICATE Work Packages.
      • In particular, the Common Data Models will focus on data generated from clinical examination and examination protocols (EHR notes), physiological monitors, laboratory results, imaging, and therapeutics (medication and surgical interventions).
  • Select terminologies and standards for semantics and interoperability for the above data.

  • Specify the requirements and provide guidance for data pre-processing (ETL operations), interoperability and quality and privacy control for secondary use of health data for research and innovation, including a minimum viable and full data interoperability and integration framework.

  • Specify the requirements and provide guidance for EHR-integration of (AI-based) decision support applications.

  • Provide data pseudonymisation and anonymisation strategies so that data providers are in full compliance with GDPR, national laws, and institutional policies.

      Tasks

      Task 2.1 | Define minimal datasets

      Task 2.1 defines the must-have data and accompanying metadata that should be available as a minimum entry level for a data provider, prioritised according to the roadmap for clinical use cases (WP6), and defines which data are should-have or nice-to-have. This task includes the following subtasks:

      • T2.1.1 Create an inventory of data items in information systems for ICU patients for the demonstrator projects in WP6, prioritising T6.1 MIMIC-EU, T6.5 Quality Benchmarking Dashboard, and T6.6 Grand Rounds Workspaces.
      • T2.1.2 Identify the minimal data required to conduct the demonstrator projects in WP6.
      • T2.1.3 Create an INDICATE data dictionary and analyse the extent to which data providers within the consortium have the raw data readily available (an illustrative dictionary entry is sketched after this list).
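
      As a purely illustrative sketch (the item, codes, units, and priority labels below are assumptions, not project decisions), a single entry in such a data dictionary could capture the data item, its target vocabulary, and its priority level:

```python
# Hypothetical INDICATE data dictionary entry for one ICU data item.
# The field names, the LOINC code, and the priority labels are illustrative
# assumptions; the actual dictionary content would be fixed by T2.1.3.
creatinine_entry = {
    "item": "serum_creatinine",
    "description": "Creatinine concentration in serum or plasma",
    "source_system": "laboratory information system",
    "target_vocabulary": "LOINC",
    "target_code": "2160-0",        # Creatinine [Mass/volume] in Serum or Plasma
    "unit": "mg/dL",
    "priority": "must-have",        # must-have | should-have | nice-to-have
    "use_cases": ["T6.1 MIMIC-EU", "T6.5 Quality Benchmarking Dashboard"],
}

for field, value in creatinine_entry.items():
    print(f"{field}: {value}")
```
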
      Task 2.2 | Create and publish the INDICATE Data Provider Handbook

      This task will create the INDICATE Data Provider Handbook, a high-level description of the INDICATE Common Data Models, the semantic and syntactic standards, and the minimal dataset utilised within the infrastructure. The Common Data Models are based on existing, interoperable common data models, notably HL7 FHIR and OMOP for structured health data. The task will build upon existing standards for interoperability, such as LOINC for clinical laboratory parameters, ATC codes for medication, SNOMED-CT for diagnoses, procedures, and observations, and DICOM for images. This task includes the following subtasks:

      • T2.2.1 Create a detailed description of the collection of INDICATE Common Data Models and vocabulary standards, with reference to existing resources.
      • T2.2.2 Design and optimise de-identification methods for ICU data, summarised and described for implementation in the Data Provider Handbook.
      • T2.2.3 Design and publish an example ETL pipeline to map source data to the INDICATE Common Data Models, for either an HL7 FHIR or OMOP implementation (or both), utilising the source code repository service developed in WP4 and the output of T2.2.1 (a minimal mapping sketch follows this list).
      • T2.2.4 Design a test framework and suite for the ETL procedures, including technical performance and quality indicators for anonymisation.
      • T2.2.5 Design a data quality assessment framework for the INDICATE Common Data Models based on the data quality assessment model as implemented in OHDSI for the OMOP common data model.
      • T2.2.6 Implement examples with dummy data for the de-identification and ETL components so that potential and new data providers have end-to-end examples that can be used for either training or adaptation to their local setting.
      • T2.2.7 Create open source ETL code repositories, based on the work in WP4, along with template pipelines to support on-boarding of new data providers, including Python and R libraries that can be used to optimise ETL, harmonise de-identification processes, and verify and validate the interoperability of the INDICATE common data models across the infrastructure.
      • T2.2.8 Publish the INDICATE Data Provider Handbook as a digitised living document in the knowledge platform developed in WP5.
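
      To make the ETL pattern behind T2.2.3 concrete, the sketch below shows how a single laboratory result row might be transformed into a record for the OMOP MEASUREMENT table. The source column names, the local-to-standard code mapping, and the concept identifiers are illustrative assumptions; a real pipeline would resolve them against the OHDSI vocabularies and the Handbook's specifications.

```python
from datetime import datetime

# Hypothetical mapping from a local laboratory code to OMOP concept IDs.
# The concept_id values shown are illustrative placeholders and would be
# resolved against the OHDSI standardised vocabularies in a real pipeline.
LOCAL_TO_OMOP = {
    "CREA": {"measurement_concept_id": 3016723, "unit_concept_id": 8840},  # creatinine in mg/dL
}

def to_measurement(source_row: dict, person_id: int) -> dict:
    """Map one source laboratory row to an OMOP-style MEASUREMENT record (sketch)."""
    concept = LOCAL_TO_OMOP[source_row["test_code"]]
    return {
        "person_id": person_id,
        "measurement_concept_id": concept["measurement_concept_id"],
        "measurement_datetime": datetime.fromisoformat(source_row["sampled_at"]),
        "value_as_number": float(source_row["value"]),
        "unit_concept_id": concept["unit_concept_id"],
    }

# Example source row as it might arrive from a hospital laboratory system.
row = {"test_code": "CREA", "value": "1.2", "sampled_at": "2024-03-01T07:30:00"}
print(to_measurement(row, person_id=12345))
```
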
      Task 2.3 | Design test framework and suite for the ETL procedures of high frequency monitor data

      This task defines a data acquisition protocol for a vendor-agnostic ETL process for complex monitor data. It will be carried out in close collaboration with WP6 and in line with its priorities, in particular T6.1 MIMIC-EU. Philips MDIP captures, aggregates, and sends vital signs, alarms, settings, and waveforms from bedside medical devices such as monitors, ventilators, anaesthesia machines, infusion pumps, and dialysis machines that use a serial or ethernet connection. All supported medical devices are interfaced via serial connection to Philips bedside hubs (Axon/Neuron), or directly to the MDIP server if they are ethernet-capable devices. Philips MDIP is also able to integrate directly with monitoring gateways/central stations. The solution features a data management layer where data can be sampled, specific data points selected, units converted, etc. to match the needs of the receiving system. The solution supports the standard HL7 v2.3 and IHE v2.6 formats, depending on the requirements of the receiving system. As part of the deliverables, we will provide a specific outbound interface to match the IHE and anonymisation requirements when sending data to the central repository. It entails the following subtasks:

      • T2.3.1 Selection of data acquisition sites for low and high-resolution, multi-parameter physiological waveform databases.
      • T2.3.2 Definition of the acquisition equipment and processes, based on the use of a Medical Device Information Platform and a data warehouse for physiological data recording, together with exporting software that extracts all data, including physiological signals and monitor-generated vital sign measurements, from the patient monitors connected to an ICU central unit.
      • T2.3.3 Specification of data reformatting and the de-identification/anonymisation procedure. If the data include protected health information, such as the patient’s name, date of birth, medical record number, and the date of the recording, this information needs to be removed or replaced with non-identifying information (an illustrative de-identification sketch follows this list).
      • T2.3.4 Definition of the data collection scope, including the number of ICU beds, telemetry beds, and PICU or NICU beds.
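
      As a minimal illustration of the de-identification step in T2.3.3 (the field names and the per-patient date shift are assumptions, not the agreed INDICATE procedure), a recording's metadata header might be processed as follows:

```python
import hashlib
from datetime import datetime, timedelta

# Direct identifiers to strip from a waveform recording header (illustrative list).
DIRECT_IDENTIFIERS = {"patient_name", "date_of_birth", "medical_record_number"}

def deidentify_header(header: dict, secret_salt: str, shift_days: int) -> dict:
    """Return a de-identified copy of a recording header (sketch).

    Direct identifiers are removed, the medical record number is replaced by a
    salted pseudonym, and the recording date is shifted by a per-patient offset.
    """
    clean = {k: v for k, v in header.items() if k not in DIRECT_IDENTIFIERS}
    clean["pseudonym"] = hashlib.sha256(
        (secret_salt + header["medical_record_number"]).encode()
    ).hexdigest()[:16]
    recorded = datetime.fromisoformat(header["recording_start"])
    clean["recording_start"] = (recorded + timedelta(days=shift_days)).isoformat()
    return clean

header = {
    "patient_name": "Jane Doe",
    "date_of_birth": "1960-05-14",
    "medical_record_number": "MRN-0042",
    "recording_start": "2024-03-01T07:30:00",
    "device": "bedside monitor",
}
print(deidentify_header(header, secret_salt="local-secret", shift_days=-97))
```
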
      Task 2.4 | Implement the ETL process and promote data to the infrastructure

      All data providers will utilise the available components of the INDICATE Data Provider Handbook to implement the ETL process in their local setting so that they can promote fully anonymous data to the federated infrastructure for ICU data. This task is based on the proven ETL process utilised in the OHDSI community and the IMI EHDEN initiative. It entails the following subtasks:

      • T2.4.1 Summarising the source data.
      • T2.4.2 Creating the high-level ETL design.
      • T2.4.3 Creating the detailed ETL design.
      • T2.4.4 Mapping source vocabularies and data model to the INDICATE CDMs (either HL7 FHIR, OMOP, or both).
      • T2.4.5 Implementation of the automated ETL pipeline using the R packages or Python libraries published in T2.2.7.
      • T2.4.6 Performing a data quality self-assessment and disclosing the reports to the Data Provider Support Workgroup (an illustrative set of checks is sketched after this list).
      • T2.4.7 After quality control and approval by the Data Provider Support Workgroup, the data provider can promote metadata to the centralised metadata catalogue.
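
      The self-assessment in T2.4.6 can be thought of in terms of check categories such as completeness, conformance, and plausibility. The sketch below is a minimal, assumed illustration of such checks on a handful of mapped MEASUREMENT records; it is not the OHDSI tooling itself, which data providers would use in practice.

```python
# Minimal illustration of a data quality self-assessment (assumed checks and
# data; not OHDSI's own tooling). Each check returns a fraction that could be
# collected into the report disclosed to the Data Provider Support Workgroup.
measurements = [
    {"person_id": 1, "measurement_concept_id": 3016723, "value_as_number": 1.2},
    {"person_id": 2, "measurement_concept_id": 3016723, "value_as_number": None},
    {"person_id": 3, "measurement_concept_id": 0,       "value_as_number": -5.0},
]

def check_completeness(rows):
    """Share of records with a non-missing numeric value."""
    filled = sum(1 for r in rows if r["value_as_number"] is not None)
    return filled / len(rows)

def check_conformance(rows):
    """Share of records mapped to a non-zero (standard) concept."""
    mapped = sum(1 for r in rows if r["measurement_concept_id"] != 0)
    return mapped / len(rows)

def check_plausibility(rows):
    """Share of non-missing values that are not negative."""
    values = [r["value_as_number"] for r in rows if r["value_as_number"] is not None]
    return sum(1 for v in values if v >= 0) / len(values)

report = {
    "completeness": check_completeness(measurements),
    "conformance": check_conformance(measurements),
    "plausibility": check_plausibility(measurements),
}
print(report)
```
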
      Task 2.5 | Explore an extension of the OMOP CDM for ICU data for high frequency monitor data

      This task lays the groundwork for further development of the infrastructure and wider implementation across healthcare. One of the unique aspects of ICU care is the pervasiveness of high frequency time-series data. Currently, OMOP CDM v5.4 has limited support for such data. An extension of the OMOP CDM for time-series data would be a significant step towards increased harmonisation and utilisation of time-series data for research and innovation. To propose an extension to the OMOP CDM, we envision the following subtasks (a hypothetical illustration of such an extension follows the list):

      • T2.5.1 Gather requirements from available source data and analytical use cases for high frequency time-series data.
      • T2.5.2 Perform a gap analysis to compare the analytics needs and available source data to the current OMOP CDM and other proposed extensions for devices.
      • T2.5.3 Join the OHDSI CDM working group to propose new tables and fields and elicit feedback on our proposal. However, note that the INDICATE implementation of the extension to the standard OMOP CDM can continue regardless of formal acceptance of our extension into a future version of the OMOP CDM.
      • T2.5.4 Alignment with the IEEE 11073 Service-oriented Device Connectivity (SDC) family of standards. SDC will provide for next-generation interoperability among medical devices in acute care settings, enabling a vendor-neutral open ecosystem with new and improved clinical workflows and capabilities, such as remote or closed-loop device control and silent/smart patient rooms.
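
      Purely as a thought experiment of what such an extension could look like (the table and field names below are hypothetical and do not anticipate the outcome of T2.5.3), a high-frequency signal might be described by a reference record linking a patient and device to an external waveform store:

```python
from dataclasses import dataclass

@dataclass
class WaveformReference:
    """Hypothetical record for a proposed high-frequency signal table.

    All field names are illustrative only; the actual extension would be
    shaped by T2.5.1-T2.5.3 and the OHDSI CDM working group.
    """
    person_id: int                 # link to the OMOP PERSON table
    device_concept_id: int         # source device (e.g. bedside monitor)
    signal_concept_id: int         # measured signal (e.g. arterial pressure)
    start_datetime: str            # start of the recording segment
    sampling_frequency_hz: float   # samples per second
    unit_concept_id: int           # unit of the stored samples
    storage_reference: str         # pointer to the waveform file/object store

example = WaveformReference(
    person_id=12345,
    device_concept_id=0,
    signal_concept_id=0,
    start_datetime="2024-03-01T07:30:00",
    sampling_frequency_hz=125.0,
    unit_concept_id=0,
    storage_reference="s3://local-bucket/waveforms/segment-0001.parquet",
)
print(example)
```
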
      Task 2.6 | Propose a data interoperability framework for the deployment of AI-based decision support tools

      The objective of this task is to create a proposal for a FHIR-based interoperability framework to facilitate the seamless deployment of AI-based decision support tools. The task will highlight how FHIR can facilitate data exchange, interoperability, and the integration of AI solutions, and will contribute to a data mapping strategy that aligns with the FHIR standard and the OMOP CDM to harmonise ICU data from diverse sources as part of the existing OHDSI community. The task consists of the following subtasks:

      • T2.6.1 Provide guidelines for integrating AI-based decision support tools into the ICU data ecosystem, ensuring data compatibility and ethical use.
      • T2.6.2 Mapping data requirements for model development and model deployment, utilising the central components for model repositories developed in WP4.
      • T2.6.3 Gap analysis between development of models and deployment of models.
      • T2.6.4 Mapping the OMOP CDM to FHIR, and vice versa, by utilising existing resources and initiatives such as the VULCAN guide, the CDSS guide, and the OHDSI FHIR-to-OMOP WG; note that OMOP is generally not implemented as a near real-time ETL process, whereas FHIR tends to be updated more frequently (a minimal mapping sketch follows this list).
      • T2.6.5 Suggestions for the architecture of model systems as guidance for AI developers and service users.
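
      To illustrate the direction of the mapping in T2.6.4, the sketch below renders an OMOP-style measurement record as a minimal FHIR Observation resource. The field selection, code system URIs, and the way codes and units are supplied are simplifying assumptions, not the framework itself.

```python
import json

def measurement_to_observation(measurement: dict, loinc_code: str, unit: str) -> dict:
    """Render one OMOP-style measurement as a minimal FHIR Observation (sketch).

    The LOINC code and unit are passed in explicitly here; a real mapping would
    resolve them from the OHDSI vocabularies and existing FHIR-to-OMOP resources.
    """
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [{"system": "http://loinc.org", "code": loinc_code}]
        },
        "subject": {"reference": f"Patient/{measurement['person_id']}"},
        "effectiveDateTime": measurement["measurement_datetime"],
        "valueQuantity": {
            "value": measurement["value_as_number"],
            "unit": unit,
            "system": "http://unitsofmeasure.org",
        },
    }

omop_row = {
    "person_id": 12345,
    "measurement_datetime": "2024-03-01T07:30:00",
    "value_as_number": 1.2,
}
print(json.dumps(measurement_to_observation(omop_row, loinc_code="2160-0", unit="mg/dL"), indent=2))
```
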
      Task 2.7 | Establish a Data Provider Support Workgroup

      This task will set up a Data Provider Support Workgroup to offer data providers technical and legal assistance in setting up the ETL process and the data quality assurance and data quality control policies. The Data Provider Support Workgroup will also monitor data quality and GDPR compliance after initial on-boarding of data providers. To that end, it will collaborate closely with the ELSI experts in WP3. Subtasks throughout the project include:

      • T2.7.1 Drafting and implementing a structured monitoring plan to continuously monitor the following KPIs of INDICATE: the number of ICU data providers per EU member state, and the volume and type of data made available by each data provider.
      • T2.7.2 Implementing data audit procedures based on those developed in IMI EHDEN and described in the OHDSI handbook.

      Lead

      Carlos L. Parra-Calderón

      Task lead representatives

      Boris Delange
      Eduard Butkevych
      Maxim Moinat
      María González López