Building the foundations for Federated Healthcare Research

On 7 May 2026, the INDICATE training session on the Extraction, Transformation and Load (ETL) process took place. The session was given by Celia Alvarez-Romero, María Parra Rodríguez-Armijo and María González-Lopez.

The programme is designed to support data providers in implementing ETL processes and adopting the Common Data Model (CDM) within the INDICATE infrastructure effectively, securely, and in a fully standardised way. It helps participants, such as clinicians and data engineers, build both the conceptual understanding and practical skills needed to work with interoperable health data.

This training session helped participants better understand the INDICATE data architecture, including the dual Common Data Model (CDM) approach. Participants also learned more about the technical requirements, tools and skills needed to successfully implement ETL processes in their organisations.

During the session, the importance of good data preparation was explained. Before starting the ETL process, healthcare data must be organised, checked and prepared correctly. The Data Provider Handbook supports organisations with practical guidance and explains the minimum technical and procedural requirements needed to transform local intensive care data into the OMOP Common Data Model (OMOP CDM).

The session also explained how the federated approach in INDICATE works. Data stays safely stored at each hospital or organisation and is not transferred to a central database. This helps protect patient privacy and supports secure collaboration between partners across Europe.
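The federated idea can be illustrated with a minimal sketch. This is an assumption-laden toy, not INDICATE's actual implementation: each site computes an aggregate on its own records and shares only that aggregate, and a coordinator combines the results. The site data, the `LocalSummary` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalSummary:
    """Aggregate a site shares; it contains no patient-level rows."""
    n: int
    total: float


def summarise_locally(values: list[float]) -> LocalSummary:
    # Runs inside the hospital: only a count and a sum leave the site.
    return LocalSummary(n=len(values), total=sum(values))


def combine(summaries: list[LocalSummary]) -> float:
    # Runs at the coordinator: pooled mean built from per-site aggregates.
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no observations reported by any site")
    return sum(s.total for s in summaries) / n


# Hypothetical ICU length-of-stay values (days) held at two sites.
site_a = [2.0, 4.0, 6.0]
site_b = [3.0, 5.0]

pooled_mean = combine([summarise_locally(site_a), summarise_locally(site_b)])
print(pooled_mean)  # 4.0
```

The pooled mean equals what a central database would give, yet no individual record crosses a site boundary. A real deployment would likely add further safeguards on top of this, such as minimum counts per site.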

Participants also followed the INDICATE data workflow from local ICU source systems to data ready for federated use. The session explained how information from clinical environments can be identified, extracted, transformed through ETL logic, loaded into a local OMOP CDM instance and checked before analysis. HL7 FHIR was presented only as an optional support layer for structured access or interoperability when available or useful, while the main transformation pathway focuses on preparing local ICU data in OMOP CDM. The workflow also highlighted the importance of the local environment, where execution, semantic alignment, validation and governance controls come together before data can support distributed analyses.

The ETL tooling section showed how this workflow can be translated into practical implementation steps for Data Providers. These steps include confirming local readiness, profiling source data, defining mappings, implementing transformation logic, loading the local OMOP CDM, running quality checks and refining the process when issues are detected. The tools were presented as part of an iterative workflow rather than isolated components, supporting profiling, mapping, vocabulary alignment, implementation and post-load validation. This helped clarify how INDICATE moves from architecture to execution, turning complex ICU data into reliable, comparable and analysis-ready resources.
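The steps listed above (profiling, mapping, transformation, loading, quality checks) can be sketched in a few lines. This is a simplified illustration, not the tooling shown in the session: the source rows, the local codes and the concept IDs in `CONCEPT_MAP` are hypothetical placeholders. The output field names follow the OMOP MEASUREMENT table, and concept ID 0 is the OMOP convention for an unmapped value.

```python
from datetime import datetime

# Hypothetical local-code -> concept mapping. The IDs are placeholders,
# not real OMOP concept IDs; a real ETL would take them from the vocabulary.
CONCEPT_MAP = {"HR": 1001, "SPO2": 1002}
UNMAPPED = 0  # OMOP convention: concept_id 0 marks an unmapped source value

# Hypothetical source rows as exported from a local ICU system.
SOURCE_ROWS = [
    {"patient": "17", "code": "HR", "value": "82", "time": "2026-05-07T10:15:00"},
    {"patient": "17", "code": "SPO2", "value": "97", "time": "2026-05-07T10:15:00"},
    {"patient": "23", "code": "TEMP", "value": "37.2", "time": "2026-05-07T11:00:00"},
]


def profile(rows):
    """Profiling: count how often each source code occurs."""
    counts = {}
    for row in rows:
        counts[row["code"]] = counts.get(row["code"], 0) + 1
    return counts


def transform(row):
    """Transformation: reshape one source row into MEASUREMENT-style fields."""
    return {
        "person_id": int(row["patient"]),
        "measurement_concept_id": CONCEPT_MAP.get(row["code"], UNMAPPED),
        "measurement_datetime": datetime.fromisoformat(row["time"]),
        "value_as_number": float(row["value"]),
        "measurement_source_value": row["code"],
    }


def quality_check(records):
    """Post-load check: list source codes that still lack a mapping."""
    return sorted({r["measurement_source_value"] for r in records
                   if r["measurement_concept_id"] == UNMAPPED})


print(profile(SOURCE_ROWS))  # {'HR': 1, 'SPO2': 1, 'TEMP': 1}
measurement = [transform(r) for r in SOURCE_ROWS]  # stands in for the load step
print(quality_check(measurement))  # ['TEMP']
```

The unmapped `TEMP` code found by the quality check is the kind of finding that sends a Data Provider back to the mapping step, which is what makes the workflow iterative.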

The INDICATE Training programme on Data Model & Data Enablement consists of five sessions. The next and last session will take place on June 17, 14:00–16:00 CEST.

Towards a standard concept recommendation for Federated Intensive Care research: The INDICATE Data Dictionary

On April 20, Boris Delange (MD, Medical Informatics, Université de Rennes) presented the INDICATE Data Dictionary at the OHDSI Europe Symposium 2026, the yearly meeting of the European OHDSI community, gathering researchers, clinicians and data scientists working on federated health data using the OMOP common data model.

Through a live demo, Boris showed how the Data Dictionary supports multidisciplinary teams working with medical concepts in OMOP — by providing peer-reviewed, versioned concept sets enriched with clinical context and ETL guidance.

Why does this matter? Before any federated analysis can run across hospitals, each site must map its local data to the same shared vocabulary. This “concept mapping” step is essential but notoriously time-consuming, and small inconsistencies between sites can silently bias results. By offering a curated, transparent library of ICU concept sets – with review workflows, semantic versioning, and expert comments – the INDICATE Data Dictionary makes this step faster, more reliable, and easier to share across institutions.
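To make the risk of silent inconsistencies concrete, here is a minimal sketch of checking site mappings against a shared concept set. It does not reflect how the Data Dictionary itself is implemented. The site names, local codes and concept IDs are hypothetical placeholders, not real OMOP concept IDs.

```python
# Hypothetical shared concept set: clinical concept -> standard concept ID.
SHARED_CONCEPTS = {"heart rate": 1001, "oxygen saturation": 1002}

# Hypothetical site-level mappings: local code -> (clinical concept, concept ID).
SITE_MAPPINGS = {
    "hospital_a": {"HR": ("heart rate", 1001), "SpO2": ("oxygen saturation", 1002)},
    "hospital_b": {"PULSE": ("heart rate", 1001), "SAT": ("oxygen saturation", 1009)},
}


def find_deviations(shared, site_mappings):
    """Return (site, local code, used ID, expected ID) for every mismatch."""
    deviations = []
    for site, mapping in sorted(site_mappings.items()):
        for local_code, (concept, concept_id) in sorted(mapping.items()):
            expected = shared.get(concept)
            if concept_id != expected:
                deviations.append((site, local_code, concept_id, expected))
    return deviations


for deviation in find_deviations(SHARED_CONCEPTS, SITE_MAPPINGS):
    print(deviation)  # ('hospital_b', 'SAT', 1009, 1002)
```

Without such a check, a federated query on concept 1002 would silently miss the oxygen saturation values at the second site.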

The session also sparked valuable discussions on integration within the OHDSI ecosystem and on extending the approach beyond ICU.

The Data Dictionary is open source and runs entirely in the browser: no server, no login, no install. Anyone can browse the 300+ ICU concept sets, propose reviews on GitHub, or deploy their own instance in minutes.


Explore the INDICATE Data Dictionary via GitHub.

Trust over rules and regulation: what really determines success in health data ecosystems

The first session of the INDICATE Training Programme on Legal Framework kicked off! These sessions are running in parallel with and complementing the ongoing Data Models sessions.

This training programme gives participants a comprehensive understanding of the INDICATE legal framework, covering General Data Protection Regulation (GDPR) and European Health Data Space (EHDS) principles. The sessions focus on data protection and privacy-enhancing technologies, governance and rulebook structures, and practical skills to navigate data access processes, requirements, and organizational challenges within INDICATE.

The first session, led by Ricard Martínez Martínez (Universitat de Valencia), explored how law, technology, and organisation together shape the use of health data in research. A key message was that rules are important, but they are not enough on their own. Trust and responsibility are just as important.

Several important topics were discussed:

  • Trust and reputation matter in research. It is not only about following the law. Organisations must also show clearly that they handle data in a responsible way. Without trust, collaboration and research can be at risk.
  • Clear governance and roles are essential. In complex data systems, it must be clear who is responsible for what. Researchers, project managers, data protection officers, and platform teams all have different roles. Without clear responsibilities, risks increase.
  • Anonymisation is more complex than it seems. Making data truly anonymous is not easy. It requires continuous risk assessment, technical measures, and careful monitoring to reduce the risk of re-identification.
  • Balancing innovation and privacy. In healthcare research, there is a need to learn from data while protecting patient privacy. Data minimisation does not mean using as little data as possible, but using the right data for a clear purpose.
  • Privacy-enhancing technologies are becoming essential. New technical solutions allow researchers to analyse data without directly accessing raw data. This helps protect privacy while still enabling valuable research.

The session also highlighted that European developments, including the European Health Data Space and emerging AI regulations, will strongly shape how health data research is organised in the coming years. Active engagement from the research community is therefore essential.

A key takeaway from the session is that legal rules and regulations, technical design, security, and ethics are deeply connected. Real progress in health data research happens when all of these elements work together, supported by a strong culture of responsibility and trust.

The INDICATE project receives funding from the European Union’s Digital Europe Programme under Grant Agreement number 101167778.

The OMOP Common Data Model explained: speaking the language of health data

The second training of the INDICATE Training Programme on Interoperability, OMOP and Vocabularies took place on April 9, 2026. The programme is designed to support data providers in using the INDICATE infrastructure effectively, securely, and in a fully standardised way. It helps participants, such as clinicians and data engineers, build both the conceptual understanding and practical skills needed to work with interoperable health data.

During this second training, led by Maxim Moinat (Researcher, Medical Informatics, Erasmus MC) and moderated by Boris Delange (MD, Medical Informatics, Université de Rennes), participants learned that data from different hospitals and institutions must be made interoperable to enable research at a European level. However, this is only possible when data is structured in a way that makes comparison meaningful and reliable.

This is where standardisation becomes essential. Without a shared structure, data remains fragmented across systems, making large-scale analysis difficult or even impossible. By harmonising data into a common format, researchers can generate evidence that is consistent, reproducible, and scalable across countries.

The OMOP Common Data Model provides exactly this: a shared way of organising patient data and a shared vocabulary for describing clinical events, so that hospitals across Europe can describe the same reality in the same terms. Maxim walked participants through the main building blocks of the model and showed how they apply to ICU data, with concrete examples detailed during the session. He also presented the wider OHDSI community and European networks such as EHDEN and DARWIN EU, which already federate data on hundreds of millions of patients.

Maxim then walked participants through the full journey from raw hospital data to interoperable, OMOP-formatted data, step by step, from the initial exploration of the source system to the final validation of the mapped database. At each stage, he introduced the corresponding tools from the OHDSI ecosystem, a suite of open-source resources designed to support data providers throughout the process. He also showed how the INDICATE Data Dictionary, presented in Session 1, fits into this journey by guiding data providers on which clinical concepts to prioritise for mapping.

The session concluded with key take-home messages on the importance of clear mapping specifications, vocabulary alignment, and the value of a shared data model for enabling collaborative research across institutions and countries.

Overall, the training provided participants with both a conceptual and practical understanding of how the OMOP CDM and the surrounding OHDSI ecosystem support interoperable and scalable health data research within INDICATE.

The next training will focus on the ETL workflow, data preparation requirements, and data quality expectations, and is planned for May 7, 2026.

Read more about the first training.

INDICATE Training Programme – Legal Framework

To support all consortium members in using the INDICATE infrastructure effectively, correctly, and securely, we are organizing a three-session INDICATE Training Programme on Legal Framework, running in parallel with and complementing the ongoing Data Models sessions.

The programme will give participants a comprehensive understanding of the INDICATE legal framework, covering GDPR and EHDS principles, data protection and privacy-enhancing technologies, governance and rulebook structures, and practical skills to navigate data access processes, compliance requirements, and organizational implementation challenges within INDICATE.

Session dates

All sessions will be held from 14:00–16:00 (CEST) via Zoom.

  • May 4 – Session 1 | Understanding GDPR
  • June 24 – Session 2 | Understanding and using Data Access
  • September 10 – Session 3 | Understanding the Rulebook and legal onboarding steps

Vacancy: Statistician / Applied Mathematician (INDICATE Project)

Position Overview

AP-HP (Assistance Publique – Hôpitaux de Paris), a valued partner in the INDICATE project, is seeking a highly motivated Statistician / Applied Mathematician / Data Scientist to contribute to the development and validation of predictive models of organ failure in critically ill patients. The position is part of the European INDICATE project and focuses on translational research at the interface between medicine, statistics, and artificial intelligence.

Scientific Scope

INDICATE focuses on predicting major organ failures in ICU patients using multimodal data (clinical, biological, and high-frequency physiological signals). The goal is to identify early predictive signatures of organ dysfunction (renal, respiratory and cardiovascular) and support personalized decision-making in critical care.

Methodological Framework

The candidate will implement and validate advanced statistical and machine learning models, including supervised learning, time-series modeling, and trajectory analysis. Key aspects include feature engineering from high-frequency data, handling missing data, model calibration and discrimination assessment, and external validation when available.

Required skills

  • Strong background in statistics, applied mathematics, or data science
  • Experience in predictive modeling and machine learning
  • Programming skills: Python (mandatory), SQL; Java/C++ is a plus
  • Interest in biomedical applications and clinical data

Contract and Conditions

  • Fixed-term contract (18 months)
  • Full-time (100%)
  • Location: INSERM U942, Paris (AP-HP / Université Paris Cité)
  • English required; French not mandatory

Application process

To apply for this position, please send your CV and motivation letter to Dr. Benjamin Deniau (benjamin.deniau@aphp.fr) and Ms. Fatima Zunara (fatima.zunara@aphp.fr).
