INDICATE: what is federated data analysis?!

Published on: July 14, 2025

What is INDICATE?

Healthcare institutions possess vast amounts of untapped data that could drive innovation and improve patient outcomes. As much as 97% of health data worldwide is not reused. With INDICATE, we aim to unlock this potential by building a federated data infrastructure for Intensive Care Units (ICUs) across Europe.

But how can you analyze data without accessing it directly? The answer lies in INDICATE’s design, specifically in its first development phase due for delivery in August 2025.

What is Federated Data Analysis?

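Federated Data Analysis allows researchers to run analyses across multiple ICU datasets without seeing the raw data. It follows a code-to-data approach: the analysis code travels to where the data is held, rather than the data travelling to the researcher, so the computation stays secure.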

INDICATE’s federated analysis architecture enables data users, such as researchers, innovators, and policy makers, to derive insights from ICU data across multiple institutions, regions, and European countries.

How is INDICATE designed to make Federated Data Analysis possible?

At the start of the project, the architecture will consist of four essential components that work together to enable federated observational studies:

Metadata catalog
Study repository
Secure processing environment
Aggregated results exchange

Metadata Catalog

The metadata catalog is usually the starting point for a researcher who has an idea. It enables researchers to discover what data, standardized and in a common format, is available across participating hospitals without accessing the data itself. Similar to browsing a library catalog that shows book titles and summaries without revealing the books’ contents, researchers can see what types of ICU data exist and where, allowing them to design appropriate studies.

Study Package and Study Repository

The study repository manages the lifecycle of Study Packages. A Study Package is a collection of files, such as a research protocol, scripts, and data requirements, that are used to answer a research question. Think of the study repository as a library or GitHub repository where researchers can submit their study proposals, and where other researchers and data providers can review the proposals, raise issues, and provide feedback.

Once a researcher is happy with their proposal, they obtain ethical approval for the study. Next, the researcher can invite Data Providers to join the study. Data Providers are hospitals that participate in INDICATE and make their data available.

The researcher’s institute and the Data Providers will need to enter into a Data Sharing Agreement. This Agreement covers the data processing purposes, the legal basis for processing, and how intellectual property rights are handled. After the Data Sharing Agreements have been signed, the Study Package is locked. It is now ready for download.
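The lifecycle described above can be pictured as a small state machine. The sketch below is illustrative only: the class, stage, and method names are hypothetical, and the rule that a package locks automatically once every invited Data Provider has signed is an assumption, not a description of how INDICATE implements the study repository.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names, chosen for illustration only.
    PROPOSED = "proposed"                # submitted and open for review
    ETHICS_APPROVED = "ethics_approved"  # ethical approval obtained
    LOCKED = "locked"                    # agreements signed, ready for download


@dataclass
class StudyPackage:
    title: str
    stage: Stage = Stage.PROPOSED
    invited: set[str] = field(default_factory=set)
    signed: set[str] = field(default_factory=set)

    def approve_ethics(self) -> None:
        if self.stage is not Stage.PROPOSED:
            raise ValueError("ethical approval applies to a proposed study")
        self.stage = Stage.ETHICS_APPROVED

    def invite(self, provider: str) -> None:
        if self.stage is not Stage.ETHICS_APPROVED:
            raise ValueError("providers are invited after ethical approval")
        self.invited.add(provider)

    def sign_agreement(self, provider: str) -> None:
        if provider not in self.invited:
            raise ValueError(f"{provider} was not invited to this study")
        self.signed.add(provider)
        # Assumption: the package locks once every invited provider has signed.
        if self.signed == self.invited:
            self.stage = Stage.LOCKED
```

In this sketch a locked package rejects further invitations, which mirrors the idea that the content hospitals download no longer changes.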

Secure Processing Environment

The secure processing environment is where the analysis of personal health data occurs. The health data stays within the hospital setting. This means that the hospital downloads the Study Package from the Study Repository and installs it in its Secure Processing Environment. The Study Package is executed against the hospital’s data. Think of the secure processing environment as a clean room where an analyst performs the experiments the researcher specified without taking anything outside, except the results.

A data steward at the Data Provider is responsible for downloading the signed Study Package. They verify it is the correct version. Next, they install and run the package and check for errors. If the package executed correctly, the data steward verifies the Aggregated Results. It is also their job to verify that no personal data remain in the Aggregated Results. If all checks are passed, the data steward uploads the Aggregated Results to the Aggregated Results Exchange.
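Two of the data steward's checks lend themselves to a small illustration. The sketch below assumes that the correct version of a package can be confirmed by comparing a SHA-256 checksum, and that very small counts in the results are flagged for manual review. Both mechanisms, the threshold of 5, and all function names are assumptions made for this example. The article does not say how INDICATE performs these checks.

```python
import hashlib

# Hypothetical threshold: counts below this are flagged for manual review.
MIN_CELL_COUNT = 5


def package_digest(package_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of a Study Package archive."""
    return hashlib.sha256(package_bytes).hexdigest()


def is_expected_version(package_bytes: bytes, expected_digest: str) -> bool:
    """Check that the downloaded archive matches the digest of the locked package."""
    return package_digest(package_bytes) == expected_digest


def small_cells(frequencies: dict[str, int],
                min_count: int = MIN_CELL_COUNT) -> list[str]:
    """Return the categories whose count is non-zero but below min_count.

    A very small count can point to individual patients, so these
    categories are candidates for suppression before upload.
    """
    return sorted(name for name, n in frequencies.items() if 0 < n < min_count)
```

A steward following this sketch would only upload results when the checksum matches and the list of small cells is empty or has been dealt with.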

Aggregated Results Exchange

The aggregated results exchange enables the secure sharing of aggregated outputs from the analysis. Aggregated results include statistical summaries, such as means, standard deviations, frequencies, and proportions, but not individual patient data. It is like the airlock to the clean room, where the results may be placed for pick-up without anyone being able to enter the clean room itself.
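To make the idea of aggregated results concrete, here is a minimal sketch of how one site might reduce patient-level values to summaries before anything leaves the hospital. The function names and the output layout are hypothetical and do not represent an INDICATE results format.

```python
from collections import Counter
from statistics import fmean, stdev


def summarise_numeric(values: list[float]) -> dict[str, float]:
    """Return count, mean, and sample standard deviation for one variable."""
    if len(values) < 2:
        raise ValueError("need at least two values to compute a standard deviation")
    return {"n": len(values), "mean": fmean(values), "sd": stdev(values)}


def summarise_categorical(values: list[str]) -> dict[str, dict[str, float]]:
    """Return frequency and proportion for each category."""
    counts = Counter(values)
    total = len(values)
    return {
        category: {"frequency": n, "proportion": n / total}
        for category, n in counts.items()
    }
```

Only the returned dictionaries would be shared. The input lists, which hold one entry per patient, stay inside the secure processing environment.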

After Aggregated Results have been uploaded to the Aggregated Results Exchange, the researcher is notified that new results are available. They can log in and download the Aggregated Results. Next, the researcher can combine all of the Aggregated Results from all participating Data Providers into one pooled result. This pooled result can be used for a scientific paper, for example.
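As an illustration of the pooling step, the sketch below combines per-site counts, means, and sample standard deviations into a single pooled mean and standard deviation. It uses the standard identity for merging group summaries, which gives the same answer as computing the statistics on the combined patient-level data. The class and function names are hypothetical, and real studies may call for other pooling methods, such as meta-analysis.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    n: int        # number of patients at the site
    mean: float   # site-level mean
    sd: float     # site-level sample standard deviation (n - 1 denominator)


def pool(summaries: list[SiteSummary]) -> SiteSummary:
    """Combine per-site summaries into one pooled summary."""
    total_n = sum(s.n for s in summaries)
    if total_n < 2:
        raise ValueError("need at least two observations in total to pool")
    pooled_mean = sum(s.n * s.mean for s in summaries) / total_n
    # Sum of squares = within-site part + between-site part.
    sum_squares = sum(
        (s.n - 1) * s.sd ** 2 + s.n * (s.mean - pooled_mean) ** 2
        for s in summaries
    )
    return SiteSummary(total_n, pooled_mean, math.sqrt(sum_squares / (total_n - 1)))
```

The researcher never needs the underlying values: three numbers per site and per variable are enough to reconstruct the overall mean and spread.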

Data Protection, Privacy and Security in INDICATE

INDICATE’s architecture implements privacy-by-design principles, with data sovereignty as a core requirement. Patients’ personal data always remain within the hospital, and only aggregated results leave the hospital’s environment after review.

This approach aligns with the General Data Protection Regulation (GDPR) requirements and physician-patient confidentiality by ensuring that:

Data providers (hospitals) retain complete control over their data
Personal data remains within the hospital’s security boundaries
Only non-sensitive, aggregated statistics are shared, such as averages and counts
All processing occurs with appropriate legal bases and safeguards

Real-world applications

INDICATE’s first set of deliverables will result in a Minimal Viable Product that supports observational studies across European intensive care units. This way, INDICATE provides immediate value through applications such as comparative effectiveness research and quality benchmarking. In comparative effectiveness research, treatment approaches across different hospitals can be compared to identify best practices. Quality benchmarking allows hospitals to compare their performance metrics with peers while maintaining data privacy.

For hospital executives, this means gaining valuable insights without the compliance risks of traditional data sharing. For start-ups developing healthcare solutions, it provides access to a broader evidence base for validating innovations without the hurdles of data transfer agreements.

Moving forward

INDICATE’s Minimal Viable Product (MVP) is scheduled for completion in August 2025. This foundation will later expand to include federated machine learning (Plateau 2) and decision support capabilities (Plateau 3).

By joining INDICATE, partners such as healthcare institutions can contribute to and benefit from a Europe-wide approach to collaborative research while maintaining full sovereignty over their data. The architecture provides the security, privacy, and governance needed to unlock the value of ICU data for better patient outcomes across Europe.

The future of healthcare research lies not in centralizing data but in federating insights, allowing knowledge to flow while keeping sensitive information secure.

We would love to hear your thoughts on INDICATE’s design, so feel free to reach out to us via email or our LinkedIn page!