Onboarding to the Medical Informatics Platform MIP
Step-by-step guidance
What can I find here?
- Creation of a MIP User Account
- MIP Data Governance
- MIP Data Flow
- MIP GDPR compliance assessment
Figure 1: User Interface of the Medical Informatics Platform MIP
Creation of a MIP User Account
Prerequisite – Step 1: Access to the MIP requires an EBRAINS user account, which needs to be permitted and authenticated. EBRAINS user accounts are available to users with a legitimate interest (mainly research and development) from Europe and beyond.
Request an EBRAINS user account: https://www.ebrains.eu/page/sign-up
The EBRAINS user account allows users to directly access the Public MIP (https://mip.ebrains.eu/) with no further accreditation being required.
Access to a specific MIP Federation – Step 2: Authorised users with an active EBRAINS account can request access to a specific MIP Federation by contacting support@ebrains.eu, which will forward the request to the MIP Management team. Users can also contact the MIP team directly via the online form on the EBRAINS website: https://www.ebrains.eu/tools/medical-informatics-platform
The Data Science Steering Committee (DSSC) of the specific federation is involved in the accreditation process through which access is approved. The creation of a new MIP Federation project can be initiated at any time.
Users are required to accept the EBRAINS Terms and Policies (https://www.ebrains.eu/page/terms-and-policies). By doing so, they indicate acceptance of and compliance with all applicable laws, regulations, rules, and approvals in the use and sharing of the data, including, but not limited to, the General Data Protection Regulation (GDPR).
Upon login to the MIP, users are required to accept the Terms of Use of the MIP. Accredited users access the MIP through a web-based interface, which provides them with direct access to the respective federation on the MIP.
MIP Data Governance
Figure 2: MIP Data Governance Flow
This illustration depicts how data governance and data flow in the MIP are organised and how the legal framework and data management are interlinked. Decision points are indicated.
The MIP Data Protection Impact Assessment (DPIA) is currently under full revision and will take effect upon final approval by the CHUV DPO. Under Article 35(3)(b) of the GDPR, a Data Protection Impact Assessment is required for large-scale processing of special categories of (sensitive) data.
MIP and data anonymisation
Note: The MIP handles anonymised data. As defined in ISO/IEC 29100:2011, anonymisation of personal data is the process of encrypting or removing personally identifiable data so that a person can no longer be directly or indirectly identified (see also Recital 26 of the GDPR). As soon as a person can no longer be re-identified, the data is no longer considered personal data and the GDPR does not apply to its further use.
However, processing personal data for the purpose of anonymisation is still processing that must have a legal basis under Article 6 of GDPR. The anonymisation process is defined as “further processing” and this processing must be compliant with the principle of purpose limitation. The process of data anonymisation can be used to improve data protection compliance, e.g., as part of the “privacy by design” strategy, with the goal to improve the protection of the processed data; or as part of the “data minimisation” strategy, where data can be anonymised and used without the risk of harming the data subjects.
Both strategies are followed by the MIP.
MIP concepts and definitions
- Common Data Elements (CDEs)
A set of standard variables defined by clinical experts and data scientists, used by researchers to perform analyses on specific medical conditions at the federation level. In the MIP context, the term CDEs refers to the standardised federated datamodels only.
- Data Element
In metadata, a data element is an atomic unit of data that has a precise meaning or precise semantics.
- Datamodel (Metadata)
A Datamodel (Metadata) describes the structure of database variables found in specific extracts of a hospital database, including descriptive metadata, structural metadata, administrative metadata, reference metadata and statistical metadata.
- Database Variables
A variable or scalar is a storage address (identified by an index or address) paired with an associated symbolic name, which contains some known or unknown quantity of information referred to as a value.
- Electronic Health Records (EHR)
Health information and clinical records recorded for each patient at each visit in the hospital's database (Oracle, SQL, or any other database system) and usually transferred in db or CSV format. EHRs usually contain different levels of data, defined in this context as spaces, domains, and sub-domains. For example, one space might include demographics, social status, or the patient's medical history as different data domains. EHRs also contain other data spaces related to a specific medical condition, such as dementia or epilepsy, where each space includes specific domains and sub-domains such as medical assessments and tests, diagnoses, treatments, and operations.
- Medical Conditions
Diseases are commonly understood as medical conditions associated with specific symptoms and signs.
MIP Data Flow
Figure 3: MIP Data Flow
This diagram illustrates the MIP Data Flow, indicating processing steps prior to data upload and steps after data upload to the MIP.
Abbreviations: EHR – electronic health record, MRI – magnetic resonance imaging, ETL – data integration (extract, transform, load), CDE – common data elements, ML – machine learning, GUI – graphical user interface, VM – virtual machine.
- Data pre-processing: extract data from EHR records and produce pseudonymised data in .csv format; optional Step 1: extract brain volumes from MRI images and merge with data extracted from EHR records.
- Data Quality and Harmonisation: prepare CDE. If the CDE exists, Steps 2B, 4 and 5 are followed; if the CDE needs to be prepared, Steps 2A and 3A need to be performed first, followed by Steps 2B, 4 and 5.
- Data Analysis and ML: the anonymised dataset is uploaded either to the federated node in the institution or to the dedicated VM on EBRAINS CSCS. Data analysis can be performed via the Federation Service Layer and User Interface using predefined federated algorithms; aggregated results are retrieved via the GUI.
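The branching in the Data Quality and Harmonisation stage can be written down as a small sketch. The step labels (2A, 3A, 2B, 4, 5) are taken from the figure description. The function name `harmonisation_steps` and its parameter are hypothetical and do not correspond to any MIP tool.

```python
def harmonisation_steps(cde_exists: bool) -> list[str]:
    """Return the ordered harmonisation steps named in Figure 3.

    Illustrative only: the step labels come from the figure description,
    while this helper itself is not part of the MIP software.
    """
    # Steps that are always followed once a CDE is available.
    steps = ["2B", "4", "5"]
    if not cde_exists:
        # The CDE has to be prepared first (Steps 2A and 3A).
        steps = ["2A", "3A"] + steps
    return steps


if __name__ == "__main__":
    print(harmonisation_steps(cde_exists=True))
    print(harmonisation_steps(cde_exists=False))
```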
MIP GDPR compliance assessment
Several aspects are crucial for demonstrating GDPR compliance. The following compliance assessment is based on the GDPR core principles:
Lawfulness, Fairness, and Transparency (Article 5 GDPR)
Lawfulness and Fairness: In alignment with GDPR requirements for lawful processing (Article 6(1)(a)), the MIP legal contracts with Data Providers require that data processing is based on informed consent obtained from data subjects. The MIP requires users to accept the EBRAINS General Terms of Use and to adhere to all applicable laws and regulations, including the GDPR. Data Transfer Agreements (DTAs) and Data Sharing Agreements (DSAs) provide a legal framework and are mandated before any data transfer or data sharing, ensuring compliance with Article 28(3) regarding processor agreements (GDPR Articles 5(1)(a), 6, and 7). Strict authentication and authorisation procedures are in place so that only accredited users are given access. Data anonymisation is required before integration in the MIP, which minimises the risk of re-identification and protects data subjects from potential harm (GDPR Article 6(1)(a)). An additional built-in privacy threshold restricts data analysis to aggregate results computed from at least 10 participant records.
Transparency: The open-source nature of the MIP promotes transparency by providing accessible source code, fostering community involvement, and offering clear information about data governance, federated queries, and data usage without moving original data from its location. Detailed technical and user documentation is available at https://github.com/HBPMedical/mip-docs, and an interactive user guide is accessible directly on the platform.
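The privacy threshold mentioned under Lawfulness and Fairness can be illustrated with a minimal sketch. It assumes the threshold is applied as a simple count check before an aggregate is released. The names `MIN_RECORDS` and `aggregate_mean` are hypothetical, and the actual MIP implementation may differ.

```python
from typing import Optional, Sequence

# Threshold of 10 participant records, as stated in the text.
MIN_RECORDS = 10


def aggregate_mean(values: Sequence[float],
                   min_records: int = MIN_RECORDS) -> Optional[dict]:
    """Return an aggregate result only if enough records contribute.

    Hypothetical helper: it returns None instead of a result when fewer
    than `min_records` records are available, so that no statistic based
    on a very small group is released.
    """
    n = len(values)
    if n < min_records:
        return None  # too few records: suppress the result
    return {"count": n, "mean": sum(values) / n}


if __name__ == "__main__":
    print(aggregate_mean([70.0] * 9))              # suppressed
    print(aggregate_mean([float(x) for x in range(10)]))
```

Only the aggregate (count and mean) is returned in this sketch; individual values never leave the function.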
Purpose Limitation
The MIP processes data for specified, explicit, and legitimate purposes related to the clinical research of each of the MIP Federations (dementia, traumatic brain injury, epilepsy, mental health, and stroke). Data is not moved or downloaded from the platform, maintaining the integrity of the purpose limitation principle (GDPR Article 5(1)(b)).
Data Minimisation
The MIP adheres to the principle of data minimisation by only processing data necessary for the research purposes stated. This includes the use of Common Data Elements (CDEs) to standardise and limit the scope of data collected or re-used. All data is anonymised, minimising the exposure of personal data (GDPR Article 5(1)(c)).
Accuracy
The MIP includes tools such as the MIP Data Catalogue and the MIP-DQC Tool to help data managers and curators ensure data accuracy and quality before data is integrated. Data validation and cleaning are integral parts of the data preparation process (GDPR Article 5(1)(d)).
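As a rough illustration of what validating a record against a datamodel can look like, the sketch below checks one row of a CSV extract against a small set of variable definitions. It is not the MIP-DQC Tool or the MIP Data Catalogue. The variable names, ranges, allowed values, and the function `validate_row` are all hypothetical.

```python
# Hypothetical variable definitions, loosely modelled on the idea of CDEs.
HYPOTHETICAL_CDES = {
    "age": {"kind": "numeric", "min": 0, "max": 120},
    "gender": {"kind": "nominal", "allowed": {"F", "M"}},
}


def validate_row(row: dict, cdes: dict = HYPOTHETICAL_CDES) -> list:
    """Return a list of human-readable problems found in one record."""
    errors = []
    for name, spec in cdes.items():
        raw = row.get(name, "")
        if raw == "":
            errors.append(f"{name}: missing value")
            continue
        if spec["kind"] == "numeric":
            try:
                value = float(raw)
            except ValueError:
                errors.append(f"{name}: not a number ({raw!r})")
                continue
            if not spec["min"] <= value <= spec["max"]:
                errors.append(f"{name}: {value} is out of range")
        elif spec["kind"] == "nominal" and raw not in spec["allowed"]:
            errors.append(f"{name}: unexpected value ({raw!r})")
    # Flag columns that are not defined in the datamodel at all.
    for name in row:
        if name not in cdes:
            errors.append(f"{name}: not a defined variable")
    return errors


if __name__ == "__main__":
    print(validate_row({"age": "71", "gender": "F"}))
    print(validate_row({"age": "abc", "gender": "X", "zip": "1000"}))
```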
Storage Limitation
Data within the MIP is kept only as long as necessary for the scientific research purposes. The platform’s architecture, which retains data control at the level of the data provider, mitigates the risks associated with long-term storage, supporting compliance with the GDPR’s storage limitation principle (GDPR Article 5(1)(e)). Data Providers can decide to discontinue a federation at any time, either based on the time limits set in the legal contracts or whenever this seems appropriate.
Integrity and Confidentiality
MIP employs strong authentication, encryption, and a secure VPN for data protection. The federated analysis framework ensures that data remains confidential and is only accessed by accredited users (GDPR Articles 5(1)(f), 25, and 32).
Accountability
Data owners are responsible for ensuring ethical compliance and the integrity of research data. MIP’s governance framework enforces accountability among data controllers and processors by maintaining records of processing activities, including legal agreements, and by ensuring that data controllers and processors adhere to GDPR requirements (GDPR Article 5(2)).
Data Protection by Design and by Default
The terms of use of the platform ensure that data is anonymised and remains within the original hospital’s control, reflecting a privacy by design approach. Default privacy settings (e.g., aggregation of results) restrict data analysis, and strong authentication and accreditation processes enhance the security of MIP's federations, providing a secure environment for data analysis without exposing individual data (GDPR Article 25).
Data Subject Rights
As MIP processes anonymised data, GDPR data subject rights (e.g., access, rectification, erasure) do not directly apply. However, ethical considerations and informed consent ensure that patients’ rights are respected (GDPR Articles 15, 16, 17, and 18, as applicable to the non-anonymised data collection phase). The system's design respects data ownership and control by data controllers, ensuring they can determine accessibility and availability of their data.
Data Transfers (Articles 44-50)
The MIP ensures that any data transfers comply with GDPR’s requirements for international data transfers. This is achieved using DTAs and DSAs, ensuring that data transferred across borders is protected under equivalent data protection standards. If data is transferred, secure file transfer solutions are used.
Summary of legal steps to be followed, depending on the purpose of the processing or project:
- Patient consent for usage of data for research purposes (specific, general, re-use, anonymisation)
- Ethical clearance for research projects and planned processing
- DPIA (under preparation)
- Data Transfer Agreement or Data Sharing Agreement
- Collaboration Agreement
- MIP User Charter
- MIP Installation Agreement