Methodology
Overview
This project develops a tridimensional diagnostic framework for CNS diseases, incorporating AI-powered annotation tools to improve interpretability, standardization, and clinical utility. The methodology integrates multi-modal data, including genetic, neuroimaging, neurophysiological, and biomarker datasets, and applies machine learning models to generate structured, explainable diagnostic outputs.
Workflow
We use GitHub to store and develop AI models, scripts, and annotation pipelines.
- Create a GitHub repository for AI scripts and models.
- Use GitHub Projects to manage research milestones.
We use EBRAINS for data storage and collaboration:
- Store biomarker and neuroimaging data in EBRAINS Buckets.
- Run Jupyter Notebooks in EBRAINS Lab to test AI models.
- Use EBRAINS Wiki for structured documentation and research discussion.
1. Data Integration
EBRAINS Medical Informatics Platform (MIP)
Neurodiagnoses integrates clinical data via the EBRAINS Medical Informatics Platform (MIP). MIP federates decentralized clinical data, allowing Neurodiagnoses to securely access and process sensitive information for AI-based diagnostics.
How It Works
Authentication & API Access:
- Users must have an EBRAINS account.
- Neurodiagnoses uses secure API endpoints to fetch clinical data (e.g., from the Federation for Dementia).
Data Mapping & Harmonization:
- Retrieved data is normalized and converted to standard formats (.csv, .json).
- Data from multiple sources is harmonized to ensure consistency for AI processing.
Security & Compliance:
- All data access is logged and monitored.
- Data remains on MIP servers using federated learning techniques when possible.
- Access is granted only after signing a Data Usage Agreement (DUA).
Implementation Steps
- Clone the repository.
- Configure your EBRAINS API credentials in mip_integration.py.
- Run the script to download and harmonize clinical data.
- Process the data for AI model training.
For more detailed instructions, please refer to the MIP Documentation.
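The mapping and harmonization step above can be sketched as follows. This is a minimal illustration, not the actual MIP data model: the site exports, column names, and shared schema are assumptions made for the example.

```python
import pandas as pd

# Hypothetical per-site exports with differing column names (illustrative only).
site_a = pd.DataFrame({"SubjID": ["a1", "a2"], "AB42": [620.0, 410.0]})
site_b = pd.DataFrame({"participant": ["b1"], "abeta_42": [505.0]})

# Map heterogeneous column names onto one shared schema.
SCHEMA = {
    "SubjID": "subject_id", "participant": "subject_id",
    "AB42": "csf_abeta42", "abeta_42": "csf_abeta42",
}

def harmonize(df: pd.DataFrame) -> pd.DataFrame:
    # Rename to the shared schema, then keep only the harmonized columns.
    return df.rename(columns=SCHEMA)[["subject_id", "csf_abeta42"]]

merged = pd.concat([harmonize(site_a), harmonize(site_b)], ignore_index=True)
merged.to_csv("harmonized_clinical.csv", index=False)  # standard format for AI processing
```

In practice this mapping table would be maintained per data source, so new centres can be added without touching the downstream AI pipeline.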
Data Processing & Integration with Clinica.Run
Neurodiagnoses now supports Clinica.Run, an open-source neuroimaging platform designed for multimodal data processing and reproducible neuroscience workflows.
How It Works
Neuroimaging Preprocessing:
- MRI, PET, and EEG data are preprocessed using Clinica.Run pipelines.
- Supports longitudinal and cross-sectional analyses.
Automated Biomarker Extraction:
- Standardized extraction of volumetric, metabolic, and functional biomarkers.
- Integration with machine learning models in Neurodiagnoses.
Data Security & Compliance:
- Clinica.Run operates in compliance with GDPR and HIPAA.
- Neuroimaging data remains within the original storage environment.
Implementation Steps
- Install Clinica.Run dependencies.
- Configure your Clinica.Run pipeline in clinica_run_config.json.
- Run the pipeline for preprocessing and biomarker extraction.
- Use processed neuroimaging data for AI-driven diagnostics in Neurodiagnoses.
For further information, refer to Clinica.Run Documentation.
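As a sketch of the configuration step, the snippet below writes and reloads a minimal clinica_run_config.json. The key names are illustrative assumptions, not the actual Clinica.Run schema; consult the Clinica.Run documentation for the real fields.

```python
import json

# Hypothetical clinica_run_config.json contents -- key names are illustrative.
config = {
    "pipeline": "t1-volume",            # example Clinica pipeline name
    "bids_directory": "data/bids",      # BIDS-organized input data
    "output_directory": "data/derivatives",
    "n_procs": 4,                       # parallel workers
}

with open("clinica_run_config.json", "w") as fh:
    json.dump(config, fh, indent=2)

# The pipeline runner would then load the file before execution:
with open("clinica_run_config.json") as fh:
    loaded = json.load(fh)
```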
Data Sources
Potential database sources:
Biomedical Ontologies & Databases:
- Human Phenotype Ontology (HPO) for symptom annotation.
- Gene Ontology (GO) for molecular and cellular processes.
Dimensionality Reduction and Interpretability:
- Evaluate interpretability using metrics like the Area Under the Interpretability Curve (AUIC).
- Leverage DEIBO (Data-driven Embedding Interpretation Based on Ontologies) to connect model dimensions to ontology concepts.
Neuroimaging & EEG/MEG Data:
- MRI volumetric measures for brain atrophy tracking.
- EEG functional connectivity patterns (AI-Mind).
Clinical & Biomarker Data:
- CSF biomarkers (Amyloid-beta, Tau, Neurofilament Light).
- Sleep monitoring and actigraphy data (ADIS).
Federated Learning Integration:
- Secure multi-center data harmonization (PROMINENT).
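As a sketch of the federated-learning idea behind multi-center harmonization (not PROMINENT's actual protocol), the following averages locally trained linear models so that only model weights, never raw patient data, leave each site:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, steps=50):
    # Gradient descent on squared error, run privately at one site.
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Two hypothetical centres with private data drawn from the same linear model.
w_true = np.array([1.5, -2.0])
sites = []
for _ in range(2):
    X = rng.normal(size=(100, 2))
    y = X @ w_true + rng.normal(scale=0.1, size=100)
    sites.append((X, y))

w_global = np.zeros(2)
for _ in range(5):  # communication rounds
    local_models = [local_update(w_global, X, y) for X, y in sites]
    w_global = np.mean(local_models, axis=0)  # server averages the site models
```

Real deployments add secure aggregation and differential privacy on top of this averaging loop.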
Annotation System for Multi-Modal Data
To ensure structured integration of diverse datasets, Neurodiagnoses will implement an AI-driven annotation system, which will:
- Assign standardized metadata tags to diagnostic features.
- Provide contextual explanations for AI-based classifications.
- Track temporal disease progression annotations to identify long-term trends.
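A minimal sketch of such an annotation record; the class and field names are assumptions made for illustration, not a fixed schema:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Annotation:
    feature: str                               # diagnostic feature being tagged
    tags: list = field(default_factory=list)   # standardized metadata tags
    explanation: str = ""                      # contextual explanation for the classification
    observed_on: Optional[date] = None         # supports temporal progression tracking

# Two observations of the same feature at different visits (values are made up).
timeline = [
    Annotation("hippocampal_volume", ["MRI", "atrophy"],
               "Volume below 5th percentile for age", date(2023, 1, 10)),
    Annotation("hippocampal_volume", ["MRI", "atrophy"],
               "Further 4% volume loss versus baseline", date(2024, 1, 15)),
]

# Sorting by date yields a progression view of one feature over time.
progression = sorted(timeline, key=lambda a: a.observed_on)
```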
2. AI-Based Analysis
Machine Learning & Deep Learning Models
Risk Prediction Models:
- LETHE’s cognitive risk prediction model integrated into the annotation framework.
Biomarker Classification & Probabilistic Imputation:
- KNN imputation and Bayesian models are used to handle missing biomarker data.
Neuroimaging Feature Extraction:
- MRI & EEG data annotated with neuroanatomical feature labels.
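The probabilistic imputation step above can be sketched with scikit-learn's KNNImputer; the biomarker matrix below is fabricated for illustration:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Illustrative biomarker matrix (rows = patients, columns = CSF markers);
# np.nan marks missing assays.
X = np.array([
    [620.0, 300.0,  900.0],
    [410.0, np.nan, 1500.0],
    [600.0, 320.0,  np.nan],
    [430.0, 510.0,  1480.0],
])

# Fill each missing value from the 2 most similar patients.
imputer = KNNImputer(n_neighbors=2)
X_complete = imputer.fit_transform(X)
```

KNN imputation preserves between-marker correlations better than a simple column mean, which matters when the completed matrix feeds a downstream classifier.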
AI-Powered Annotation System
- Uses SHAP-based interpretability tools to explain model decisions.
- Generates automated clinical annotations in structured reports.
- Links findings to standardized medical ontologies (e.g., SNOMED, HPO).
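A minimal sketch of linking a model finding to an ontology code for a structured report. The mapping table and the HPO identifier are illustrative and would need verification against the current ontology release before any clinical use:

```python
# Illustrative finding-to-ontology lookup table (not a validated mapping).
ONTOLOGY_LINKS = {
    "memory_impairment": {"ontology": "HPO", "code": "HP:0002354"},  # illustrative code
}

def annotate(finding: str) -> dict:
    # Attach the ontology link if one exists, otherwise flag as unmapped.
    link = ONTOLOGY_LINKS.get(finding)
    return {"finding": finding, "ontology_link": link if link else "unmapped"}

report = annotate("memory_impairment")
```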
3. Diagnostic Framework & Clinical Decision Support
Tridimensional Diagnostic Axes
Axis 1: Etiology (Pathogenic Mechanisms)
- Classification based on genetic markers, cellular pathways, and environmental risk factors.
- AI-assisted annotation provides causal interpretations for clinical use.
Axis 2: Molecular Markers & Biomarkers
- Integration of CSF, blood, and neuroimaging biomarkers.
- Structured annotation highlights biological pathways linked to diagnosis.
Axis 3: Neuroanatomoclinical Correlations
- MRI and EEG data provide anatomical and functional insights.
- AI-generated progression maps annotate brain structure-function relationships.
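The three axes can be represented as one structured record per patient; the class, field names, and values below are illustrative assumptions, not a fixed schema:

```python
from dataclasses import dataclass

@dataclass
class TridimensionalDiagnosis:
    etiology: str               # Axis 1: pathogenic mechanism
    molecular_markers: dict     # Axis 2: biomarker profile
    neuroanatomoclinical: dict  # Axis 3: structure-function correlations

# Hypothetical patient record (all values fabricated for illustration).
dx = TridimensionalDiagnosis(
    etiology="autosomal dominant (PSEN1 variant, illustrative)",
    molecular_markers={"csf_abeta42": "low", "csf_ptau": "high"},
    neuroanatomoclinical={"mri": "hippocampal atrophy", "eeg": "posterior slowing"},
)
```

Keeping the three axes in one record lets annotation tools and reports address each dimension explicitly instead of collapsing them into a single label.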
4. Computational Workflow & Annotation Pipelines
Data Processing Steps
Data Ingestion:
- Harmonized datasets stored in EBRAINS Bucket.
- Preprocessing pipelines clean and standardize data.
Feature Engineering:
- AI models extract clinically relevant patterns from EEG, MRI, and biomarkers.
AI-Generated Annotations:
- Automated tagging of diagnostic features in structured reports.
- Explainability modules (SHAP, LIME) ensure transparency in predictions.
Clinical Decision Support Integration:
- AI-annotated findings are fed into interactive dashboards.
- Clinicians can review, validate, and modify annotations.
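SHAP and LIME are the explainability tools named above; as a dependency-light illustration of the same idea, this sketch uses permutation importance (a related but simpler technique) on synthetic data: a feature is important if shuffling it degrades the model's score.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=200)  # only feature 0 is informative

# Fit a least-squares model once.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def r2(X_eval):
    # R^2 of the fixed model evaluated on (possibly permuted) inputs.
    resid = y - X_eval @ w
    return 1 - resid.var() / y.var()

baseline = r2(X)
importance = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # destroy feature j's signal
    importance.append(baseline - r2(X_perm))      # score drop = importance
```

Here feature 0 dominates the importance scores, mirroring how SHAP values would concentrate on the informative biomarker.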
5. Validation & Real-World Testing
Prospective Clinical Study
- Multi-center validation of AI-based annotations & risk stratifications.
- Benchmarking against clinician-based diagnoses.
- Real-world testing of AI-powered structured reporting.
Quality Assurance & Explainability
- Annotations linked to structured knowledge graphs for improved transparency.
- Interactive annotation editor allows clinicians to validate AI outputs.
6. Collaborative Development
The project is open to contributions from researchers, clinicians, and developers.
Key tools include:
- Jupyter Notebooks: for data analysis and pipeline development.
  - Example: probabilistic imputation.
- Wiki Pages: for documenting methods and results.
- Drive and Bucket: for sharing code, data, and outputs.
- Collaboration with related projects:
  - Example: "Beyond the hype: AI in dementia – from early risk detection to disease treatment".
7. Tools and Technologies
Programming Languages:
- Python for AI and data processing.
Frameworks:
- TensorFlow and PyTorch for machine learning.
- Flask or FastAPI for backend services.
Visualization:
- Plotly and Matplotlib for interactive and static visualizations.
EBRAINS Services:
- Collaboratory Lab for running Notebooks.
- Buckets for storing large datasets.