Methodology

Version 16.1 by manuelmenendez on 2025/02/09 10:08

Overview

This project develops a tridimensional diagnostic framework for CNS diseases, incorporating AI-powered annotation tools to improve interpretability, standardization, and clinical utility. The methodology integrates multi-modal data, including genetic, neuroimaging, neurophysiological, and biomarker datasets, and applies machine learning models to generate structured, explainable diagnostic outputs.

Workflow

  1. We use GitHub to store and develop AI models, scripts, and annotation pipelines.

    • Create a GitHub repository for AI scripts and models.
    • Use GitHub Projects to manage research milestones.
  2. We use EBRAINS for data and collaboration.

    • Store biomarker and neuroimaging data in EBRAINS Buckets.
    • Run Jupyter Notebooks in EBRAINS Lab to test AI models.
    • Use EBRAINS Wiki for structured documentation and research discussion.

1. Data Integration

EBRAINS Medical Informatics Platform (MIP)

Neurodiagnoses integrates clinical data via the EBRAINS Medical Informatics Platform (MIP). MIP federates decentralized clinical data, allowing Neurodiagnoses to securely access and process sensitive information for AI-based diagnostics.

How It Works

  1. Authentication & API Access:

    • Users must have an EBRAINS account.
    • Neurodiagnoses uses secure API endpoints to fetch clinical data (e.g., from the Federation for Dementia).
  2. Data Mapping & Harmonization:

    • Retrieved data is normalized and converted to standard formats (.csv, .json).
    • Data from multiple sources is harmonized to ensure consistency for AI processing.
  3. Security & Compliance:

    • All data access is logged and monitored.
    • Where possible, data remains on MIP servers and is analyzed there using federated learning techniques.
    • Access is granted only after signing a Data Usage Agreement (DUA).
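
The mapping and harmonization step above can be illustrated with a minimal sketch. The column names, the per-source mappings in SOURCE_MAPS, and the functions harmonize and to_csv are hypothetical; they are not taken from the Neurodiagnoses repository or the MIP API. The sketch only shows the idea of renaming source-specific fields to a shared schema and exporting the result as .csv and .json.

```python
import csv
import io
import json

# Hypothetical target schema shared by all sources (an assumption for illustration).
TARGET_FIELDS = ["patient_id", "age", "mmse"]

# Hypothetical per-source mappings from local column names to the target schema.
SOURCE_MAPS = {
    "site_a": {"id": "patient_id", "age_years": "age", "MMSE_total": "mmse"},
    "site_b": {"subject": "patient_id", "age": "age", "mmse_score": "mmse"},
}


def harmonize(records, source):
    """Rename source-specific fields to the target schema and cast numbers."""
    mapping = SOURCE_MAPS[source]
    harmonized = []
    for record in records:
        row = {target: record.get(local) for local, target in mapping.items()}
        # Cast numeric fields; keep missing values as None.
        for name in ("age", "mmse"):
            row[name] = float(row[name]) if row[name] not in (None, "") else None
        row["source"] = source
        harmonized.append(row)
    return harmonized


def to_csv(rows):
    """Serialize harmonized rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=TARGET_FIELDS + ["source"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example records from two sources with different column names.
site_a = [{"id": "A1", "age_years": "71", "MMSE_total": "24"}]
site_b = [{"subject": "B7", "age": "68", "mmse_score": ""}]

rows = harmonize(site_a, "site_a") + harmonize(site_b, "site_b")
print(to_csv(rows))
print(json.dumps(rows, indent=2))
```

Keeping the mappings in a plain dictionary makes it easy to add a new source without touching the conversion logic; missing values are preserved as empty cells so that a later imputation step can handle them.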

Implementation Steps

  1. Clone the repository.
  2. Configure your EBRAINS API credentials in mip_integration.py.
  3. Run the script to download and harmonize clinical data.
  4. Process the data for AI model training.

For more detailed instructions, please refer to the MIP Documentation.


Data Processing & Integration with Clinica.Run

Neurodiagnoses now supports Clinica.Run, an open-source neuroimaging platform designed for multimodal data processing and reproducible neuroscience workflows.

How It Works

  1. Neuroimaging Preprocessing:

    • MRI, PET, and EEG data are preprocessed using Clinica.Run pipelines.
    • Supports longitudinal and cross-sectional analyses.
  2. Automated Biomarker Extraction:

    • Standardized extraction of volumetric, metabolic, and functional biomarkers.
    • Integration with machine learning models in Neurodiagnoses.
  3. Data Security & Compliance:

    • Clinica.Run operates in compliance with GDPR and HIPAA.
    • Neuroimaging data remains within the original storage environment.

Implementation Steps

  1. Install Clinica.Run dependencies.
  2. Configure your Clinica.Run pipeline in clinica_run_config.json.
  3. Run the pipeline for preprocessing and biomarker extraction.
  4. Use processed neuroimaging data for AI-driven diagnostics in Neurodiagnoses.

For further information, refer to the Clinica.Run Documentation.
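
As a rough illustration of steps 2 and 3, the sketch below loads a configuration and turns it into a command line. The key names inside the JSON ("pipeline", "bids_directory", "caps_directory") and the helper functions are assumptions made for this example; the actual schema of clinica_run_config.json is not documented here. The command follows the general `clinica run <pipeline> <bids_directory> <caps_directory>` pattern, and the sketch only builds the argument list without executing it.

```python
import json

# Hypothetical contents of clinica_run_config.json; the key names are
# assumptions for this sketch, not the project's actual schema.
EXAMPLE_CONFIG = """
{
  "pipeline": "t1-linear",
  "bids_directory": "/data/bids",
  "caps_directory": "/data/caps"
}
"""

REQUIRED_KEYS = ("pipeline", "bids_directory", "caps_directory")


def load_config(text):
    """Parse the JSON config and check that the required keys are present."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"Missing config keys: {', '.join(missing)}")
    return config


def build_command(config):
    """Build the argument list for `clinica run <pipeline> <bids> <caps>`."""
    return [
        "clinica",
        "run",
        config["pipeline"],
        config["bids_directory"],
        config["caps_directory"],
    ]


config = load_config(EXAMPLE_CONFIG)
command = build_command(config)
print(" ".join(command))
# The list could be passed to subprocess.run(command, check=True) once
# Clinica is installed and the directories exist.
```

Validating the configuration before launching anything catches a missing path early, which matters for neuroimaging pipelines that can run for hours.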

Data Sources

Potential data sources and databases:

Biomedical Ontologies & Databases:

  • Human Phenotype Ontology (HPO) for symptom annotation.
  • Gene Ontology (GO) for molecular and cellular processes.

Dimensionality Reduction and Interpretability:

  • Evaluate interpretability using metrics like the Area Under the Interpretability Curve (AUIC).
  • Leverage DEIBO (Data-driven Embedding Interpretation Based on Ontologies) to connect model dimensions to ontology concepts.

Neuroimaging & EEG/MEG Data:

  • MRI volumetric measures for brain atrophy tracking.
  • EEG functional connectivity patterns (AI-Mind).

Clinical & Biomarker Data:

  • CSF biomarkers (Amyloid-beta, Tau, Neurofilament Light).
  • Sleep monitoring and actigraphy data (ADIS).

Federated Learning Integration:

  • Secure multi-center data harmonization (PROMINENT).
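
The principle behind federated analysis, where raw records stay at each center and only aggregates are shared, can be sketched as follows. This is a toy pooled mean and variance computed from per-site summaries; the class, the function names, and the sample values are invented for illustration, and the sketch does not reproduce the MIP or PROMINENT implementations.

```python
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Aggregate statistics a site shares instead of raw records."""
    n: int
    total: float
    total_sq: float


def summarize(values):
    """Runs locally at each site; only the summary leaves the site."""
    return SiteSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def pooled_mean_variance(summaries):
    """Combine site summaries into a global mean and population variance."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, variance


# Made-up measurements held at two different centers.
site_1 = [1.0, 2.0, 3.0]
site_2 = [4.0, 5.0]

mean, variance = pooled_mean_variance([summarize(site_1), summarize(site_2)])
print(mean, variance)
```

The coordinator obtains the same mean and variance it would get from the pooled data, yet it never sees an individual measurement. Real deployments add safeguards such as minimum cell sizes, which this sketch omits.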

Annotation System for Multi-Modal Data

To ensure structured integration of diverse datasets, Neurodiagnoses will implement an AI-driven annotation system, which will:

  • Assign standardized metadata tags to diagnostic features.
  • Provide contextual explanations for AI-based classifications.
  • Track temporal disease progression annotations to identify long-term trends.
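
One possible shape for such annotations is sketched below. The Annotation fields, the tag strings, and the helper functions are illustrative assumptions rather than the project's actual data model; the example values are made up.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Annotation:
    """One annotated diagnostic feature (field names are illustrative)."""
    feature: str
    value: float
    observed_on: date
    tags: list = field(default_factory=list)
    explanation: str = ""


def progression(annotations, feature):
    """Return (date, value) pairs for one feature, ordered in time."""
    points = [(a.observed_on, a.value) for a in annotations if a.feature == feature]
    return sorted(points)


def annual_change(annotations, feature):
    """Average change per year between the first and last observation."""
    points = progression(annotations, feature)
    if len(points) < 2:
        return None
    (first_day, first_value), (last_day, last_value) = points[0], points[-1]
    years = (last_day - first_day).days / 365.25
    if years == 0:
        return None
    return (last_value - first_value) / years


# Made-up longitudinal observations of a single feature.
annotations = [
    Annotation("hippocampal_volume_ml", 3.2, date(2023, 1, 1),
               tags=["mri", "volumetric"],
               explanation="Baseline volumetric measurement."),
    Annotation("hippocampal_volume_ml", 3.0, date(2025, 1, 1),
               tags=["mri", "volumetric"]),
]

rate = annual_change(annotations, "hippocampal_volume_ml")
print(progression(annotations, "hippocampal_volume_ml"))
print(rate)
```

Storing the observation date on every annotation is what makes the long-term trend recoverable; the tags and the explanation travel with the value so a report can show both.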

2. AI-Based Analysis

Machine Learning & Deep Learning Models

Risk Prediction Models:

  • LETHE’s cognitive risk prediction model is integrated into the annotation framework.

Biomarker Classification & Probabilistic Imputation:

  • KNN Imputer and Bayesian models are used to handle missing biomarker data.
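
To make the k-nearest-neighbour idea concrete, here is a minimal pure-Python sketch. It is a simplified illustration, not the project's code and not a reimplementation of any library: the function knn_impute, the distance rule, and the sample values are assumptions for this example. A production pipeline would more likely rely on a library implementation such as scikit-learn's KNNImputer.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is a root-mean-square difference over the columns observed in
    both rows, which is a simplification of what library imputers do.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that have column j observed.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Made-up rows with two standardized biomarker columns; None marks a gap.
data = [
    [1.0, 10.0],
    [1.1, 12.0],
    [5.0, 50.0],
    [1.05, None],
]

filled = knn_impute(data, k=2)
print(filled[3])
```

The last row is closest to the first two rows on the observed column, so its missing value is filled with the mean of their second-column values rather than being pulled toward the distant third row.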

Neuroimaging Feature Extraction:

  • MRI and EEG data are annotated with neuroanatomical feature labels.

AI-Powered Annotation System

  • Uses SHAP-based interpretability tools to explain model decisions.
  • Generates automated clinical annotations in structured reports.
  • Links findings to standardized medical ontologies (e.g., SNOMED, HPO).
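
SHAP explanations are built on Shapley values. The sketch below computes exact Shapley values for a toy model by enumerating every feature coalition; it does not use the SHAP library, and the risk_score model, its weights, and the input values are hypothetical. It is meant only to show what a per-feature contribution means.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating all feature coalitions.

    Features outside a coalition are set to their baseline value.
    """
    n = len(instance)
    features = range(n)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    contributions = []
    for i in features:
        others = [j for j in features if j != i]
        phi = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi += weight * (value(set(subset) | {i}) - value(set(subset)))
        contributions.append(phi)
    return contributions


def risk_score(x):
    """Toy linear risk score over three hypothetical standardized biomarkers."""
    return 0.5 * x[0] + 2.0 * x[1] - 1.0 * x[2]


instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]

phi = shapley_values(risk_score, instance, baseline)
print(phi)
print(sum(phi), risk_score(instance) - risk_score(baseline))
```

For a linear model each contribution reduces to the weight times the difference between the feature value and its baseline, and the contributions sum to the difference between the prediction and the baseline prediction. Exact enumeration grows exponentially with the number of features, which is why practical tools use approximations.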

3. Diagnostic Framework & Clinical Decision Support

Tridimensional Diagnostic Axes

Axis 1: Etiology (Pathogenic Mechanisms)

  • Classification based on genetic markers, cellular pathways, and environmental risk factors.
  • AI-assisted annotation provides causal interpretations for clinical use.

Axis 2: Molecular Markers & Biomarkers

  • Integration of CSF, blood, and neuroimaging biomarkers.
  • Structured annotation highlights biological pathways linked to diagnosis.

Axis 3: Neuroanatomoclinical Correlations

  • MRI and EEG data provide anatomical and functional insights.
  • AI-generated progression maps annotate brain structure-function relationships.
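
A structured output along the three axes could be represented as in the sketch below. The class names, the field names, and every example value are invented for illustration; they do not reflect the project's actual report schema or any real patient.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AxisFinding:
    """One finding on a diagnostic axis (fields are illustrative)."""
    label: str
    probability: float
    evidence: list = field(default_factory=list)


@dataclass
class TridimensionalDiagnosis:
    """Findings grouped by the three axes described above."""
    patient_id: str
    etiology: list                # Axis 1
    molecular_markers: list       # Axis 2
    neuroanatomoclinical: list    # Axis 3

    def to_json(self):
        """Serialize the whole record, including nested findings."""
        return json.dumps(asdict(self), indent=2)


# Made-up example record.
diagnosis = TridimensionalDiagnosis(
    patient_id="example-001",
    etiology=[AxisFinding("genetic risk variant present", 0.7, ["genotype panel"])],
    molecular_markers=[AxisFinding("reduced CSF amyloid-beta", 0.8, ["CSF assay"])],
    neuroanatomoclinical=[AxisFinding("medial temporal atrophy", 0.6, ["MRI"])],
)

report = diagnosis.to_json()
print(report)
```

Keeping the axes as separate lists lets each finding carry its own probability and evidence, so a dashboard can display or filter one axis without parsing free text.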

4. Computational Workflow & Annotation Pipelines

Data Processing Steps

Data Ingestion:

  • Harmonized datasets are stored in EBRAINS Buckets.
  • Preprocessing pipelines clean and standardize data.

Feature Engineering:

  • AI models extract clinically relevant patterns from EEG, MRI, and biomarkers.

AI-Generated Annotations:

  • Automated tagging of diagnostic features in structured reports.
  • Explainability modules (SHAP, LIME) ensure transparency in predictions.

Clinical Decision Support Integration:

  • AI-annotated findings are fed into interactive dashboards.
  • Clinicians can review, validate, and modify annotations.

5. Validation & Real-World Testing

Prospective Clinical Study

  • Multi-center validation of AI-based annotations and risk stratification.
  • Benchmarking against clinician-based diagnoses.
  • Real-world testing of AI-powered structured reporting.
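
One common way to benchmark model output against clinician diagnoses is a chance-corrected agreement statistic. The sketch below computes Cohen's kappa for two label sequences; the choice of kappa is an assumption for illustration (the study protocol may specify other metrics), and the labels are made up.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Label sequences must be non-empty and of equal length.")
    n = len(labels_a)
    # Observed agreement: fraction of cases where both raters give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the raters labelled independently at their own rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up diagnostic labels for six cases.
ai_labels = ["AD", "AD", "FTD", "AD", "FTD", "AD"]
clinician_labels = ["AD", "AD", "FTD", "FTD", "FTD", "AD"]

kappa = cohen_kappa(ai_labels, clinician_labels)
print(round(kappa, 3))
```

Raw accuracy can look high simply because one diagnosis dominates the sample; kappa discounts the agreement expected by chance, which makes results more comparable across centers with different case mixes.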

Quality Assurance & Explainability

  • Annotations linked to structured knowledge graphs for improved transparency.
  • Interactive annotation editor allows clinicians to validate AI outputs.
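
The clinician validation step can be sketched as a small state change with an audit trail. The class ReviewableAnnotation, its status values, and the reviewer identifiers are hypothetical and only illustrate how an editor might record who validated or modified an AI-generated annotation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ReviewableAnnotation:
    """An AI-generated annotation that clinicians can validate or modify."""
    text: str
    status: str = "proposed"  # becomes "validated" or "modified" after review
    history: list = field(default_factory=list)

    def _log(self, reviewer, action, previous_text):
        """Append an audit entry with the reviewer, action, and prior text."""
        self.history.append({
            "reviewer": reviewer,
            "action": action,
            "previous_text": previous_text,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def validate(self, reviewer):
        """Accept the annotation as written."""
        self._log(reviewer, "validated", self.text)
        self.status = "validated"

    def modify(self, reviewer, new_text):
        """Replace the annotation text while keeping the original in the log."""
        self._log(reviewer, "modified", self.text)
        self.text = new_text
        self.status = "modified"


# Made-up annotation and reviewers.
annotation = ReviewableAnnotation("Pattern suggests medial temporal atrophy.")
annotation.modify("clinician_1", "Mild medial temporal atrophy, left more than right.")
annotation.validate("clinician_2")
print(annotation.status, len(annotation.history))
```

Keeping the previous text in the history means the original AI output is never lost, which supports later comparison between model suggestions and clinician corrections.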

6. Collaborative Development

The project is open to contributions from researchers, clinicians, and developers.

Key tools include:

  • Jupyter Notebooks: For data analysis and pipeline development.
    • Example: probabilistic imputation
  • Wiki Pages: For documenting methods and results.
  • Drive and Bucket: For sharing code, data, and outputs.
  • Collaboration with related projects:
    • Example: Beyond the hype: AI in dementia – from early risk detection to disease treatment

7. Tools and Technologies

Programming Languages:

  • Python for AI and data processing.

Frameworks:

  • TensorFlow and PyTorch for machine learning.
  • Flask or FastAPI for backend services.

Visualization:

  • Plotly and Matplotlib for interactive and static visualizations.

EBRAINS Services:

  • Collaboratory Lab for running Notebooks.
  • Buckets for storing large datasets.