Changes for page Methodology

Last modified by manuelmenendez on 2025/03/14 08:31

From version 15.1
edited by manuelmenendez
on 2025/02/09 10:08
Change comment: There is no comment for this version
To version 4.2
edited by manuelmenendez
on 2025/01/29 19:10
Change comment: There is no comment for this version

... ... @@ -1,268 +1,109 @@
1 -== **Overview** ==
1 +=== **Overview** ===
2 2  
3 -This project develops a **tridimensional diagnostic framework** for **CNS diseases**, incorporating **AI-powered annotation tools** to improve **interpretability, standardization, and clinical utility**. The methodology integrates **multi-modal data**, including **genetic, neuroimaging, neurophysiological, and biomarker datasets**, and applies **machine learning models** to generate **structured, explainable diagnostic outputs**.
3 +This section describes the step-by-step process used in the **Neurodiagnoses** project to develop a novel diagnostic framework for neurological diseases. The methodology integrates artificial intelligence (AI), biomedical ontologies, and computational neuroscience to create a structured, interpretable, and scalable diagnostic system.
4 4  
5 -== **Workflow** ==
6 -
7 -1. (((
8 -**We use GitHub to [[store and develop AI models, scripts, and annotation pipelines>>https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/discussions]].**
9 -
10 -* Create a **GitHub repository** for AI scripts and models.
11 -* Use **GitHub Projects** to manage research milestones.
12 -)))
13 -1. (((
14 -**We Use EBRAINS for Data & Collaboration**
15 -
16 -* Store **biomarker and neuroimaging data** in **EBRAINS Buckets**.
17 -* Run **Jupyter Notebooks** in **EBRAINS Lab** to test AI models.
18 -* Use **EBRAINS Wiki** for structured documentation and research discussion.
19 -)))
20 -
21 21  ----
22 22  
23 -== **1. Data Integration** ==
7 +=== **1. Data Integration** ===
24 24  
25 -=== **EBRAINS Medical Informatics Platform (MIP)** ===
26 -
27 -Neurodiagnoses integrates clinical data via the **EBRAINS Medical Informatics Platform (MIP)**. MIP federates decentralized clinical data, allowing Neurodiagnoses to securely access and process sensitive information for AI-based diagnostics.
28 -
29 -==== How It Works ====
30 -
31 -
32 -1. (((
33 -**Authentication & API Access:**
34 -
35 -* Users must have an **EBRAINS account**.
36 -* Neurodiagnoses uses **secure API endpoints** to fetch clinical data (e.g., from the **Federation for Dementia**).
37 -)))
38 -1. (((
39 -**Data Mapping & Harmonization:**
40 -
41 -* Retrieved data is **normalized** and converted to standard formats (.csv, .json).
42 -* Data from **multiple sources** is harmonized to ensure consistency for AI processing.
43 -)))
44 -1. (((
45 -**Security & Compliance:**
46 -
47 -* All data access is **logged and monitored**.
48 -* Where possible, data remains on **MIP servers** and is analysed there using **federated learning techniques**.
49 -* Access is granted only after signing a **Data Usage Agreement (DUA)**.
50 -)))
51 -
52 -==== Implementation Steps ====
53 -
54 -
55 -1. Clone the repository.
56 -1. Configure your **EBRAINS API credentials** in mip_integration.py.
57 -1. Run the script to **download and harmonize clinical data**.
58 -1. Process the data for **AI model training**.
59 -
60 -For more detailed instructions, please refer to the **[[MIP Documentation>>url:https://mip.ebrains.eu/]]**.
61 -
62 -----
63 -
64 -=== Data Processing & Integration with Clinica.Run ===
65 -
66 -Neurodiagnoses now supports **Clinica.Run**, an open-source neuroimaging platform designed for **multimodal data processing and reproducible neuroscience workflows**.
67 -
68 -==== How It Works ====
69 -
70 -
71 -1. (((
72 -**Neuroimaging Preprocessing:**
73 -
74 -* MRI, PET, and EEG data are preprocessed using **Clinica.Run pipelines**.
75 -* Supports **longitudinal and cross-sectional analyses**.
76 -)))
77 -1. (((
78 -**Automated Biomarker Extraction:**
79 -
80 -* Standardized extraction of **volumetric, metabolic, and functional biomarkers**.
81 -* Integration with machine learning models in Neurodiagnoses.
82 -)))
83 -1. (((
84 -**Data Security & Compliance:**
85 -
86 -* Clinica.Run operates in **compliance with GDPR and HIPAA**.
87 -* Neuroimaging data remains **within the original storage environment**.
88 -)))
89 -
90 -==== Implementation Steps ====
91 -
92 -
93 -1. Install **Clinica.Run** dependencies.
94 -1. Configure your **Clinica.Run pipeline** in clinica_run_config.json.
95 -1. Run the pipeline for **preprocessing and biomarker extraction**.
96 -1. Use processed neuroimaging data for **AI-driven diagnostics** in Neurodiagnoses.
97 -
98 -For further information, refer to **[[Clinica.Run Documentation>>url:https://clinica.run/]]**.
99 -
100 -==== ====
101 -
102 102  ==== **Data Sources** ====
103 103  
104 -[[List of potential sources of databases>>https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/data/sources/list_of_potential_databases]]
11 +* **Biomedical Ontologies**:
12 +** Human Phenotype Ontology (HPO) for phenotypic abnormalities.
13 +** Gene Ontology (GO) for molecular and cellular processes.
14 +* **Neuroimaging Datasets**:
15 +** Examples: Alzheimer’s Disease Neuroimaging Initiative (ADNI), OpenNeuro.
16 +* **Clinical and Biomarker Data**:
17 +** Anonymized clinical reports, molecular biomarkers, and test results.
105 105  
106 -**Biomedical Ontologies & Databases:**
107 107  
108 -* **Human Phenotype Ontology (HPO)** for symptom annotation.
109 -* **Gene Ontology (GO)** for molecular and cellular processes.
20 +==== **Data Preprocessing** ====
110 110  
111 -**Dimensionality Reduction and Interpretability:**
22 +1. **Standardization**: Ensure all data sources are normalized to a common format.
23 +1. **Feature Selection**: Identify relevant features for diagnosis (e.g., biomarkers, imaging scores).
24 +1. **Data Cleaning**: Handle missing values and remove duplicates.
112 112  
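The three preprocessing steps can be sketched in plain Python. This is a minimal illustration, not the project's pipeline: the record fields (`patient_id`, `tau`, `mmse`, `site`), the use of mean imputation for missing values, and min-max scaling as the "common format" are all assumptions made for the example. The steps are also reordered (cleaning first, scaling last) so that scaling operates on complete values.

```python
from statistics import mean

# Hypothetical records; field names and values are illustrative only.
records = [
    {"patient_id": "p1", "tau": 310.0, "mmse": 27.0, "site": "A"},
    {"patient_id": "p2", "tau": None, "mmse": 21.0, "site": "B"},
    {"patient_id": "p2", "tau": None, "mmse": 21.0, "site": "B"},  # duplicate
    {"patient_id": "p3", "tau": 450.0, "mmse": None, "site": "A"},
]
FEATURES = ["tau", "mmse"]  # assumed feature selection


def preprocess(rows, features):
    # Data cleaning, part 1: drop duplicates, keeping the first row per patient.
    seen, unique = set(), []
    for row in rows:
        if row["patient_id"] not in seen:
            seen.add(row["patient_id"])
            unique.append(row)

    # Feature selection: keep the identifier and the chosen features only.
    selected = [
        {"patient_id": r["patient_id"], **{f: r[f] for f in features}}
        for r in unique
    ]

    # Data cleaning, part 2: fill missing values with the feature mean
    # (mean imputation is an assumption; other strategies are possible).
    for f in features:
        fill = mean(r[f] for r in selected if r[f] is not None)
        for r in selected:
            if r[f] is None:
                r[f] = fill

    # Standardization: min-max scale each feature to the range [0, 1].
    for f in features:
        values = [r[f] for r in selected]
        lo, hi = min(values), max(values)
        span = hi - lo
        for r in selected:
            r[f] = (r[f] - lo) / span if span else 0.0
    return selected


clean = preprocess(records, FEATURES)
```

Here `clean` holds one row per patient with every selected feature present and scaled; the input `records` list is left unmodified.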
113 -* **Evaluate interpretability** using metrics like the **Area Under the Interpretability Curve (AUIC)**.
114 -* **Leverage [[DEIBO>>https://github.com/Mellandd/DEIBO]] (Data-driven Embedding Interpretation Based on Ontologies)** to connect model dimensions to ontology concepts.
115 -
116 -**Neuroimaging & EEG/MEG Data:**
117 -
118 -* **MRI volumetric measures** for brain atrophy tracking.
119 -* **EEG functional connectivity patterns** (AI-Mind).
120 -
121 -**Clinical & Biomarker Data:**
122 -
123 -* **CSF biomarkers** (Amyloid-beta, Tau, Neurofilament Light).
124 -* **Sleep monitoring and actigraphy data** (ADIS).
125 -
126 -**Federated Learning Integration:**
127 -
128 -* **Secure multi-center data harmonization** (PROMINENT).
129 -
130 130  ----
131 131  
132 -==== **Annotation System for Multi-Modal Data** ====
28 +=== **2. AI-Based Analysis** ===
133 133  
134 -To ensure **structured integration of diverse datasets**, **Neurodiagnoses** will implement an **AI-driven annotation system**, which will:
30 +==== **Model Development** ====
135 135  
136 -* **Assign standardized metadata tags** to diagnostic features.
137 -* **Provide contextual explanations** for AI-based classifications.
138 -* **Track temporal disease progression annotations** to identify long-term trends.
32 +* **Embedding Models**: Use pre-trained models like BioBERT or BioLORD for text data.
33 +* **Classification Models**:
34 +** Algorithms: Random Forest, Support Vector Machines (SVM), or neural networks.
35 +** Purpose: Predict the likelihood of specific neurological conditions based on input data.
139 139  
140 -----
37 +==== **Dimensionality Reduction and Interpretability** ====
141 141  
142 -== **2. AI-Based Analysis** ==
39 +* Leverage [[DEIBO>>https://drive.ebrains.eu/f/8d7157708cde4b258db0/]] (Data-driven Embedding Interpretation Based on Ontologies) to connect model dimensions to ontology concepts.
40 +* Evaluate interpretability using metrics like the Area Under the Interpretability Curve (AUIC).
143 143  
144 -==== **Machine Learning & Deep Learning Models** ====
145 -
146 -**Risk Prediction Models:**
147 -
148 -* **LETHE’s cognitive risk prediction model** integrated into the annotation framework.
149 -
150 -**Biomarker Classification & Probabilistic Imputation:**
151 -
152 -* **KNN Imputer** and **Bayesian models** used for handling **missing biomarker data**.
153 -
154 -**Neuroimaging Feature Extraction:**
155 -
156 -* **MRI & EEG data** annotated with **neuroanatomical feature labels**.
157 -
158 -==== **AI-Powered Annotation System** ====
159 -
160 -* Uses **SHAP-based interpretability tools** to explain model decisions.
161 -* Generates **automated clinical annotations** in structured reports.
162 -* Links findings to **standardized medical ontologies** (e.g., **SNOMED, HPO**).
163 -
164 164  ----
165 165  
166 -== **3. Diagnostic Framework & Clinical Decision Support** ==
44 +=== **3. Diagnostic Framework** ===
167 167  
168 -==== **Tridimensional Diagnostic Axes** ====
46 +==== **Axes of Diagnosis** ====
169 169  
170 -**Axis 1: Etiology (Pathogenic Mechanisms)**
48 +The framework organizes diagnostic data into three axes:
171 171  
172 -* Classification based on **genetic markers, cellular pathways, and environmental risk factors**.
173 -* **AI-assisted annotation** provides **causal interpretations** for clinical use.
50 +1. **Etiology**: Genetic and environmental risk factors.
51 +1. **Molecular Markers**: Biomarkers such as amyloid-beta, tau, and alpha-synuclein.
52 +1. **Neuroanatomical Correlations**: Results from neuroimaging (e.g., MRI, PET).
174 174  
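One possible way to hold the three axes in code is a small record type. The sketch below is an assumption about representation only: the class name `DiagnosticRecord`, its field names, and the sample values are hypothetical and do not come from the project repository.

```python
from dataclasses import dataclass, field


@dataclass
class DiagnosticRecord:
    """Hypothetical container with one field per diagnostic axis."""

    # Axis 1: genetic and environmental risk factors.
    etiology: list[str] = field(default_factory=list)
    # Axis 2: biomarker name mapped to a measured value.
    molecular_markers: dict[str, float] = field(default_factory=dict)
    # Axis 3: imaging modality mapped to a short finding.
    neuroanatomical: dict[str, str] = field(default_factory=dict)

    def missing_axes(self) -> list[str]:
        # An axis counts as missing when it holds no entries.
        return [name for name, value in vars(self).items() if not value]


record = DiagnosticRecord(
    etiology=["illustrative genetic risk factor"],
    molecular_markers={"tau": 420.0},
)
gaps = record.missing_axes()
```

A helper such as `missing_axes` gives a simple way to see which axis still lacks data for a given patient.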
175 -**Axis 2: Molecular Markers & Biomarkers**
54 +==== **Recommendation System** ====
176 176  
177 -* **Integration of CSF, blood, and neuroimaging biomarkers**.
178 -* **Structured annotation** highlights **biological pathways linked to diagnosis**.
56 +* Suggests additional tests or biomarkers if gaps are detected in the data.
57 +* Prioritizes tests based on clinical impact and cost-effectiveness.
179 179  
180 -**Axis 3: Neuroanatomoclinical Correlations**
181 -
182 -* **MRI and EEG data** provide anatomical and functional insights.
183 -* **AI-generated progression maps** annotate **brain structure-function relationships**.
184 -
185 185  ----
186 186  
187 -== **4. Computational Workflow & Annotation Pipelines** ==
61 +=== **4. Computational Workflow** ===
188 188  
189 -==== **Data Processing Steps** ====
63 +1. **Data Loading**: Import data from storage (Drive or Bucket).
64 +1. **Feature Engineering**: Generate derived features from the raw data.
65 +1. **Model Training**:
66 +1*. Split data into training, validation, and test sets.
67 +1*. Train models with cross-validation to ensure robustness.
68 +1. **Evaluation**:
69 +1*. Metrics: accuracy and F1-score for predictive performance; AUIC for interpretability.
70 +1*. Compare against baseline models and domain benchmarks.
190 190  
191 -**Data Ingestion:**
192 -
193 -* **Harmonized datasets** stored in **EBRAINS Bucket**.
194 -* **Preprocessing pipelines** clean and standardize data.
195 -
196 -**Feature Engineering:**
197 -
198 -* **AI models** extract **clinically relevant patterns** from **EEG, MRI, and biomarkers**.
199 -
200 -**AI-Generated Annotations:**
201 -
202 -* **Automated tagging** of diagnostic features in **structured reports**.
203 -* **Explainability modules (SHAP, LIME)** ensure transparency in predictions.
204 -
205 -**Clinical Decision Support Integration:**
206 -
207 -* **AI-annotated findings** fed into **interactive dashboards**.
208 -* **Clinicians can adjust, validate, and modify annotations**.
209 -
210 210  ----
211 211  
212 -== **5. Validation & Real-World Testing** ==
74 +=== **5. Validation** ===
213 213  
214 -==== **Prospective Clinical Study** ====
76 +==== **Internal Validation** ====
215 215  
216 -* **Multi-center validation** of AI-based **annotations & risk stratifications**.
217 -* **Benchmarking against clinician-based diagnoses**.
218 -* **Real-world testing** of AI-powered **structured reporting**.
78 +* Test the system using simulated datasets and known clinical cases.
79 +* Fine-tune models based on validation results.
219 219  
220 -==== **Quality Assurance & Explainability** ====
81 +==== **External Validation** ====
221 221  
222 -* **Annotations linked to structured knowledge graphs** for improved transparency.
223 -* **Interactive annotation editor** allows clinicians to validate AI outputs.
83 +* Collaborate with research institutions and hospitals to test the system in real-world settings.
84 +* Use anonymized patient data to ensure privacy compliance.
224 224  
225 225  ----
226 226  
227 -== **6. Collaborative Development** ==
88 +=== **6. Collaborative Development** ===
228 228  
229 -The project is **open to contributions** from **researchers, clinicians, and developers**.
90 +The project is open to contributions from researchers, clinicians, and developers. Key tools include:
230 230  
231 -**Key tools include:**
232 -
233 233  * **Jupyter Notebooks**: For data analysis and pipeline development.
234 -** Example: **probabilistic imputation**
93 +** Example: [[probabilistic imputation>>https://drive.ebrains.eu/f/4f69ab52f7734ef48217/]]
235 235  * **Wiki Pages**: For documenting methods and results.
236 236  * **Drive and Bucket**: For sharing code, data, and outputs.
237 -* **Collaboration with related projects**:
238 -** Example: **Beyond the hype: AI in dementia – from early risk detection to disease treatment**
96 +* **Related projects**: For instance, [[//Beyond the hype: AI in dementia – from early risk detection to disease treatment//>>https://www.lethe-project.eu/beyond-the-hype-ai-in-dementia-from-early-risk-detection-to-disease-treatment/]]
239 239  
240 240  ----
241 241  
242 -== **7. Tools and Technologies** ==
100 +=== **7. Tools and Technologies** ===
243 243  
244 -==== **Programming Languages:** ====
245 -
246 -* **Python** for AI and data processing.
247 -
248 -==== **Frameworks:** ====
249 -
250 -* **TensorFlow** and **PyTorch** for machine learning.
251 -* **Flask** or **FastAPI** for backend services.
252 -
253 -==== **Visualization:** ====
254 -
255 -* **Plotly** and **Matplotlib** for interactive and static visualizations.
256 -
257 -==== **EBRAINS Services:** ====
258 -
259 -* **Collaboratory Lab** for running Notebooks.
260 -* **Buckets** for storing large datasets.
261 -
262 -----
263 -
264 -=== **Why This Matters** ===
265 -
266 -* The annotation system ensures that AI-generated insights are structured, interpretable, and clinically meaningful.
267 -* It enables real-time tracking of disease progression across the three diagnostic axes.
268 -* It facilitates integration with electronic health records and decision-support tools, improving AI adoption in clinical workflows.
102 +* **Programming Languages**: Python for AI and data processing.
103 +* **Frameworks**:
104 +** TensorFlow and PyTorch for machine learning.
105 +** Flask or FastAPI for backend services.
106 +* **Visualization**: Plotly and Matplotlib for interactive and static visualizations.
107 +* **EBRAINS Services**:
108 +** Collaboratory Lab for running Notebooks.
109 +** Buckets for storing large datasets.