Changes for page Methodology

Last modified by manuelmenendez on 2025/03/14 08:31

From version 12.2
edited by manuelmenendez
on 2025/02/09 09:54
Change comment: There is no comment for this version
To version 5.1
edited by manuelmenendez
on 2025/01/29 19:11
Change comment: There is no comment for this version

Page properties
Content
... ... @@ -1,273 +1,109 @@
1 -==== **Overview** ====
1 +=== **Overview** ===
2 2  
3 -This project develops a **tridimensional diagnostic framework** for **CNS diseases**, incorporating **AI-powered annotation tools** to improve **interpretability, standardization, and clinical utility**. The methodology integrates **multi-modal data**, including **genetic, neuroimaging, neurophysiological, and biomarker datasets**, and applies **machine learning models** to generate **structured, explainable diagnostic outputs**.
3 +This section describes the step-by-step process used in the **Neurodiagnoses** project to develop a novel diagnostic framework for neurological diseases. The methodology integrates artificial intelligence (AI), biomedical ontologies, and computational neuroscience to create a structured, interpretable, and scalable diagnostic system.
4 4  
5 -=== **Workflow** ===
6 -
7 -1. (((
8 -**We Use GitHub to [[Store and develop AI models, scripts, and annotation pipelines.>>https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/discussions]]**
9 -
10 -* Create a **GitHub repository** for AI scripts and models.
11 -* Use **GitHub Projects** to manage research milestones.
12 -)))
13 -1. (((
14 -**We Use EBRAINS for Data & Collaboration**
15 -
16 -* Store **biomarker and neuroimaging data** in **EBRAINS Buckets**.
17 -* Run **Jupyter Notebooks** in **EBRAINS Lab** to test AI models.
18 -* Use **EBRAINS Wiki** for structured documentation and research discussion.
19 -)))
20 -
21 21  ----
22 22  
23 23  === **1. Data Integration** ===
24 24  
25 -== Overview ==
26 -
27 -
28 -Neurodiagnoses integrates clinical data via the **EBRAINS Medical Informatics Platform (MIP)**. MIP federates decentralized clinical data, allowing Neurodiagnoses to securely access and process sensitive information for AI-based diagnostics.
29 -
30 -== How It Works ==
31 -
32 -
33 -1. (((
34 -**Authentication & API Access:**
35 -
36 -* Users must have an **EBRAINS account**.
37 -* Neurodiagnoses uses **secure API endpoints** to fetch clinical data (e.g., from the **Federation for Dementia**).
38 -)))
39 -1. (((
40 -**Data Mapping & Harmonization:**
41 -
42 -* Retrieved data is **normalized** and converted to standard formats (.csv, .json).
43 -* Data from **multiple sources** is harmonized to ensure consistency for AI processing.
44 -)))
45 -1. (((
46 -**Security & Compliance:**
47 -
48 -* All data access is **logged and monitored**.
49 -* When possible, data remains on **MIP servers** and models are trained there using **federated learning techniques**.
50 -* Access is granted only after signing a **Data Usage Agreement (DUA)**.
51 -)))
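The mapping and harmonization step can be sketched as follows. This is a minimal illustration that assumes two sites whose exports use different column names. The site names, column names, and the `harmonize`/`to_csv` helpers are hypothetical and are not part of the MIP API or of `mip_integration.py`.

```python
import csv
import io
import json

# Hypothetical per-site column names mapped onto one shared schema.
# Real MIP / Federation for Dementia field names will differ.
COLUMN_MAP = {
    "site_a": {"pid": "patient_id", "age_yrs": "age", "mmse_total": "mmse"},
    "site_b": {"subject": "patient_id", "age": "age", "MMSE": "mmse"},
}
SCHEMA = ["patient_id", "age", "mmse"]


def to_float(value):
    """Convert to float, treating None or empty strings as missing."""
    return float(value) if value not in (None, "") else None


def harmonize(records, site):
    """Rename site-specific fields to the shared schema and coerce types."""
    mapping = COLUMN_MAP[site]
    out = []
    for rec in records:
        row = {mapping[k]: v for k, v in rec.items() if k in mapping}
        row["patient_id"] = f"{site}:{row['patient_id']}"  # keep IDs unique across sites
        row["age"] = to_float(row.get("age"))
        row["mmse"] = to_float(row.get("mmse"))
        out.append({col: row.get(col) for col in SCHEMA})
    return out


def to_csv(rows):
    """Serialize harmonized rows as CSV text (None becomes an empty cell)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=SCHEMA)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


site_a = [{"pid": "001", "age_yrs": "71", "mmse_total": "24"}]
site_b = [{"subject": "17", "age": 68, "MMSE": ""}]
rows = harmonize(site_a, "site_a") + harmonize(site_b, "site_b")
print(json.dumps(rows))  # JSON view of the harmonized rows
print(to_csv(rows))      # CSV view of the same rows
```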
52 -
53 -== Implementation Steps ==
54 -
55 -
56 -1. Clone the repository.
57 -1. Configure your **EBRAINS API credentials** in mip_integration.py.
58 -1. Run the script to **download and harmonize clinical data**.
59 -1. Process the data for **AI model training**.
60 -
61 -For more detailed instructions, please refer to the **[[MIP Documentation>>url:https://mip.ebrains.eu/]]**.
62 -
63 -----
64 -
65 -= Data Processing & Integration with Clinica.Run =
66 -
67 -
68 -== Overview ==
69 -
70 -
71 -Neurodiagnoses now supports **Clinica.Run**, an open-source neuroimaging platform designed for **multimodal data processing and reproducible neuroscience workflows**.
72 -
73 -== How It Works ==
74 -
75 -
76 -1. (((
77 -**Neuroimaging Preprocessing:**
78 -
79 -* MRI, PET, and EEG data are preprocessed using **Clinica.Run pipelines**.
80 -* Supports **longitudinal and cross-sectional analyses**.
81 -)))
82 -1. (((
83 -**Automated Biomarker Extraction:**
84 -
85 -* Standardized extraction of **volumetric, metabolic, and functional biomarkers**.
86 -* Integration with machine learning models in Neurodiagnoses.
87 -)))
88 -1. (((
89 -**Data Security & Compliance:**
90 -
91 -* Clinica.Run operates in **compliance with GDPR and HIPAA**.
92 -* Neuroimaging data remains **within the original storage environment**.
93 -)))
94 -
95 -== Implementation Steps ==
96 -
97 -
98 -1. Install **Clinica.Run** dependencies.
99 -1. Configure your **Clinica.Run pipeline** in clinica_run_config.json.
100 -1. Run the pipeline for **preprocessing and biomarker extraction**.
101 -1. Use processed neuroimaging data for **AI-driven diagnostics** in Neurodiagnoses.
102 -
103 -For further information, refer to **[[Clinica.Run Documentation>>url:https://clinica.run/]]**.
104 -
105 -==== ====
106 -
107 107  ==== **Data Sources** ====
108 108  
109 -[[List of potential database sources>>https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/data/sources/list_of_potential_databases]]
11 +* **Biomedical Ontologies**:
12 +** Human Phenotype Ontology (HPO) for phenotypic abnormalities.
13 +** Gene Ontology (GO) for molecular and cellular processes.
14 +* **Neuroimaging Datasets**:
15 +** Examples: Alzheimer’s Disease Neuroimaging Initiative (ADNI), OpenNeuro.
16 +* **Clinical and Biomarker Data**:
17 +** Anonymized clinical reports, molecular biomarkers, and test results.
110 110  
111 -**Biomedical Ontologies & Databases:**
112 112  
113 -* **Human Phenotype Ontology (HPO)** for symptom annotation.
114 -* **Gene Ontology (GO)** for molecular and cellular processes.
20 +==== **Data Preprocessing** ====
115 115  
116 -**Dimensionality Reduction and Interpretability:**
22 +1. **Standardization**: Ensure all data sources are normalized to a common format.
23 +1. **Feature Selection**: Identify relevant features for diagnosis (e.g., biomarkers, imaging scores).
24 +1. **Data Cleaning**: Handle missing values and remove duplicates.
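A minimal sketch of these three preprocessing steps in plain Python, assuming each record is a patient ID plus a dictionary of numeric features. The feature names are hypothetical, and mean imputation plus a zero-variance filter stand in for whatever cleaning and selection rules the project actually adopts. The code cleans first, so that standardization statistics are computed on deduplicated, complete data.

```python
from statistics import mean, pstdev

# Toy records: (patient_id, features). Feature names are hypothetical.
records = [
    ("p1", {"abeta42": 550.0, "ptau181": 22.0, "site_flag": 1.0}),
    ("p2", {"abeta42": 980.0, "ptau181": None, "site_flag": 1.0}),
    ("p1", {"abeta42": 550.0, "ptau181": 22.0, "site_flag": 1.0}),  # exact duplicate
    ("p3", {"abeta42": 760.0, "ptau181": 35.0, "site_flag": 1.0}),
]


def clean(rows):
    """Drop exact duplicate records, then mean-impute missing values per feature."""
    seen, unique = set(), []
    for pid, feats in rows:
        key = (pid, tuple(sorted(feats.items())))
        if key not in seen:
            seen.add(key)
            unique.append((pid, dict(feats)))  # copy so the input is not mutated
    names = list(unique[0][1])
    means = {n: mean(f[n] for _, f in unique if f[n] is not None) for n in names}
    for _, feats in unique:
        for n in names:
            if feats[n] is None:
                feats[n] = means[n]
    return unique


def select_features(rows, min_std=1e-9):
    """Keep only features that vary across patients (a simple variance filter)."""
    names = list(rows[0][1])
    return [n for n in names if pstdev(f[n] for _, f in rows) > min_std]


def standardize(rows, names):
    """Z-score each selected feature so all features share a common scale."""
    stats = {n: (mean(f[n] for _, f in rows), pstdev(f[n] for _, f in rows)) for n in names}
    return [(pid, {n: (f[n] - stats[n][0]) / stats[n][1] for n in names}) for pid, f in rows]


cleaned = clean(records)
selected = select_features(cleaned)
scaled = standardize(cleaned, selected)
print(selected)  # ['abeta42', 'ptau181']; the constant site_flag is dropped
```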
117 117  
118 -* **Evaluate interpretability** using metrics like the **Area Under the Interpretability Curve (AUIC)**.
119 -* **Leverage [[DEIBO>>https://github.com/Mellandd/DEIBO]] (Data-driven Embedding Interpretation Based on Ontologies)** to connect model dimensions to ontology concepts.
120 -
121 -**Neuroimaging & EEG/MEG Data:**
122 -
123 -* **MRI volumetric measures** for brain atrophy tracking.
124 -* **EEG functional connectivity patterns** (AI-Mind).
125 -
126 -**Clinical & Biomarker Data:**
127 -
128 -* **CSF biomarkers** (Amyloid-beta, Tau, Neurofilament Light).
129 -* **Sleep monitoring and actigraphy data** (ADIS).
130 -
131 -**Federated Learning Integration:**
132 -
133 -* **Secure multi-center data harmonization** (PROMINENT).
134 -
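Federated learning keeps patient records at each site and shares only model parameters. The sketch below illustrates federated averaging (FedAvg) for a one-feature linear model. It is a toy example assuming synthetic data at two hypothetical sites, not the PROMINENT or MIP implementation.

```python
# Minimal federated averaging (FedAvg) sketch for a 1-feature linear model.
# Each site trains locally; only (weight, bias, n_samples) leave the site.


def local_update(w, b, data, lr=0.01, epochs=50):
    """Run plain gradient descent on the mean squared error of one site's (x, y) pairs."""
    n = len(data)
    for _ in range(epochs):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
        gb = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * gw, b - lr * gb
    return w, b, n


def fed_avg(updates):
    """Average site parameters, weighting each site by its sample count."""
    total = sum(n for _, _, n in updates)
    w = sum(w * n for w, _, n in updates) / total
    b = sum(b * n for _, b, n in updates) / total
    return w, b


# Hypothetical private datasets; both follow y = 2x + 1.
sites = [
    [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]
w, b = 0.0, 0.0
for _ in range(200):  # communication rounds
    updates = [local_update(w, b, data) for data in sites]
    w, b = fed_avg(updates)
print(round(w, 2), round(b, 2))
```

Only the global parameters travel between rounds; the raw `(x, y)` pairs never leave `sites`.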
135 135  ----
136 136  
137 -==== **Annotation System for Multi-Modal Data** ====
138 -
139 -To ensure **structured integration of diverse datasets**, **Neurodiagnoses** will implement an **AI-driven annotation system**, which will:
140 -
141 -* **Assign standardized metadata tags** to diagnostic features.
142 -* **Provide contextual explanations** for AI-based classifications.
143 -* **Track temporal disease progression annotations** to identify long-term trends.
144 -
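One way to represent such annotations is a small record type that carries the standardized tag, the diagnostic axis, a free-text explanation, and the visit date, so that trends can be read back per feature. The tag vocabulary, feature names, and values below are hypothetical placeholders. A production system would resolve the labels to ontology identifiers (for example, HPO terms).

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical tag vocabulary: feature name -> (diagnostic axis, standardized label).
TAGS = {
    "csf_abeta42": ("molecular", "CSF amyloid-beta 42"),
    "hippocampal_volume": ("neuroanatomical", "hippocampal volume"),
    "apoe4_carrier": ("etiology", "APOE e4 carrier status"),
}


@dataclass
class Annotation:
    feature: str
    value: float
    axis: str
    label: str
    visit: date
    note: str = ""  # contextual explanation shown alongside the finding


def annotate(feature, value, visit, note=""):
    """Attach the standardized axis and label for a known feature."""
    axis, label = TAGS[feature]
    return Annotation(feature, value, axis, label, visit, note)


def trend(annotations, feature):
    """Return (visit, value) pairs for one feature, ordered by visit date."""
    return sorted((a.visit, a.value) for a in annotations if a.feature == feature)


notes = [
    annotate("hippocampal_volume", 3.1, date(2024, 6, 1), "Lower than previous visit"),
    annotate("csf_abeta42", 540.0, date(2023, 6, 1)),
    annotate("hippocampal_volume", 3.4, date(2023, 6, 1)),
]
print(trend(notes, "hippocampal_volume"))
```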
145 -----
146 -
147 147  === **2. AI-Based Analysis** ===
148 148  
149 -==== **Machine Learning & Deep Learning Models** ====
30 +==== **Model Development** ====
150 150  
151 -**Risk Prediction Models:**
32 +* **Embedding Models**: Use pre-trained models like BioBERT or BioLORD for text data.
33 +* **Classification Models**:
34 +** Algorithms: Random Forest, Support Vector Machines (SVM), or neural networks.
35 +** Purpose: Predict the likelihood of specific neurological conditions based on input data.
152 152  
153 -* **LETHE’s cognitive risk prediction model** integrated into the annotation framework.
37 +==== **Dimensionality Reduction and Interpretability** ====
154 154  
155 -**Biomarker Classification & Probabilistic Imputation:**
39 +* Leverage [[DEIBO>>https://drive.ebrains.eu/f/8d7157708cde4b258db0/]] (Data-driven Embedding Interpretation Based on Ontologies) to connect model dimensions to ontology concepts.
40 +* Evaluate interpretability using metrics like the Area Under the Interpretability Curve (AUIC).
156 156  
157 -* **KNN Imputer** and **Bayesian models** are used to handle **missing biomarker data**.
158 -
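KNN imputation fills a missing value with the average of that variable in the most similar patients. scikit-learn provides this as `sklearn.impute.KNNImputer`. The dependency-free sketch below shows the idea, using Euclidean distance over the columns observed in both rows; unlike scikit-learn, it does not rescale the distance by the fraction of observed columns. The biomarker columns are hypothetical, and features should normally be standardized first so that no single unit dominates the distance (here the amyloid column does).

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is Euclidean over the columns observed in both rows; only rows
    that actually have the missing column are eligible as neighbours.
    """
    filled = [list(r) for r in rows]  # work on a copy
    for i, row in enumerate(rows):
        for j, val in enumerate(row):
            if val is not None:
                continue
            candidates = []
            for other_i, other in enumerate(rows):
                if other_i == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                candidates.append((dist, other[j]))
            candidates.sort()
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical columns: CSF amyloid-beta 42, p-tau, neurofilament light.
rows = [
    [550.0, 22.0, 12.0],
    [560.0, 24.0, None],
    [540.0, 20.0, 10.0],
    [990.0, 60.0, 30.0],
]
filled = knn_impute(rows, k=2)
print(filled[1])  # [560.0, 24.0, 11.0]
```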
159 -**Neuroimaging Feature Extraction:**
160 -
161 -* **MRI & EEG data** annotated with **neuroanatomical feature labels**.
162 -
163 -==== **AI-Powered Annotation System** ====
164 -
165 -* Uses **SHAP-based interpretability tools** to explain model decisions.
166 -* Generates **automated clinical annotations** in structured reports.
167 -* Links findings to **standardized medical ontologies** (e.g., **SNOMED, HPO**).
168 -
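SHAP explanations are based on Shapley values. For a model with only a few inputs, they can be computed exactly by enumerating every coalition of features, which makes concrete the quantity SHAP's explainers estimate. The sketch below assumes that features outside a coalition are replaced by a single baseline value (SHAP explainers typically average over a background dataset instead). The toy risk function and its weights are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every feature coalition.

    Features outside a coalition take their baseline value. This needs
    O(2^n) model calls, so it is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def risk(z):
    """Hypothetical risk score with one interaction term between z[0] and z[2]."""
    return 0.5 * z[0] - 0.2 * z[1] + 0.1 * z[0] * z[2]


patient = [2.0, 1.0, 3.0]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(risk, patient, reference)
print([round(p, 3) for p in phi])
```

By construction the values sum to `risk(patient) - risk(reference)`, and the interaction term's contribution is split equally between features 0 and 2.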
169 169  ----
170 170  
171 -=== **3. Diagnostic Framework & Clinical Decision Support** ===
44 +=== **3. Diagnostic Framework** ===
172 172  
173 -==== **Tridimensional Diagnostic Axes** ====
46 +==== **Axes of Diagnosis** ====
174 174  
175 -**Axis 1: Etiology (Pathogenic Mechanisms)**
48 +The framework organizes diagnostic data into three axes:
176 176  
177 -* Classification based on **genetic markers, cellular pathways, and environmental risk factors**.
178 -* **AI-assisted annotation** provides **causal interpretations** for clinical use.
50 +1. **Etiology**: Genetic and environmental risk factors.
51 +1. **Molecular Markers**: Biomarkers such as amyloid-beta, tau, and alpha-synuclein.
52 +1. **Neuroanatomical Correlations**: Results from neuroimaging (e.g., MRI, PET).
179 179  
180 -**Axis 2: Molecular Markers & Biomarkers**
54 +==== **Recommendation System** ====
181 181  
182 -* **Integration of CSF, blood, and neuroimaging biomarkers**.
183 -* **Structured annotation** highlights **biological pathways linked to diagnosis**.
56 +* Suggests additional tests or biomarkers if gaps are detected in the data.
57 +* Prioritizes tests based on clinical impact and cost-effectiveness.
184 184  
185 -**Axis 3: Neuroanatomoclinical Correlations**
186 -
187 -* **MRI and EEG data** provide anatomical and functional insights.
188 -* **AI-generated progression maps** annotate **brain structure-function relationships**.
189 -
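The three axes and the gap-driven recommendation idea from the newer version can be sketched together: a profile stores findings per axis, and tests that would inform an empty axis are ranked by impact per unit cost. The test catalogue, impact scores, and costs below are hypothetical placeholders, and impact divided by cost is only one simple way to combine clinical impact with cost-effectiveness.

```python
from dataclasses import dataclass, field

AXES = ("etiology", "molecular", "neuroanatomical")


@dataclass
class DiagnosticProfile:
    """Findings grouped by the three diagnostic axes."""
    etiology: dict = field(default_factory=dict)
    molecular: dict = field(default_factory=dict)
    neuroanatomical: dict = field(default_factory=dict)

    def missing_axes(self):
        """Axes with no findings yet, in the fixed AXES order."""
        return [axis for axis in AXES if not getattr(self, axis)]


# Hypothetical catalogue: test -> (axis it informs, clinical impact 0-1, relative cost).
TEST_CATALOGUE = {
    "genetic panel": ("etiology", 0.6, 1.0),
    "CSF amyloid-beta/tau": ("molecular", 0.9, 3.0),
    "CSF alpha-synuclein": ("molecular", 0.7, 2.0),
    "MRI": ("neuroanatomical", 0.8, 2.0),
    "PET": ("neuroanatomical", 0.9, 8.0),
}


def recommend(profile):
    """Suggest tests for empty axes, ranked by impact per unit cost."""
    gaps = set(profile.missing_axes())
    candidates = [
        (impact / cost, name)
        for name, (axis, impact, cost) in TEST_CATALOGUE.items()
        if axis in gaps
    ]
    return [name for _, name in sorted(candidates, reverse=True)]


profile = DiagnosticProfile(neuroanatomical={"hippocampal_volume_z": -1.8})
print(recommend(profile))
```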
190 190  ----
191 191  
192 -=== **4. Computational Workflow & Annotation Pipelines** ===
61 +=== **4. Computational Workflow** ===
193 193  
194 -==== **Data Processing Steps** ====
63 +1. **Data Loading**: Import data from storage (Drive or Bucket).
64 +1. **Feature Engineering**: Generate derived features from the raw data.
65 +1. **Model Training**:
66 +1*. Split data into training, validation, and test sets.
67 +1*. Train models with cross-validation to ensure robustness.
68 +1. **Evaluation**:
69 +1*. Metrics: Accuracy, F1-Score, AUIC for interpretability.
70 +1*. Compare against baseline models and domain benchmarks.
195 195  
196 -**Data Ingestion:**
197 -
198 -* **Harmonized datasets** stored in **EBRAINS Bucket**.
199 -* **Preprocessing pipelines** clean and standardize data.
200 -
201 -**Feature Engineering:**
202 -
203 -* **AI models** extract **clinically relevant patterns** from **EEG, MRI, and biomarkers**.
204 -
205 -**AI-Generated Annotations:**
206 -
207 -* **Automated tagging** of diagnostic features in **structured reports**.
208 -* **Explainability modules (SHAP, LIME)** ensure transparency in predictions.
209 -
210 -**Clinical Decision Support Integration:**
211 -
212 -* **AI-annotated findings** fed into **interactive dashboards**.
213 -* **Clinicians can adjust, validate, and modify annotations**.
214 -
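The splitting, cross-validation, and evaluation steps can be sketched with the standard library alone. This is a minimal illustration that assumes binary labels with `1` as the positive class. The function names are illustrative; in practice scikit-learn's `train_test_split`, `KFold`, and `f1_score` cover the same ground.

```python
import random


def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once with a fixed seed, then cut into three disjoint sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


def k_fold(items, k=5):
    """Yield (train, held_out) pairs; every item is held out exactly once."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

With longitudinal data, split by patient rather than by visit, so that visits from the same patient never appear in both the training and test sets.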
215 215  ----
216 216  
217 -=== **5. Validation & Real-World Testing** ===
74 +=== **5. Validation** ===
218 218  
219 -==== **Prospective Clinical Study** ====
76 +==== **Internal Validation** ====
220 220  
221 -* **Multi-center validation** of AI-based **annotations & risk stratifications**.
222 -* **Benchmarking against clinician-based diagnoses**.
223 -* **Real-world testing** of AI-powered **structured reporting**.
78 +* Test the system using simulated datasets and known clinical cases.
79 +* Fine-tune models based on validation results.
224 224  
225 -==== **Quality Assurance & Explainability** ====
81 +==== **External Validation** ====
226 226  
227 -* **Annotations linked to structured knowledge graphs** for improved transparency.
228 -* **Interactive annotation editor** allows clinicians to validate AI outputs.
83 +* Collaborate with research institutions and hospitals to test the system in real-world settings.
84 +* Use anonymized patient data to ensure privacy compliance.
229 229  
230 230  ----
231 231  
232 232  === **6. Collaborative Development** ===
233 233  
234 -The project is **open to contributions** from **researchers, clinicians, and developers**.
90 +The project is open to contributions from researchers, clinicians, and developers. Key tools include:
235 235  
236 -**Key tools include:**
237 -
238 238  * **Jupyter Notebooks**: For data analysis and pipeline development.
239 -** Example: **probabilistic imputation**
93 +** Example: [[probabilistic imputation>>https://drive.ebrains.eu/f/4f69ab52f7734ef48217/]]
240 240  * **Wiki Pages**: For documenting methods and results.
241 241  * **Drive and Bucket**: For sharing code, data, and outputs.
242 -* **Collaboration with related projects**:
243 -** Example: **Beyond the hype: AI in dementia – from early risk detection to disease treatment**
96 +* **Collaboration with related projects**: for example, [[//Beyond the hype: AI in dementia – from early risk detection to disease treatment//>>https://www.lethe-project.eu/beyond-the-hype-ai-in-dementia-from-early-risk-detection-to-disease-treatment/]]
244 244  
245 245  ----
246 246  
247 247  === **7. Tools and Technologies** ===
248 248  
249 -==== **Programming Languages:** ====
250 -
251 -* **Python** for AI and data processing.
252 -
253 -==== **Frameworks:** ====
254 -
255 -* **TensorFlow** and **PyTorch** for machine learning.
256 -* **Flask** or **FastAPI** for backend services.
257 -
258 -==== **Visualization:** ====
259 -
260 -* **Plotly** and **Matplotlib** for interactive and static visualizations.
261 -
262 -==== **EBRAINS Services:** ====
263 -
264 -* **Collaboratory Lab** for running Notebooks.
265 -* **Buckets** for storing large datasets.
266 -
267 -----
268 -
269 -=== **Why This Matters** ===
270 -
271 -* The annotation system ensures that AI-generated insights are structured, interpretable, and clinically meaningful.
272 -* It enables real-time tracking of disease progression across the three diagnostic axes.
273 -* It facilitates integration with electronic health records and decision-support tools, improving AI adoption in clinical workflows.
102 +* **Programming Languages**: Python for AI and data processing.
103 +* **Frameworks**:
104 +** TensorFlow and PyTorch for machine learning.
105 +** Flask or FastAPI for backend services.
106 +* **Visualization**: Plotly and Matplotlib for interactive and static visualizations.
107 +* **EBRAINS Services**:
108 +** Collaboratory Lab for running Notebooks.
109 +** Buckets for storing large datasets.