Changes for page Methodology

Last modified by manuelmenendez on 2025/03/14 08:31

From version 10.1
edited by manuelmenendez
on 2025/02/01 18:31
Change comment: There is no comment for this version
To version 4.1
edited by manuelmenendez
on 2025/01/27 23:46
Change comment: There is no comment for this version

... ... @@ -1,23 +1,7 @@
1 -==== **Overview** ====
1 +=== **Overview** ===
2 2  
3 -This project develops a **tridimensional diagnostic framework** for **CNS diseases**, incorporating **AI-powered annotation tools** to improve **interpretability, standardization, and clinical utility**. The methodology integrates **multi-modal data**, including **genetic, neuroimaging, neurophysiological, and biomarker datasets**, and applies **machine learning models** to generate **structured, explainable diagnostic outputs**.
3 +This section describes the step-by-step process used in the **Neurodiagnoses** project to develop a novel diagnostic framework for neurological diseases. The methodology integrates artificial intelligence (AI), biomedical ontologies, and computational neuroscience to create a structured, interpretable, and scalable diagnostic system.
4 4  
5 -=== **Workflow** ===
6 -
7 -1. (((
8 -**We use GitHub to [[store and develop AI models, scripts, and annotation pipelines>>https://github.com/users/manuelmenendezgonzalez/projects/1/views/1]].**
9 -
10 -* Create a **GitHub repository** for AI scripts and models.
11 -* Use **GitHub Projects** to manage research milestones.
12 -)))
13 -1. (((
14 -**We use EBRAINS for data and collaboration**
15 -
16 -* Store **biomarker and neuroimaging data** in **EBRAINS Buckets**.
17 -* Run **Jupyter Notebooks** in **EBRAINS Lab** to test AI models.
18 -* Use **EBRAINS Wiki** for structured documentation and research discussion.
19 -)))
20 -
21 21  ----
22 22  
23 23  === **1. Data Integration** ===
... ... @@ -24,166 +24,100 @@
24 24  
25 25  ==== **Data Sources** ====
26 26  
27 -**Biomedical Ontologies & Databases:**
11 +* **Biomedical Ontologies**:
12 +** Human Phenotype Ontology (HPO) for phenotypic abnormalities.
13 +** Gene Ontology (GO) for molecular and cellular processes.
14 +* **Neuroimaging Datasets**:
15 +** Examples: Alzheimer’s Disease Neuroimaging Initiative (ADNI) and OpenNeuro.
16 +* **Clinical and Biomarker Data**:
17 +** Anonymized clinical reports, molecular biomarkers, and test results.
28 28  
29 -* **Human Phenotype Ontology (HPO)** for symptom annotation.
30 -* **Gene Ontology (GO)** for molecular and cellular processes.
19 +==== **Data Preprocessing** ====
31 31  
32 -**Dimensionality Reduction and Interpretability:**
21 +1. **Standardization**: Ensure all data sources are normalized to a common format.
22 +1. **Feature Selection**: Identify relevant features for diagnosis (e.g., biomarkers, imaging scores).
23 +1. **Data Cleaning**: Handle missing values and remove duplicates.
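The preprocessing steps listed above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual pipeline. The record layout, the field names (`patient_id`, `tau`), and the choice of median imputation and z-score scaling are all assumptions made for the example. Cleaning runs first here so that duplicates and gaps do not distort the scaling, and feature selection is left out.

```python
from statistics import mean, median, pstdev


def drop_duplicates(records, key_fields):
    """Keep the first record for each combination of key_fields."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(rec.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))  # copy so the input is not mutated
    return unique


def impute_median(records, feature):
    """Replace missing (None) values of `feature` with the observed median."""
    observed = [r[feature] for r in records if r.get(feature) is not None]
    fill = median(observed)
    for r in records:
        if r.get(feature) is None:
            r[feature] = fill
    return records


def standardize(records, feature):
    """Rescale `feature` to zero mean and unit variance (z-score)."""
    values = [r[feature] for r in records]
    mu, sigma = mean(values), pstdev(values)
    for r in records:
        r[feature] = (r[feature] - mu) / sigma if sigma else 0.0
    return records


# Hypothetical records; the values are illustrative only.
records = [
    {"patient_id": "p1", "tau": 300.0},
    {"patient_id": "p2", "tau": None},   # missing value
    {"patient_id": "p1", "tau": 300.0},  # duplicate of the first record
    {"patient_id": "p3", "tau": 500.0},
]

cleaned = drop_duplicates(records, ["patient_id"])
cleaned = impute_median(cleaned, "tau")
cleaned = standardize(cleaned, "tau")
```

In practice the imputation and scaling parameters would be estimated on the training split only and then reused on validation and test data, to avoid information leaking between splits.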
33 33  
34 -* **Evaluate interpretability** using metrics like the **Area Under the Interpretability Curve (AUIC)**.
35 -* **Leverage DEIBO (Data-driven Embedding Interpretation Based on Ontologies)** to connect model dimensions to ontology concepts.
36 -
37 -**Neuroimaging & EEG/MEG Data:**
38 -
39 -* **MRI volumetric measures** for brain atrophy tracking.
40 -* **EEG functional connectivity patterns** (AI-Mind).
41 -
42 -**Clinical & Biomarker Data:**
43 -
44 -* **CSF biomarkers** (Amyloid-beta, Tau, Neurofilament Light).
45 -* **Sleep monitoring and actigraphy data** (ADIS).
46 -
47 -**Federated Learning Integration:**
48 -
49 -* **Secure multi-center data harmonization** (PROMINENT).
50 -
51 51  ----
52 52  
53 -==== **Annotation System for Multi-Modal Data** ====
54 -
55 -To ensure **structured integration of diverse datasets**, **Neurodiagnoses** will implement an **AI-driven annotation system**, which will:
56 -
57 -* **Assign standardized metadata tags** to diagnostic features.
58 -* **Provide contextual explanations** for AI-based classifications.
59 -* **Track temporal disease progression annotations** to identify long-term trends.
60 -
61 -----
62 -
63 63  === **2. AI-Based Analysis** ===
64 64  
65 -==== **Machine Learning & Deep Learning Models** ====
29 +==== **Model Development** ====
66 66  
67 -**Risk Prediction Models:**
31 +* **Embedding Models**: Use pre-trained models like BioBERT or BioLORD for text data.
32 +* **Classification Models**:
33 +** Algorithms: Random Forest, Support Vector Machines (SVM), or neural networks.
34 +** Purpose: Predict the likelihood of specific neurological conditions based on input data.
68 68  
69 -* **LETHE’s cognitive risk prediction model** integrated into the annotation framework.
36 +==== **Dimensionality Reduction and Interpretability** ====
70 70  
71 -**Biomarker Classification & Probabilistic Imputation:**
38 +* Leverage [[DEIBO>>https://drive.ebrains.eu/f/8d7157708cde4b258db0/]] (Data-driven Embedding Interpretation Based on Ontologies) to connect model dimensions to ontology concepts.
39 +* Evaluate interpretability using metrics like the Area Under the Interpretability Curve (AUIC).
72 72  
73 -* **KNN Imputer** and **Bayesian models** used for handling **missing biomarker data**.
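To illustrate the idea behind k-nearest-neighbour imputation of missing biomarker values, here is a simplified sketch in plain Python. It is not the project's notebook and not scikit-learn's `KNNImputer` implementation. The function name `knn_impute`, the distance definition, and the sample values are assumptions made for this example.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Distance is the root-mean-square difference over the features that are
    observed in both rows, so rows with different gaps remain comparable.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]  # work on a copy
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which feature j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for dist, v in donors[:k] if dist != math.inf]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical values; columns stand for (amyloid-beta, tau, NfL).
rows = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],  # NfL missing
    [5.0, 6.0, 9.0],
    [0.9, 1.9, 2.8],
]
imputed = knn_impute(rows, k=2)
```

Because the distance mixes features measured on different scales, biomarkers would normally be standardized before imputation. A Bayesian model could additionally report uncertainty around each imputed value, which this sketch does not attempt.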
74 -
75 -**Neuroimaging Feature Extraction:**
76 -
77 -* **MRI & EEG data** annotated with **neuroanatomical feature labels**.
78 -
79 -==== **AI-Powered Annotation System** ====
80 -
81 -* Uses **SHAP-based interpretability tools** to explain model decisions.
82 -* Generates **automated clinical annotations** in structured reports.
83 -* Links findings to **standardized medical ontologies** (e.g., **SNOMED, HPO**).
84 -
85 85  ----
86 86  
87 -=== **3. Diagnostic Framework & Clinical Decision Support** ===
43 +=== **3. Diagnostic Framework** ===
88 88  
89 -==== **Tridimensional Diagnostic Axes** ====
45 +==== **Axes of Diagnosis** ====
90 90  
91 -**Axis 1: Etiology (Pathogenic Mechanisms)**
47 +The framework organizes diagnostic data into three axes:
92 92  
93 -* Classification based on **genetic markers, cellular pathways, and environmental risk factors**.
94 -* **AI-assisted annotation** provides **causal interpretations** for clinical use.
49 +1. **Etiology**: Genetic and environmental risk factors.
50 +1. **Molecular Markers**: Biomarkers such as amyloid-beta, tau, and alpha-synuclein.
51 +1. **Neuroanatomical Correlations**: Results from neuroimaging (e.g., MRI, PET).
95 95  
96 -**Axis 2: Molecular Markers & Biomarkers**
53 +==== **Recommendation System** ====
97 97  
98 -* **Integration of CSF, blood, and neuroimaging biomarkers**.
99 -* **Structured annotation** highlights **biological pathways linked to diagnosis**.
55 +* Suggests additional tests or biomarkers if gaps are detected in the data.
56 +* Prioritizes tests based on clinical impact and cost-effectiveness.
100 100  
101 -**Axis 3: Neuroanatomoclinical Correlations**
102 -
103 -* **MRI and EEG data** provide anatomical and functional insights.
104 -* **AI-generated progression maps** annotate **brain structure-function relationships**.
105 -
106 106  ----
107 107  
108 -=== **4. Computational Workflow & Annotation Pipelines** ===
60 +=== **4. Computational Workflow** ===
109 109  
110 -==== **Data Processing Steps** ====
62 +1. **Data Loading**: Import data from storage (Drive or Bucket).
63 +1. **Feature Engineering**: Generate derived features from the raw data.
64 +1. **Model Training**:
65 +1*. Split data into training, validation, and test sets.
66 +1*. Train models with cross-validation to ensure robustness.
67 +1. **Evaluation**:
68 +1*. Metrics: accuracy and F1-score for predictive performance, and AUIC for interpretability.
69 +1*. Compare against baseline models and domain benchmarks.
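The splitting and evaluation steps of this workflow might look like the following standard-library sketch. The split fractions, the fold count, and the helper names (`split_indices`, `k_folds`, `f1_score`) are assumptions chosen for illustration. AUIC is omitted because its computation is not defined on this page.

```python
import random


def split_indices(n, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle sample indices and split them into train/validation/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


def k_folds(indices, k=5):
    """Yield (train, held_out) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, held_out


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


train, val, test = split_indices(100)
folds = list(k_folds(train, k=5))

# Hypothetical labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```

Cross-validation is run on the training indices only, so the test set stays untouched until the final comparison against baseline models.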
111 111  
112 -**Data Ingestion:**
113 -
114 -* **Harmonized datasets** stored in **EBRAINS Bucket**.
115 -* **Preprocessing pipelines** clean and standardize data.
116 -
117 -**Feature Engineering:**
118 -
119 -* **AI models** extract **clinically relevant patterns** from **EEG, MRI, and biomarkers**.
120 -
121 -**AI-Generated Annotations:**
122 -
123 -* **Automated tagging** of diagnostic features in **structured reports**.
124 -* **Explainability modules (SHAP, LIME)** ensure transparency in predictions.
125 -
126 -**Clinical Decision Support Integration:**
127 -
128 -* **AI-annotated findings** fed into **interactive dashboards**.
129 -* **Clinicians can adjust, validate, and modify annotations**.
130 -
131 131  ----
132 132  
133 -=== **5. Validation & Real-World Testing** ===
73 +=== **5. Validation** ===
134 134  
135 -==== **Prospective Clinical Study** ====
75 +==== **Internal Validation** ====
136 136  
137 -* **Multi-center validation** of AI-based **annotations & risk stratifications**.
138 -* **Benchmarking against clinician-based diagnoses**.
139 -* **Real-world testing** of AI-powered **structured reporting**.
77 +* Test the system using simulated datasets and known clinical cases.
78 +* Fine-tune models based on validation results.
140 140  
141 -==== **Quality Assurance & Explainability** ====
80 +==== **External Validation** ====
142 142  
143 -* **Annotations linked to structured knowledge graphs** for improved transparency.
144 -* **Interactive annotation editor** allows clinicians to validate AI outputs.
82 +* Collaborate with research institutions and hospitals to test the system in real-world settings.
83 +* Use anonymized patient data to ensure privacy compliance.
145 145  
146 146  ----
147 147  
148 148  === **6. Collaborative Development** ===
149 149  
150 -The project is **open to contributions** from **researchers, clinicians, and developers**.
89 +The project is open to contributions from researchers, clinicians, and developers. Key tools include:
151 151  
152 -**Key tools include:**
153 -
154 154  * **Jupyter Notebooks**: For data analysis and pipeline development.
155 -** Example: **probabilistic imputation**
92 +** Example: [[probabilistic imputation>>https://drive.ebrains.eu/f/4f69ab52f7734ef48217/]]
156 156  * **Wiki Pages**: For documenting methods and results.
157 157  * **Drive and Bucket**: For sharing code, data, and outputs.
158 -* **Collaboration with related projects**:
159 -** Example: **Beyond the hype: AI in dementia – from early risk detection to disease treatment**
160 160  
161 161  ----
162 162  
163 163  === **7. Tools and Technologies** ===
164 164  
165 -==== **Programming Languages:** ====
166 -
167 -* **Python** for AI and data processing.
168 -
169 -==== **Frameworks:** ====
170 -
171 -* **TensorFlow** and **PyTorch** for machine learning.
172 -* **Flask** or **FastAPI** for backend services.
173 -
174 -==== **Visualization:** ====
175 -
176 -* **Plotly** and **Matplotlib** for interactive and static visualizations.
177 -
178 -==== **EBRAINS Services:** ====
179 -
180 -* **Collaboratory Lab** for running Notebooks.
181 -* **Buckets** for storing large datasets.
182 -
183 -----
184 -
185 -=== **Why This Matters** ===
186 -
187 -* **The annotation system ensures that AI-generated insights are structured, interpretable, and clinically meaningful.**
188 -* **It enables real-time tracking of disease progression across the three diagnostic axes.**
189 -* **It facilitates integration with electronic health records and decision-support tools, improving AI adoption in clinical workflows.**
100 +* **Programming Languages**: Python for AI and data processing.
101 +* **Frameworks**:
102 +** TensorFlow and PyTorch for machine learning.
103 +** Flask or FastAPI for backend services.
104 +* **Visualization**: Plotly and Matplotlib for interactive and static visualizations.
105 +* **EBRAINS Services**:
106 +** Collaboratory Lab for running Notebooks.
107 +** Buckets for storing large datasets.