

Wiki source code of Methodology

Version 4.1 by manuelmenendez on 2025/01/27 23:46

=== **Overview** ===

This section describes the step-by-step process used in the **Neurodiagnoses** project to develop a novel diagnostic framework for neurological diseases. The methodology integrates artificial intelligence (AI), biomedical ontologies, and computational neuroscience to create a structured, interpretable, and scalable diagnostic system.

----

=== **1. Data Integration** ===

==== **Data Sources** ====

* **Biomedical Ontologies**:
** Human Phenotype Ontology (HPO) for phenotypic abnormalities.
** Gene Ontology (GO) for molecular and cellular processes.
* **Neuroimaging Datasets**:
** Examples: Alzheimer’s Disease Neuroimaging Initiative (ADNI), OpenNeuro.
* **Clinical and Biomarker Data**:
** Anonymized clinical reports, molecular biomarkers, and test results.

==== **Data Preprocessing** ====

1. **Standardization**: Ensure all data sources are normalized to a common format.
1. **Feature Selection**: Identify relevant features for diagnosis (e.g., biomarkers, imaging scores).
1. **Data Cleaning**: Handle missing values and remove duplicates.
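The three preprocessing steps can be sketched with pandas. This is a minimal illustration, not the project's fixed pipeline: the column names are hypothetical biomarker features, and median imputation plus z-score standardization are assumed choices.

```python
import pandas as pd

def preprocess(df: pd.DataFrame, feature_cols: list) -> pd.DataFrame:
    """Minimal preprocessing sketch: clean, impute, and standardize."""
    # Data cleaning: remove exact duplicate records.
    df = df.drop_duplicates().copy()
    for col in feature_cols:
        # Handle missing values with simple median imputation (illustrative).
        df[col] = df[col].fillna(df[col].median())
        # Standardization: z-score each feature onto a common scale.
        df[col] = (df[col] - df[col].mean()) / df[col].std(ddof=0)
    return df

# Toy input with hypothetical biomarker columns.
raw = pd.DataFrame({
    "csf_abeta42": [600.0, 450.0, None, 450.0],
    "hippocampal_volume": [3.1, 2.7, 2.9, 2.7],
})
clean = preprocess(raw, ["csf_abeta42", "hippocampal_volume"])
```

In a real pipeline the imputation and scaling parameters would be fit on the training split only, to avoid leaking information into evaluation.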

----

=== **2. AI-Based Analysis** ===

==== **Model Development** ====

* **Embedding Models**: Use pre-trained models like BioBERT or BioLORD for text data.
* **Classification Models**:
** Algorithms: Random Forest, Support Vector Machines (SVM), or neural networks.
** Purpose: Predict the likelihood of specific neurological conditions based on input data.
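As a sketch of the classification step, the snippet below trains a Random Forest on a synthetic feature matrix and reports per-condition likelihoods via `predict_proba`. The data is artificial; in the actual system the features would come from the integrated biomarker and imaging data above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: rows are subjects, columns are standardized
# biomarker / imaging features; labels are 0 (control) vs 1 (condition).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # planted signal

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# predict_proba yields the likelihood of each condition, which is the
# quantity the framework reports rather than a hard label.
probs = clf.predict_proba(X[:5])
```

Any of the listed alternatives (SVM, neural networks) can be swapped in behind the same fit/predict interface.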

==== **Dimensionality Reduction and Interpretability** ====

* Leverage [[DEIBO>>https://drive.ebrains.eu/f/8d7157708cde4b258db0/]] (Data-driven Embedding Interpretation Based on Ontologies) to connect model dimensions to ontology concepts.
* Evaluate interpretability using metrics like the Area Under the Interpretability Curve (AUIC).
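To illustrate the general idea of linking embedding dimensions to ontology concepts (this is a toy sketch, not the actual DEIBO implementation; the concept names are hypothetical phenotype labels):

```python
import numpy as np

def map_dimensions_to_concepts(embeddings, concept_labels, concept_names):
    """Toy interpretation sketch: assign each embedding dimension the
    ontology concept whose binary annotation vector it correlates with
    most strongly across subjects."""
    mapping = {}
    for d in range(embeddings.shape[1]):
        corrs = [abs(np.corrcoef(embeddings[:, d], concept_labels[:, c])[0, 1])
                 for c in range(concept_labels.shape[1])]
        mapping[d] = concept_names[int(np.argmax(corrs))]
    return mapping

# Synthetic demo: dimension 0 tracks the second concept, dimension 1 the first.
rng = np.random.default_rng(0)
concept_labels = rng.integers(0, 2, size=(100, 2)).astype(float)
embeddings = np.column_stack([
    concept_labels[:, 1] + 0.1 * rng.normal(size=100),
    concept_labels[:, 0] + 0.1 * rng.normal(size=100),
])
mapping = map_dimensions_to_concepts(
    embeddings, concept_labels, ["memory_impairment", "gait_disturbance"])
```

The linked DEIBO resource describes the actual method and how AUIC is computed over such dimension-to-concept assignments.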

----

=== **3. Diagnostic Framework** ===

==== **Axes of Diagnosis** ====

The framework organizes diagnostic data into three axes:

1. **Etiology**: Genetic and environmental risk factors.
1. **Molecular Markers**: Biomarkers such as amyloid-beta, tau, and alpha-synuclein.
1. **Neuroanatomical Correlations**: Results from neuroimaging (e.g., MRI, PET).
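The three axes above can be represented as one structured record per subject. The dataclass below is a sketch of that organization; the field and marker names are illustrative, not a fixed schema.

```python
from dataclasses import dataclass, field

@dataclass
class DiagnosticProfile:
    """Groups a subject's findings along the three diagnostic axes."""
    etiology: dict = field(default_factory=dict)           # genetic / environmental risk factors
    molecular_markers: dict = field(default_factory=dict)  # e.g. amyloid-beta, tau, alpha-synuclein
    neuroanatomical: dict = field(default_factory=dict)    # imaging-derived findings (MRI, PET)

# Hypothetical example record.
profile = DiagnosticProfile(
    etiology={"APOE": "e4/e4"},
    molecular_markers={"csf_abeta42_pg_ml": 450.0, "csf_ptau181_pg_ml": 85.0},
    neuroanatomical={"mri_hippocampal_atrophy": True},
)
```

Keeping the axes as separate fields makes gaps visible per axis, which is what the recommendation system below exploits.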

==== **Recommendation System** ====

* Suggests additional tests or biomarkers if gaps are detected in the data.
* Prioritizes tests based on clinical impact and cost-effectiveness.
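The gap-detection and prioritization logic can be sketched as follows. The test catalog, impact scores, and costs are invented for illustration; a real system would draw these from clinical guidelines.

```python
def recommend_tests(record, catalog):
    """Suggest tests for markers missing from a patient record,
    ranked by clinical impact per unit cost (highest first)."""
    missing = [name for name in catalog if record.get(name) is None]
    return sorted(
        missing,
        key=lambda name: catalog[name]["impact"] / catalog[name]["cost"],
        reverse=True,
    )

# Hypothetical catalog: relative impact scores and costs.
catalog = {
    "csf_abeta42": {"impact": 0.9, "cost": 3.0},
    "csf_ptau181": {"impact": 0.8, "cost": 3.0},
    "amyloid_pet": {"impact": 0.95, "cost": 10.0},
}
record = {"csf_abeta42": 450.0, "csf_ptau181": None}
suggested = recommend_tests(record, catalog)
```

Here the cheap CSF assay outranks the more informative but far costlier PET scan, matching the stated cost-effectiveness criterion.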

----

=== **4. Computational Workflow** ===

1. **Data Loading**: Import data from storage (Drive or Bucket).
1. **Feature Engineering**: Generate derived features from the raw data.
1. **Model Training**:
1*. Split data into training, validation, and test sets.
1*. Train models with cross-validation to ensure robustness.
1. **Evaluation**:
1*. Metrics: Accuracy, F1-Score, AUIC for interpretability.
1*. Compare against baseline models and domain benchmarks.
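The training and evaluation steps above can be sketched with scikit-learn. The data here is synthetic; in practice `X` and `y` would come from the loading and feature-engineering steps, and AUIC would be computed separately via the interpretability analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the engineered feature table.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# Split off a held-out test set (a validation split can be carved
# out of X_train the same way).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Cross-validate on the training portion to check robustness.
model = RandomForestClassifier(n_estimators=100, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Final fit and held-out evaluation (accuracy and F1).
model.fit(X_train, y_train)
test_acc = model.score(X_test, y_test)
test_f1 = f1_score(y_test, model.predict(X_test))
```

Baseline comparison then amounts to running the same split and metrics with a simpler reference model.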

----

=== **5. Validation** ===

==== **Internal Validation** ====

* Test the system using simulated datasets and known clinical cases.
* Fine-tune models based on validation results.

==== **External Validation** ====

* Collaborate with research institutions and hospitals to test the system in real-world settings.
* Use anonymized patient data to ensure privacy compliance.

----

=== **6. Collaborative Development** ===

The project is open to contributions from researchers, clinicians, and developers. Key tools include:

* **Jupyter Notebooks**: For data analysis and pipeline development.
** Example: [[probabilistic imputation>>https://drive.ebrains.eu/f/4f69ab52f7734ef48217/]]
* **Wiki Pages**: For documenting methods and results.
* **Drive and Bucket**: For sharing code, data, and outputs.

----

=== **7. Tools and Technologies** ===

* **Programming Languages**: Python for AI and data processing.
* **Frameworks**:
** TensorFlow and PyTorch for machine learning.
** Flask or FastAPI for backend services.
* **Visualization**: Plotly and Matplotlib for interactive and static visualizations.
* **EBRAINS Services**:
** Collaboratory Lab for running Notebooks.
** Buckets for storing large datasets.