Wiki source code of Methodology
Version 5.1 by manuelmenendez on 2025/01/29 19:11
=== **Overview** ===

This section describes the step-by-step process used in the **Neurodiagnoses** project to develop a novel diagnostic framework for neurological diseases. The methodology integrates artificial intelligence (AI), biomedical ontologies, and computational neuroscience to create a structured, interpretable, and scalable diagnostic system.

----

=== **1. Data Integration** ===

==== **Data Sources** ====

* **Biomedical Ontologies**:
** Human Phenotype Ontology (HPO) for phenotypic abnormalities.
** Gene Ontology (GO) for molecular and cellular processes.
* **Neuroimaging Datasets**:
** Examples: Alzheimer’s Disease Neuroimaging Initiative (ADNI), OpenNeuro.
* **Clinical and Biomarker Data**:
** Anonymized clinical reports, molecular biomarkers, and test results.

==== **Data Preprocessing** ====

1. **Standardization**: Normalize all data sources to a common format.
1. **Feature Selection**: Identify the features most relevant for diagnosis (e.g., biomarkers, imaging scores).
1. **Data Cleaning**: Handle missing values and remove duplicate records.
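
As an illustration, the three preprocessing steps can be sketched in Python with pandas and scikit-learn. The column names and values below are hypothetical placeholders, not project data:

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical tabular dataset: rows = subjects, columns = biomarkers/scores.
df = pd.DataFrame({
    "amyloid_beta": [1.2, 0.8, None, 1.1, 0.7, 1.3],
    "tau":          [0.5, 0.9, 0.6, 0.5, 1.0, 0.4],
    "mri_score":    [2.1, 3.4, 2.0, 2.2, 3.6, 1.9],
    "diagnosis":    [0, 1, 0, 0, 1, 0],
})

# Data cleaning: drop duplicate rows, impute missing values with the column median.
df = df.drop_duplicates()
df = df.fillna(df.median())

# Standardization: z-score each feature onto a common scale.
features = df.drop(columns="diagnosis")
standardized = (features - features.mean()) / features.std()

# Feature selection: keep the k features most associated with the label.
selector = SelectKBest(f_classif, k=2)
selected = selector.fit_transform(standardized, df["diagnosis"])
print(selected.shape)  # (6, 2)
```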

----

=== **2. AI-Based Analysis** ===

==== **Model Development** ====

* **Embedding Models**: Use pre-trained models like BioBERT or BioLORD for text data.
* **Classification Models**:
** Algorithms: Random Forest, Support Vector Machines (SVM), or neural networks.
** Purpose: Predict the likelihood of specific neurological conditions based on input data.
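
To make the classification step concrete, here is a minimal sketch using scikit-learn's Random Forest on synthetic placeholder features (not real patient data); an SVM or neural network would slot in the same way:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Synthetic placeholder data: 200 subjects x 4 biomarker features.
X = rng.normal(size=(200, 4))
# Toy label: condition driven by the first two features, for illustration only.
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# predict_proba returns per-class likelihoods, e.g. P(condition) per subject.
proba = clf.predict_proba(X[:5])
print(proba.shape)  # (5, 2)
```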

==== **Dimensionality Reduction and Interpretability** ====

* Leverage [[DEIBO>>https://drive.ebrains.eu/f/8d7157708cde4b258db0/]] (Data-driven Embedding Interpretation Based on Ontologies) to connect model dimensions to ontology concepts.
* Evaluate interpretability using metrics like the Area Under the Interpretability Curve (AUIC).
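
This is not DEIBO itself, but the general idea of connecting embedding dimensions to ontology concepts can be illustrated with random placeholder data: score how strongly each dimension correlates with a binary concept annotation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embeddings: 100 terms x 8 dimensions.
embeddings = rng.normal(size=(100, 8))
# Placeholder annotation: whether each term carries one ontology concept.
# Dimension 3 is made informative by construction.
concept = (embeddings[:, 3] > 0).astype(float)

# Pearson correlation between each embedding dimension and the concept indicator.
centered = embeddings - embeddings.mean(axis=0)
c_centered = concept - concept.mean()
corr = (centered * c_centered[:, None]).sum(axis=0) / (
    np.linalg.norm(centered, axis=0) * np.linalg.norm(c_centered)
)

# The dimension with the strongest correlation "explains" the concept.
best_dim = int(np.abs(corr).argmax())
print(best_dim)  # 3
```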

----

=== **3. Diagnostic Framework** ===

==== **Axes of Diagnosis** ====

The framework organizes diagnostic data into three axes:

1. **Etiology**: Genetic and environmental risk factors.
1. **Molecular Markers**: Biomarkers such as amyloid-beta, tau, and alpha-synuclein.
1. **Neuroanatomical Correlations**: Results from neuroimaging (e.g., MRI, PET).
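
The three axes map naturally onto a simple record type. A minimal sketch follows; the field names are illustrative, not a fixed project schema:

```python
from dataclasses import dataclass, field

@dataclass
class DiagnosticProfile:
    """One patient's findings organized along the three diagnostic axes."""
    etiology: dict = field(default_factory=dict)           # genetic/environmental risk factors
    molecular_markers: dict = field(default_factory=dict)  # e.g. amyloid-beta, tau levels
    neuroanatomy: dict = field(default_factory=dict)       # MRI/PET-derived measures

profile = DiagnosticProfile(
    etiology={"APOE4_carrier": True},
    molecular_markers={"amyloid_beta": 1.1, "tau": 0.6},
    neuroanatomy={"hippocampal_volume_mm3": 3100},
)
print(profile.molecular_markers["tau"])  # 0.6
```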

==== **Recommendation System** ====

* Suggests additional tests or biomarkers if gaps are detected in the data.
* Prioritizes tests based on clinical impact and cost-effectiveness.
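
A minimal rule-based sketch of such a recommender; the test catalog, impact, and cost figures are invented placeholders:

```python
# Hypothetical catalog: biomarker -> (test that measures it, impact score, cost score).
TEST_CATALOG = {
    "amyloid_beta": ("CSF amyloid assay", 0.9, 0.4),
    "tau": ("CSF tau assay", 0.8, 0.4),
    "hippocampal_volume": ("structural MRI", 0.7, 0.6),
}

def recommend_tests(available_markers):
    """Suggest tests for missing markers, ranked by impact per unit cost."""
    missing = [m for m in TEST_CATALOG if m not in available_markers]
    ranked = sorted(
        missing,
        key=lambda m: TEST_CATALOG[m][1] / TEST_CATALOG[m][2],
        reverse=True,
    )
    return [TEST_CATALOG[m][0] for m in ranked]

# Only tau is on file, so the two remaining tests are suggested, highest value first.
print(recommend_tests({"tau": 0.6}))
# ['CSF amyloid assay', 'structural MRI']
```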

----

=== **4. Computational Workflow** ===

1. **Data Loading**: Import data from storage (Drive or Bucket).
1. **Feature Engineering**: Generate derived features from the raw data.
1. **Model Training**:
1*. Split data into training, validation, and test sets.
1*. Train models with cross-validation to ensure robustness.
1. **Evaluation**:
1*. Metrics: accuracy, F1-score, and AUIC for interpretability.
1*. Compare against baseline models and domain benchmarks.
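
The training and evaluation steps can be sketched with scikit-learn. Synthetic placeholder data stands in for the loaded and engineered features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(1)

# Placeholder feature matrix and labels.
X = rng.normal(size=(300, 5))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

# Split off a held-out test set (a validation set can be split off the same way).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Cross-validated training for robustness.
model = RandomForestClassifier(n_estimators=100, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Final evaluation on the held-out test set.
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(round(accuracy_score(y_test, pred), 2), round(f1_score(y_test, pred), 2))
```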

----

=== **5. Validation** ===

==== **Internal Validation** ====

* Test the system using simulated datasets and known clinical cases.
* Fine-tune models based on validation results.

==== **External Validation** ====

* Collaborate with research institutions and hospitals to test the system in real-world settings.
* Use anonymized patient data to ensure privacy compliance.

----

=== **6. Collaborative Development** ===

The project is open to contributions from researchers, clinicians, and developers. Key tools include:

* **Jupyter Notebooks**: For data analysis and pipeline development.
** Example: [[probabilistic imputation>>https://drive.ebrains.eu/f/4f69ab52f7734ef48217/]]
* **Wiki Pages**: For documenting methods and results.
* **Drive and Bucket**: For sharing code, data, and outputs.
* **Collaboration with related projects**: For instance, [[//Beyond the hype: AI in dementia – from early risk detection to disease treatment//>>https://www.lethe-project.eu/beyond-the-hype-ai-in-dementia-from-early-risk-detection-to-disease-treatment/]]

----

=== **7. Tools and Technologies** ===

* **Programming Languages**: Python for AI and data processing.
* **Frameworks**:
** TensorFlow and PyTorch for machine learning.
** Flask or FastAPI for backend services.
* **Visualization**: Plotly and Matplotlib for interactive and static visualizations.
* **EBRAINS Services**:
** Collaboratory Lab for running Notebooks.
** Buckets for storing large datasets.