=== **Overview** ===

This section describes the step-by-step process used in the **Neurodiagnoses** project to develop a novel diagnostic framework for neurological diseases. The methodology integrates artificial intelligence (AI), biomedical ontologies, and computational neuroscience to create a structured, interpretable, and scalable diagnostic system.

----

=== **1. Data Integration** ===

==== **Data Sources** ====

* **Biomedical Ontologies**:
** Human Phenotype Ontology (HPO) for phenotypic abnormalities.
** Gene Ontology (GO) for molecular and cellular processes.
* **Neuroimaging Datasets**:
** Examples: Alzheimer’s Disease Neuroimaging Initiative (ADNI), OpenNeuro.
* **Clinical and Biomarker Data**:
** Anonymized clinical reports, molecular biomarkers, and test results.

==== **Data Preprocessing** ====

1. **Standardization**: Normalize all data sources to a common format.
1. **Feature Selection**: Identify the features most relevant for diagnosis (e.g., biomarkers, imaging scores).
1. **Data Cleaning**: Handle missing values and remove duplicates (sketched below).
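
A minimal sketch of these three steps, assuming a tabular dataset; the file name and column names are hypothetical placeholders, not the project schema:

{{code language="python"}}
import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame, feature_cols: list[str]) -> pd.DataFrame:
    # Data cleaning: drop exact duplicates, then impute missing values
    # with the per-column median (a simple, common strategy).
    df = df.drop_duplicates().copy()
    df[feature_cols] = df[feature_cols].fillna(df[feature_cols].median())
    # Standardization: rescale each feature to zero mean and unit variance
    # so sources measured in different units become comparable.
    df[feature_cols] = StandardScaler().fit_transform(df[feature_cols])
    return df

# Hypothetical feature set; feature selection would narrow this list further.
features = ["csf_abeta42", "csf_ptau", "mri_hippocampal_volume"]
clean = preprocess(pd.read_csv("clinical_biomarkers.csv"), features)
{{/code}}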

----

=== **2. AI-Based Analysis** ===

==== **Model Development** ====

* **Embedding Models**: Use pre-trained biomedical language models such as BioBERT or BioLORD to encode text data.
* **Classification Models**:
** Algorithms: Random Forest, Support Vector Machines (SVM), or neural networks.
** Purpose: Predict the likelihood of specific neurological conditions from the input data (see the sketch below).
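
A sketch of this two-stage setup, assuming reports are encoded with a public BioBERT checkpoint and classified with a Random Forest; the example texts and labels are toy values:

{{code language="python"}}
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.ensemble import RandomForestClassifier

CHECKPOINT = "dmis-lab/biobert-base-cased-v1.2"  # one public BioBERT variant
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
encoder = AutoModel.from_pretrained(CHECKPOINT)

def embed(texts: list[str]) -> torch.Tensor:
    # Mean-pool the last hidden state into one fixed-size vector per report.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state
    return hidden.mean(dim=1)

reports = ["Progressive memory loss; hippocampal atrophy on MRI.",
           "Resting tremor and rigidity; reduced DAT binding."]
labels = ["alzheimer", "parkinson"]  # toy labels for illustration only

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(embed(reports).numpy(), labels)
{{/code}}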

==== **Dimensionality Reduction and Interpretability** ====

* Leverage DEIBO (Data-driven Embedding Interpretation Based on Ontologies) to connect model dimensions to ontology concepts (see the sketch below).
* Evaluate interpretability using metrics such as the Area Under the Interpretability Curve (AUIC).
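
The DEIBO procedure itself is not reproduced here. The sketch below shows one plausible reading of its core step, on the assumption that each embedding dimension is linked to the ontology concepts whose label embeddings score highest along that axis; treat it as an illustration, not the published algorithm:

{{code language="python"}}
import numpy as np

def rank_concepts_per_dimension(concept_vectors: np.ndarray,
                                concept_names: list[str],
                                top_k: int = 3) -> dict[int, list[str]]:
    # Normalize concept embeddings (one row per ontology concept) so that
    # dot products become cosine similarities.
    normed = concept_vectors / np.linalg.norm(concept_vectors, axis=1, keepdims=True)
    mapping = {}
    for d in range(concept_vectors.shape[1]):
        # The cosine between dimension d's unit axis and each concept is
        # simply column d of the normalized concept matrix.
        top = np.argsort(normed[:, d])[::-1][:top_k]
        mapping[d] = [concept_names[i] for i in top]
    return mapping
{{/code}}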

----

=== **3. Diagnostic Framework** ===

==== **Axes of Diagnosis** ====

The framework organizes diagnostic data into three axes (a minimal data-structure sketch follows the list):

1. **Etiology**: Genetic and environmental risk factors.
1. **Molecular Markers**: Biomarkers such as amyloid-beta, tau, and alpha-synuclein.
1. **Neuroanatomical Correlations**: Results from neuroimaging (e.g., MRI, PET).
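
A minimal sketch of how a three-axis patient record could be represented; field names and example values are hypothetical:

{{code language="python"}}
from dataclasses import dataclass, field

@dataclass
class Diagnosis:
    # Axis 1, Etiology: genetic and environmental risk factors.
    etiology: dict[str, float] = field(default_factory=dict)      # e.g. {"APOE4_dosage": 2}
    # Axis 2, Molecular markers: biomarker measurements.
    molecular: dict[str, float] = field(default_factory=dict)     # e.g. {"csf_ptau": 62.0}
    # Axis 3, Neuroanatomical correlations: imaging-derived scores.
    neuroanatomy: dict[str, float] = field(default_factory=dict)  # e.g. {"hippocampal_z": -1.9}

    def missing_axes(self) -> list[str]:
        # An empty axis is a data gap the recommendation system can act on.
        axes = {"etiology": self.etiology, "molecular": self.molecular,
                "neuroanatomy": self.neuroanatomy}
        return [name for name, values in axes.items() if not values]
{{/code}}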

==== **Recommendation System** ====

* Suggests additional tests or biomarkers when gaps are detected in the data.
* Prioritizes tests based on clinical impact and cost-effectiveness (see the sketch below).
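
A rule-based sketch of this behavior, reusing the hypothetical `missing_axes()` helper above; the test catalog, impact scores, and costs are invented for illustration:

{{code language="python"}}
# Maps each axis to candidate tests: (name, clinical impact 0-1, cost in EUR).
CATALOG = {
    "etiology": [("APOE genotyping", 0.7, 150.0)],
    "molecular": [("CSF amyloid-beta/tau panel", 0.9, 900.0)],
    "neuroanatomy": [("Volumetric MRI", 0.8, 600.0)],
}

def recommend(missing_axes: list[str]) -> list[str]:
    # Collect candidate tests for every axis with missing data, then rank
    # by impact per unit cost (one simple notion of cost-effectiveness).
    candidates = [t for axis in missing_axes for t in CATALOG.get(axis, [])]
    candidates.sort(key=lambda t: t[1] / t[2], reverse=True)
    return [name for name, _, _ in candidates]

print(recommend(["molecular", "neuroanatomy"]))  # MRI ranks first: 0.8/600 > 0.9/900
{{/code}}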

----

=== **4. Computational Workflow** ===

1. **Data Loading**: Import data from storage (Drive or Bucket).
1. **Feature Engineering**: Generate derived features from the raw data.
1. **Model Training**:
1*. Split the data into training, validation, and test sets.
1*. Train models with cross-validation to ensure robustness.
1. **Evaluation**:
1*. Metrics: accuracy, F1-score, and AUIC for interpretability.
1*. Compare against baseline models and domain benchmarks (steps 3 and 4 are sketched below).
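
A condensed sketch of steps 3 and 4, with synthetic placeholder data standing in for the engineered features and labels:

{{code language="python"}}
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)  # placeholder

# Step 3: hold out a test set, then cross-validate on the remainder.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
model.fit(X_train, y_train)

# Step 4: report held-out metrics; AUIC would come from the DEIBO pipeline.
pred = model.predict(X_test)
print(f"CV accuracy:   {cv_scores.mean():.3f}")
print(f"Test accuracy: {accuracy_score(y_test, pred):.3f}")
print(f"Test F1-score: {f1_score(y_test, pred):.3f}")
{{/code}}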

----

=== **5. Validation** ===

==== **Internal Validation** ====

* Test the system using simulated datasets and known clinical cases.
* Fine-tune models based on validation results.

==== **External Validation** ====

* Collaborate with research institutions and hospitals to test the system in real-world settings.
* Use anonymized patient data to ensure privacy compliance.

----

=== **6. Collaborative Development** ===

The project is open to contributions from researchers, clinicians, and developers. Key tools include:

* **Jupyter Notebooks**: For data analysis and pipeline development.
* **Wiki Pages**: For documenting methods and results.
* **Drive and Bucket**: For sharing code, data, and outputs.

----

=== **7. Tools and Technologies** ===

* **Programming Languages**: Python for AI and data processing.
* **Frameworks**:
** TensorFlow and PyTorch for machine learning.
** Flask or FastAPI for backend services (see the sketch after this list).
* **Visualization**: Plotly and Matplotlib for interactive and static visualizations.
* **EBRAINS Services**:
** Collaboratory Lab for running Notebooks.
** Buckets for storing large datasets.
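
A minimal sketch of a FastAPI prediction endpoint for this stack; the route, payload fields, and placeholder score are illustrative assumptions, not the project's actual API:

{{code language="python"}}
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Neurodiagnoses API (sketch)")

class PatientFeatures(BaseModel):
    csf_abeta42: float            # hypothetical feature names, mirroring
    csf_ptau: float               # the preprocessing sketch above
    mri_hippocampal_volume: float

@app.post("/predict")
def predict(features: PatientFeatures) -> dict:
    # A deployed service would load the trained classifier at startup;
    # a fixed placeholder keeps this sketch self-contained.
    probability = 0.5  # stand-in for model.predict_proba(...)
    return {"condition": "alzheimer", "probability": probability}

# Run with: uvicorn main:app --reload  (assuming this file is main.py)
{{/code}}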