Wiki source code of Methodology
Version 1.1 by manuelmenendez on 2025/01/27 23:07
=== **Overview** ===

This section describes the step-by-step process used in the **Neurodiagnoses** project to develop a novel diagnostic framework for neurological diseases. The methodology integrates artificial intelligence (AI), biomedical ontologies, and computational neuroscience to create a structured, interpretable, and scalable diagnostic system.

----

=== **1. Data Integration** ===

==== **Data Sources** ====

* **Biomedical Ontologies** (see the loading sketch after this list):
** Human Phenotype Ontology (HPO) for phenotypic abnormalities.
** Gene Ontology (GO) for molecular and cellular processes.
* **Neuroimaging Datasets**:
** Examples: Alzheimer’s Disease Neuroimaging Initiative (ADNI), OpenNeuro.
* **Clinical and Biomarker Data**:
** Anonymized clinical reports, molecular biomarkers, and test results.

==== **Data Preprocessing** ====

1. **Standardization**: Normalize all data sources to a common format (a pandas/scikit-learn sketch of these three steps follows the list).
1. **Feature Selection**: Identify the features most relevant for diagnosis (e.g., biomarkers, imaging scores).
1. **Data Cleaning**: Handle missing values and remove duplicates.

----

=== **2. AI-Based Analysis** ===

==== **Model Development** ====

* **Embedding Models**: Use pre-trained biomedical language models such as BioBERT or BioLORD to embed text data (see the sketch after this list).
* **Classification Models**:
** Algorithms: Random Forest, Support Vector Machines (SVM), or neural networks.
** Purpose: Predict the likelihood of specific neurological conditions from the input data.

==== **Dimensionality Reduction and Interpretability** ====

* Leverage DEIBO (Data-driven Embedding Interpretation Based on Ontologies) to connect model dimensions to ontology concepts (the general idea is illustrated below).
* Evaluate interpretability with metrics such as the Area Under the Interpretability Curve (AUIC).

----

=== **3. Diagnostic Framework** ===

==== **Axes of Diagnosis** ====

The framework organizes diagnostic data into three axes (a minimal data-structure sketch follows the list):

1. **Etiology**: Genetic and environmental risk factors.
1. **Molecular Markers**: Biomarkers such as amyloid-beta, tau, and alpha-synuclein.
1. **Neuroanatomical Correlations**: Results from neuroimaging (e.g., MRI, PET).

==== **Recommendation System** ====

* Suggests additional tests or biomarkers when gaps are detected in the data (a toy version is sketched below).
* Prioritizes tests by clinical impact and cost-effectiveness.

----

=== **4. Computational Workflow** ===

1. **Data Loading**: Import data from storage (Drive or Bucket).
1. **Feature Engineering**: Generate derived features from the raw data.
1. **Model Training** (see the end-to-end sketch after this list):
1*. Split data into training, validation, and test sets.
1*. Train models with cross-validation to ensure robustness.
1. **Evaluation**:
1*. Metrics: accuracy, F1-score, and AUIC for interpretability.
1*. Compare against baseline models and domain benchmarks.

----

=== **5. Validation** ===

==== **Internal Validation** ====

* Test the system using simulated datasets and known clinical cases.
* Fine-tune models based on validation results.

==== **External Validation** ====

* Collaborate with research institutions and hospitals to test the system in real-world settings.
* Use anonymized patient data to ensure privacy compliance.

----

=== **6. Collaborative Development** ===

The project is open to contributions from researchers, clinicians, and developers. Key tools include:

* **Jupyter Notebooks**: For data analysis and pipeline development.
* **Wiki Pages**: For documenting methods and results.
* **Drive and Bucket**: For sharing code, data, and outputs.

----

=== **7. Tools and Technologies** ===

* **Programming Languages**: Python for AI and data processing.
* **Frameworks**:
** TensorFlow and PyTorch for machine learning.
** Flask or FastAPI for backend services (a minimal endpoint sketch follows this list).
* **Visualization**: Plotly and Matplotlib for interactive and static visualizations.
* **EBRAINS Services**:
** Collaboratory Lab for running notebooks.
** Buckets for storing large datasets.
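
For illustration, here is a minimal FastAPI service exposing a scoring endpoint. The route, payload fields, and placeholder response are hypothetical; a real deployment would load the trained model at startup and return its predictions.

{{code language="python"}}
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Neurodiagnoses API")  # hypothetical service name

class Profile(BaseModel):
    # The three diagnostic axes, mirroring the framework above.
    etiology: dict = {}
    molecular: dict = {}
    neuroanatomy: dict = {}

@app.post("/predict")
def predict(profile: Profile):
    # Placeholder response; a deployed service would call the classifier here.
    return {"condition_probabilities": {"AD": 0.0, "PD": 0.0}}

# Run locally with: uvicorn main:app --reload
{{/code}}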