Wiki source code of Methodology
Version 3.1 by manuelmenendez on 2025/01/27 23:28
=== **Overview** ===

This section describes the step-by-step process used in the **Neurodiagnoses** project to develop a novel diagnostic framework for neurological diseases. The methodology integrates artificial intelligence (AI), biomedical ontologies, and computational neuroscience to create a structured, interpretable, and scalable diagnostic system.

----

=== **1. Data Integration** ===

==== **Data Sources** ====

* **Biomedical Ontologies**:
** Human Phenotype Ontology (HPO) for phenotypic abnormalities.
** Gene Ontology (GO) for molecular and cellular processes.
* **Neuroimaging Datasets**:
** Examples: Alzheimer’s Disease Neuroimaging Initiative (ADNI), OpenNeuro.
* **Clinical and Biomarker Data**:
** Anonymized clinical reports, molecular biomarkers, and test results.

==== **Data Preprocessing** ====

1. **Standardization**: Normalize all data sources to a common format.
1. **Feature Selection**: Identify the features relevant to diagnosis (e.g., biomarkers, imaging scores).
1. **Data Cleaning**: Handle missing values and remove duplicates.
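
The sketch below illustrates these three steps on tabular clinical and biomarker data, assuming pandas DataFrames. The file name and column names are hypothetical placeholders, not fields of an actual Neurodiagnoses schema.

{{code language="python"}}
# Sketch: cleaning, feature selection, and standardization for tabular
# clinical/biomarker data. All column names are hypothetical.
import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame, feature_cols: list[str]) -> pd.DataFrame:
    # Data cleaning: drop exact duplicates, impute missing numeric
    # values with the per-column median.
    df = df.drop_duplicates().copy()
    df[feature_cols] = df[feature_cols].fillna(df[feature_cols].median())
    # Standardization: z-score the selected features so that sources
    # with different units share a common scale.
    df[feature_cols] = StandardScaler().fit_transform(df[feature_cols])
    return df

# Feature selection reduces here to choosing feature_cols
# (hypothetical biomarker and imaging columns).
cohort = preprocess(pd.read_csv("cohort.csv"),  # hypothetical file
                    feature_cols=["csf_abeta42", "csf_ptau", "hippocampal_volume"])
{{/code}}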

----

=== **2. AI-Based Analysis** ===

==== **Model Development** ====

* **Embedding Models**: Use pre-trained models like BioBERT or BioLORD for text data.
* **Classification Models**:
** Algorithms: Random Forest, Support Vector Machines (SVM), or neural networks.
** Purpose: Predict the likelihood of specific neurological conditions based on input data.
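
A minimal sketch of this two-stage setup follows, assuming the publicly available BioBERT checkpoint on Hugging Face; the pooling strategy, model choice, and toy labels are illustrative, not project-fixed settings.

{{code language="python"}}
# Sketch: embed clinical text with a pre-trained biomedical encoder,
# then classify with a Random Forest. Model id, pooling, and labels
# are illustrative assumptions.
import torch
from sklearn.ensemble import RandomForestClassifier
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
enc = AutoModel.from_pretrained("dmis-lab/biobert-base-cased-v1.1")

def embed(texts: list[str]) -> torch.Tensor:
    # Mean-pool the last hidden state into one vector per report.
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)

reports = ["Progressive memory loss with hippocampal atrophy.",
           "Resting tremor and reduced DAT binding."]
X = embed(reports).numpy()
y = [0, 1]  # toy condition labels
clf = RandomForestClassifier(random_state=42).fit(X, y)
{{/code}}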

==== **Dimensionality Reduction and Interpretability** ====

* Leverage [[DEIBO>>https://drive.ebrains.eu/f/8d7157708cde4b258db0/]] (Data-driven Embedding Interpretation Based on Ontologies) to connect model dimensions to ontology concepts.
* Evaluate interpretability using metrics like the Area Under the Interpretability Curve (AUIC).
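
DEIBO itself is specified in the linked document. As a rough illustration of the underlying idea only (not the DEIBO algorithm), one can score how strongly each embedding dimension tracks an ontology concept, e.g. by correlating dimension activations with binary concept annotations:

{{code language="python"}}
# Illustration only (not the DEIBO algorithm): correlate each embedding
# dimension with each ontology concept's binary annotations.
import numpy as np

def dimension_concept_scores(E: np.ndarray, A: np.ndarray) -> np.ndarray:
    """E: (n_samples, n_dims) embeddings; A: (n_samples, n_concepts)
    0/1 ontology annotations. Returns (n_dims, n_concepts) |corr|."""
    Ez = (E - E.mean(0)) / (E.std(0) + 1e-9)
    Az = (A - A.mean(0)) / (A.std(0) + 1e-9)
    return np.abs(Ez.T @ Az) / len(E)

rng = np.random.default_rng(0)
scores = dimension_concept_scores(rng.normal(size=(100, 16)),
                                  rng.integers(0, 2, size=(100, 5)))
print(scores.shape)  # (16, 5): one score per dimension-concept pair
{{/code}}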

----

=== **3. Diagnostic Framework** ===

==== **Axes of Diagnosis** ====

The framework organizes diagnostic data into three axes:

1. **Etiology**: Genetic and environmental risk factors.
1. **Molecular Markers**: Biomarkers such as amyloid-beta, tau, and alpha-synuclein.
1. **Neuroanatomical Correlations**: Results from neuroimaging (e.g., MRI, PET).
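
One possible in-code representation of a three-axis record is sketched below; all field names and example values are illustrative placeholders, not a project-defined schema.

{{code language="python"}}
# Sketch of a three-axis diagnostic record. Field names and example
# values are illustrative placeholders, not a fixed schema.
from dataclasses import dataclass, field

@dataclass
class AxisRecord:
    # Axis 1 - Etiology: genetic and environmental risk factors.
    etiology: dict[str, float] = field(default_factory=dict)
    # Axis 2 - Molecular markers (amyloid-beta, tau, alpha-synuclein, ...).
    molecular: dict[str, float] = field(default_factory=dict)
    # Axis 3 - Neuroanatomical correlations from MRI/PET.
    neuroanatomy: dict[str, float] = field(default_factory=dict)

record = AxisRecord(etiology={"APOE4_alleles": 1.0},
                    molecular={"csf_ptau": 85.0},
                    neuroanatomy={"hippocampal_volume_z": -1.8})
{{/code}}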

==== **Recommendation System** ====

* Suggests additional tests or biomarkers if gaps are detected in the data.
* Prioritizes tests based on clinical impact and cost-effectiveness.
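
A minimal way to realize this prioritization is an impact-per-cost ranking over the tests that are still missing. In the sketch below, the test catalog, impact weights, and costs are invented for illustration.

{{code language="python"}}
# Sketch: recommend missing tests ranked by clinical impact per unit
# cost. Catalog entries, weights, and costs are invented placeholders.
CATALOG = {
    "csf_ptau":      {"impact": 0.90, "cost": 700.0},
    "amyloid_pet":   {"impact": 0.95, "cost": 3000.0},
    "mri_volumetry": {"impact": 0.80, "cost": 500.0},
    "genetic_panel": {"impact": 0.70, "cost": 400.0},
}

def recommend(available: set[str], top_k: int = 3) -> list[str]:
    missing = {t: v for t, v in CATALOG.items() if t not in available}
    # Rank by impact-to-cost ratio, highest first.
    return sorted(missing,
                  key=lambda t: missing[t]["impact"] / missing[t]["cost"],
                  reverse=True)[:top_k]

print(recommend({"mri_volumetry"}))
# ['genetic_panel', 'csf_ptau', 'amyloid_pet']
{{/code}}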

----

=== **4. Computational Workflow** ===

1. **Data Loading**: Import data from storage (Drive or Bucket).
1. **Feature Engineering**: Generate derived features from the raw data.
1. **Model Training**:
1*. Split data into training, validation, and test sets.
1*. Train models with cross-validation to ensure robustness.
1. **Evaluation**:
1*. Metrics: Accuracy, F1-Score, AUIC for interpretability.
1*. Compare against baseline models and domain benchmarks.
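
The training and evaluation steps are sketched below on synthetic stand-in data; a real run would load cohort data from the Drive or Bucket instead, and AUIC (a project-specific interpretability metric) is omitted.

{{code language="python"}}
# Sketch of steps 3-4: hold-out split, cross-validated training, and
# accuracy/F1 evaluation, on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Hold out a test set; cross-validate on the remaining data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(random_state=0)
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)  # robustness check

clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print(f"CV accuracy:   {cv_scores.mean():.3f}")
print(f"Test accuracy: {accuracy_score(y_test, pred):.3f}")
print(f"Test F1:       {f1_score(y_test, pred):.3f}")
{{/code}}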

----

=== **5. Validation** ===

==== **Internal Validation** ====

* Test the system using simulated datasets and known clinical cases.
* Fine-tune models based on validation results.

==== **External Validation** ====

* Collaborate with research institutions and hospitals to test the system in real-world settings.
* Use anonymized patient data to ensure privacy compliance.

----

=== **6. Collaborative Development** ===

The project is open to contributions from researchers, clinicians, and developers. Key tools include:

* **Jupyter Notebooks**: For data analysis and pipeline development.
* **Wiki Pages**: For documenting methods and results.
* **Drive and Bucket**: For sharing code, data, and outputs.

----

=== **7. Tools and Technologies** ===

* **Programming Languages**: Python for AI and data processing.
* **Frameworks**:
** TensorFlow and PyTorch for machine learning.
** Flask or FastAPI for backend services (a minimal sketch follows below).
* **Visualization**: Plotly and Matplotlib for interactive and static visualizations.
* **EBRAINS Services**:
** Collaboratory Lab for running Notebooks.
** Buckets for storing large datasets.
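
As an example of the backend option, a minimal FastAPI sketch is shown below; the endpoint path, payload shape, model file, and module name are illustrative assumptions.

{{code language="python"}}
# Minimal FastAPI sketch for serving model predictions. Endpoint path,
# payload fields, and the model file are illustrative assumptions.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Neurodiagnoses API (sketch)")
model = joblib.load("model.joblib")  # hypothetical trained classifier

class Features(BaseModel):
    values: list[float]  # preprocessed feature vector

@app.post("/predict")
def predict(features: Features) -> dict:
    proba = model.predict_proba([features.values])[0]
    return {"class_probabilities": proba.tolist()}

# Run with: uvicorn service:app --reload   (module name assumed)
{{/code}}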