Wiki source code of Methodology
Version 4.2 by manuelmenendez on 2025/01/29 19:10
=== **Overview** ===

This section describes the step-by-step process used in the **Neurodiagnoses** project to develop a novel diagnostic framework for neurological diseases. The methodology integrates artificial intelligence (AI), biomedical ontologies, and computational neuroscience to create a structured, interpretable, and scalable diagnostic system.

----

=== **1. Data Integration** ===

==== **Data Sources** ====

* **Biomedical Ontologies**:
** Human Phenotype Ontology (HPO) for phenotypic abnormalities.
** Gene Ontology (GO) for molecular and cellular processes.
* **Neuroimaging Datasets**:
** Example: Alzheimer’s Disease Neuroimaging Initiative (ADNI), OpenNeuro.
* **Clinical and Biomarker Data**:
** Anonymized clinical reports, molecular biomarkers, and test results.
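
As an illustration of how an ontology source can be pulled into a pipeline, the sketch below loads the HPO release with the open-source obonet library; the library choice and the example term are assumptions for illustration, not a project requirement.

{{code language="python"}}
# Hedged sketch: load the Human Phenotype Ontology with the obonet library.
# Library choice and example term are illustrative, not a project requirement.
import obonet  # pip install obonet

# Read the current OBO release directly from the OBO Foundry PURL (needs network access).
hpo = obonet.read_obo("http://purl.obolibrary.org/obo/hp.obo")

print(len(hpo), "HPO terms loaded")
print(hpo.nodes["HP:0001250"]["name"])  # "Seizure"
{{/code}}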


==== **Data Preprocessing** ====

1. **Standardization**: Ensure all data sources are normalized to a common format.
1. **Feature Selection**: Identify relevant features for diagnosis (e.g., biomarkers, imaging scores).
1. **Data Cleaning**: Handle missing values and remove duplicates.
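
A minimal sketch of these three steps, assuming a pandas DataFrame with illustrative column names (the project notebooks use probabilistic imputation rather than the simple median fill shown here):

{{code language="python"}}
# Minimal preprocessing sketch; column names are illustrative placeholders.
import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame, feature_cols: list) -> pd.DataFrame:
    """Deduplicate, impute missing values, and standardize the selected features."""
    df = df.drop_duplicates().copy()
    # Simple median imputation; the project notebooks explore probabilistic imputation instead.
    df[feature_cols] = df[feature_cols].fillna(df[feature_cols].median())
    # Z-score normalization so heterogeneous sources share a common scale.
    df[feature_cols] = StandardScaler().fit_transform(df[feature_cols])
    return df

# Hypothetical usage:
# clean = preprocess(raw, ["csf_abeta42", "csf_ptau", "hippocampal_volume"])
{{/code}}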

----

=== **2. AI-Based Analysis** ===

==== **Model Development** ====

* **Embedding Models**: Use pre-trained models like BioBERT or BioLORD for text data.
* **Classification Models**:
** Algorithms: Random Forest, Support Vector Machines (SVM), or neural networks.
** Purpose: Predict the likelihood of specific neurological conditions based on input data.
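
The sketch below trains one such classifier (a Random Forest) on synthetic tabular features as a stand-in for real biomarker and imaging scores; it is a baseline illustration, not the project's production pipeline.

{{code language="python"}}
# Baseline classification sketch on synthetic features (a stand-in for real data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)

clf = RandomForestClassifier(n_estimators=300, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="f1_macro")
print(f"5-fold macro F1: {scores.mean():.3f} +/- {scores.std():.3f}")
{{/code}}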

==== **Dimensionality Reduction and Interpretability** ====

* Leverage [[DEIBO>>https://drive.ebrains.eu/f/8d7157708cde4b258db0/]] (Data-driven Embedding Interpretation Based on Ontologies) to connect model dimensions to ontology concepts.
* Evaluate interpretability using metrics like the Area Under the Interpretability Curve (AUIC).
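
The following is only a simplified probe of dimension-to-concept alignment, not the DEIBO method or the AUIC metric themselves (those are defined in the linked document); it illustrates the general idea of scoring each embedding dimension against an ontology concept.

{{code language="python"}}
# Illustrative probe of dimension-to-concept alignment (NOT the DEIBO algorithm).
import numpy as np

def dimension_concept_scores(embeddings: np.ndarray, concept_labels: np.ndarray) -> np.ndarray:
    """Score each embedding dimension against a binary concept annotation.

    embeddings: (n_items, n_dims) matrix of item embeddings.
    concept_labels: (n_items,) 0/1 vector marking items annotated with the concept.
    Returns the absolute Pearson correlation of each dimension with the concept.
    """
    centered = embeddings - embeddings.mean(axis=0)
    labels = concept_labels - concept_labels.mean()
    cov = centered.T @ labels / len(labels)
    denom = embeddings.std(axis=0) * concept_labels.std() + 1e-12
    return np.abs(cov / denom)

# Dimensions with the highest scores are candidate "carriers" of the concept.
{{/code}}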

----

=== **3. Diagnostic Framework** ===

==== **Axes of Diagnosis** ====

The framework organizes diagnostic data into three axes:

1. **Etiology**: Genetic and environmental risk factors.
1. **Molecular Markers**: Biomarkers such as amyloid-beta, tau, and alpha-synuclein.
1. **Neuroanatomical Correlations**: Results from neuroimaging (e.g., MRI, PET).
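
One possible way to represent a case along these three axes is sketched below; the class and field names are illustrative assumptions, not a fixed project schema.

{{code language="python"}}
# Hedged sketch of a three-axis case representation; names are illustrative only.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AxisFinding:
    code: str                      # e.g. an HPO/GO identifier or a biomarker name
    value: Optional[float] = None  # measured value, if any
    evidence: str = ""             # source, e.g. "CSF assay", "MRI volumetry"

@dataclass
class ThreeAxisDiagnosis:
    etiology: List[AxisFinding] = field(default_factory=list)        # genetic / environmental factors
    molecular: List[AxisFinding] = field(default_factory=list)       # amyloid-beta, tau, alpha-synuclein, ...
    neuroanatomical: List[AxisFinding] = field(default_factory=list) # MRI / PET findings

case = ThreeAxisDiagnosis(
    molecular=[AxisFinding("CSF_abeta42", 310.0, "CSF assay")],
    neuroanatomical=[AxisFinding("hippocampal_atrophy", 2.1, "MRI volumetry")],
)
{{/code}}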

==== **Recommendation System** ====

* Suggests additional tests or biomarkers if gaps are detected in the data.
* Prioritizes tests based on clinical impact and cost-effectiveness.
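
Continuing the case sketch above, a toy version of this gap detection and prioritization could look as follows; the candidate tests and their impact/cost numbers are invented placeholders.

{{code language="python"}}
# Toy recommendation sketch: flag empty axes and rank candidate tests by an
# assumed impact-to-cost ratio. All numbers are invented placeholders.
CANDIDATE_TESTS = {
    "Genetic panel":  {"axis": "etiology",        "impact": 0.7, "cost": 0.5},
    "CSF tau assay":  {"axis": "molecular",       "impact": 0.9, "cost": 0.4},
    "Amyloid PET":    {"axis": "neuroanatomical", "impact": 0.8, "cost": 0.9},
}

def recommend(case):
    """Return candidate tests for axes with no findings, highest impact/cost first."""
    missing = [axis for axis in ("etiology", "molecular", "neuroanatomical")
               if not getattr(case, axis)]
    candidates = [name for name, meta in CANDIDATE_TESTS.items() if meta["axis"] in missing]
    return sorted(candidates,
                  key=lambda n: CANDIDATE_TESTS[n]["impact"] / CANDIDATE_TESTS[n]["cost"],
                  reverse=True)

# For the example case above (empty etiology axis), recommend(case) -> ["Genetic panel"].
{{/code}}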

----

=== **4. Computational Workflow** ===

1. **Data Loading**: Import data from storage (Drive or Bucket).
1. **Feature Engineering**: Generate derived features from the raw data.
1. **Model Training**:
1*. Split data into training, validation, and test sets.
1*. Train models with cross-validation to ensure robustness.
1. **Evaluation**:
1*. Metrics: accuracy and F1-score for classification performance; AUIC for interpretability.
1*. Compare against baseline models and domain benchmarks.
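
A condensed sketch of steps 3 and 4 (split, cross-validated training, held-out evaluation) on synthetic data is shown below; the real workflow loads data from the EBRAINS Drive/Bucket storage and adds the AUIC interpretability check.

{{code language="python"}}
# Condensed workflow sketch: split, cross-validated training, held-out evaluation.
# Synthetic data stands in for the project's real features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=600, n_features=25, n_informative=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)

# Cross-validation on the training portion serves as the validation split.
search = GridSearchCV(RandomForestClassifier(random_state=1),
                      {"n_estimators": [100, 300], "max_depth": [None, 10]},
                      cv=5, scoring="f1_macro")
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print("macro F1:", f1_score(y_test, y_pred, average="macro"))
{{/code}}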

----

=== **5. Validation** ===

==== **Internal Validation** ====

* Test the system using simulated datasets and known clinical cases.
* Fine-tune models based on validation results.

==== **External Validation** ====

* Collaborate with research institutions and hospitals to test the system in real-world settings.
* Use anonymized patient data to ensure privacy compliance.

----

=== **6. Collaborative Development** ===

The project is open to contributions from researchers, clinicians, and developers. Key tools include:

* **Jupyter Notebooks**: For data analysis and pipeline development.
** Example: [[probabilistic imputation>>https://drive.ebrains.eu/f/4f69ab52f7734ef48217/]]
* **Wiki Pages**: For documenting methods and results.
* **Drive and Bucket**: For sharing code, data, and outputs.
* **Related Projects**: For instance, [[//Beyond the hype: AI in dementia – from early risk detection to disease treatment//>>https://www.lethe-project.eu/beyond-the-hype-ai-in-dementia-from-early-risk-detection-to-disease-treatment/]]

----

=== **7. Tools and Technologies** ===

* **Programming Languages**: Python for AI and data processing.
* **Frameworks**:
** TensorFlow and PyTorch for machine learning.
** Flask or FastAPI for backend services (a minimal example appears after this list).
* **Visualization**: Plotly and Matplotlib for interactive and static visualizations.
* **EBRAINS Services**:
** Collaboratory Lab for running Notebooks.
** Buckets for storing large datasets.
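
To illustrate the backend option mentioned under Frameworks, here is a minimal FastAPI sketch of a scoring endpoint; the route, payload fields, and response shape are assumptions for illustration only, not a published Neurodiagnoses API.

{{code language="python"}}
# Minimal FastAPI sketch of a scoring endpoint. Route and payload fields are
# illustrative assumptions, not a published project API.
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Neurodiagnoses demo API")

class CaseFeatures(BaseModel):
    csf_abeta42: Optional[float] = None
    csf_ptau: Optional[float] = None
    hippocampal_volume: Optional[float] = None

@app.post("/predict")
def predict(case: CaseFeatures) -> dict:
    """Stub endpoint: a real service would load a trained model and return class probabilities."""
    provided = {k: v for k, v in case.dict().items() if v is not None}  # pydantic v2 prefers model_dump()
    return {"received_features": provided, "note": "model inference not implemented in this sketch"}

# Run locally (assuming this file is saved as app.py):
#   uvicorn app:app --reload
{{/code}}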