Methodology
Overview
This section describes the step-by-step process used in the Neurodiagnoses project to develop a novel diagnostic framework for neurological diseases. The methodology integrates artificial intelligence (AI), biomedical ontologies, and computational neuroscience to create a structured, interpretable, and scalable diagnostic system.
1. Data Integration
Data Sources
- Biomedical Ontologies:
  - Human Phenotype Ontology (HPO) for phenotypic abnormalities.
  - Gene Ontology (GO) for molecular and cellular processes.
- Neuroimaging Datasets:
  - Examples: Alzheimer’s Disease Neuroimaging Initiative (ADNI), OpenNeuro.
- Clinical and Biomarker Data:
  - Anonymized clinical reports, molecular biomarkers, and test results.
 
 
Data Preprocessing
- Standardization: Ensure all data sources are normalized to a common format.
- Feature Selection: Identify relevant features for diagnosis (e.g., biomarkers, imaging scores).
- Data Cleaning: Handle missing values and remove duplicates (a minimal sketch follows this list).
 
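A minimal preprocessing sketch in Python, assuming tabular data loaded with pandas; the column names in the usage comment are hypothetical placeholders, not fields from the actual datasets:

```python
# Minimal preprocessing sketch (hypothetical column names; not the project's production pipeline).
import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame, feature_cols: list[str]) -> pd.DataFrame:
    """Standardize selected features, impute missing values, and drop duplicates."""
    df = df.drop_duplicates()                      # data cleaning: remove duplicate records
    features = df[feature_cols].copy()             # feature selection: keep diagnostic features
    features = features.fillna(features.median())  # simple imputation for missing values
    features[feature_cols] = StandardScaler().fit_transform(features)  # standardization
    return features

# Example usage with a hypothetical CSV of biomarker and imaging scores:
# data = pd.read_csv("clinical_data.csv")
# X = preprocess(data, ["abeta42", "ptau181", "hippocampal_volume"])
```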
2. AI-Based Analysis
Model Development
- Embedding Models: Use pre-trained models like BioBERT or BioLORD for text data.
- Classification Models:
  - Algorithms: Random Forest, Support Vector Machines (SVM), or neural networks.
  - Purpose: Predict the likelihood of specific neurological conditions based on input data (sketched below).
 
 
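The classification step could look like the following sketch, assuming a preprocessed feature matrix `X` and diagnostic labels `y`; this is an illustrative Random Forest baseline, not the project's final model:

```python
# Classification sketch: predict per-class probabilities of candidate diagnoses
# from preprocessed features (illustrative baseline only).
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def train_classifier(X, y):
    """Fit a Random Forest and return per-class probabilities on held-out data."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    model = RandomForestClassifier(n_estimators=300, random_state=42)
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)  # likelihood of each neurological condition
    return model, proba

# Free-text clinical reports could be embedded with BioBERT or BioLORD and
# concatenated to X as additional columns before training.
```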
Dimensionality Reduction and Interpretability
- Leverage DEIBO (Data-driven Embedding Interpretation Based on Ontologies) to connect model dimensions to ontology concepts (an illustrative sketch follows this list).
- Evaluate interpretability using metrics like the Area Under the Interpretability Curve (AUIC).
 
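DEIBO itself is defined in its own work; the sketch below is only a loose illustration of the general idea of linking embedding dimensions to ontology concepts (it is not the DEIBO algorithm, and the concept IDs in the usage comment are arbitrary examples):

```python
# Illustrative dimension-to-concept alignment (a simplification, not the DEIBO algorithm).
import numpy as np

def align_dimensions(concept_vectors: np.ndarray, concept_names: list[str]) -> dict[int, str]:
    """For each embedding dimension, return the ontology concept whose vector
    loads most heavily on that dimension (after L2-normalizing concept vectors)."""
    norms = np.linalg.norm(concept_vectors, axis=1, keepdims=True)
    unit = concept_vectors / np.clip(norms, 1e-12, None)
    best = np.argmax(np.abs(unit), axis=0)  # strongest-loading concept per dimension
    return {d: concept_names[c] for d, c in enumerate(best)}

# Usage with arbitrary example HPO identifiers:
# concepts = np.random.rand(3, 8)  # e.g., 3 concept vectors in an 8-dimensional embedding space
# align_dimensions(concepts, ["HP:0001268", "HP:0002185", "HP:0100543"])
```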
3. Diagnostic Framework
Axes of Diagnosis
The framework organizes diagnostic data into three axes (a simple code representation follows the list):
- Etiology: Genetic and environmental risk factors.
- Molecular Markers: Biomarkers such as amyloid-beta, tau, and alpha-synuclein.
- Neuroanatomical Correlations: Results from neuroimaging (e.g., MRI, PET).
 
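One simple way to carry such three-axis annotations through the pipeline is a small container object; the field names and example values below are illustrative only:

```python
# Illustrative container for a three-axis diagnostic annotation (field names are not prescriptive).
from dataclasses import dataclass, field

@dataclass
class ThreeAxisDiagnosis:
    etiology: dict = field(default_factory=dict)          # e.g., {"APOE4": True, "family_history": False}
    molecular_markers: dict = field(default_factory=dict) # e.g., {"abeta42_csf": 550, "ptau181": 28}
    neuroanatomy: dict = field(default_factory=dict)      # e.g., {"MRI_hippocampal_atrophy": "moderate"}

# case = ThreeAxisDiagnosis(
#     etiology={"APOE4": True},
#     molecular_markers={"abeta42_csf": 550},
#     neuroanatomy={"MRI_hippocampal_atrophy": "moderate"},
# )
```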
Recommendation System
- Suggests additional tests or biomarkers if gaps are detected in the data.
- Prioritizes tests based on clinical impact and cost-effectiveness (illustrated in the sketch below).
 
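A minimal sketch of this logic, with an invented test catalogue whose impact and cost figures are placeholders rather than validated clinical values:

```python
# Gap-detection sketch: rank missing tests by clinical impact per unit cost
# (the catalogue below is a made-up placeholder, not validated guidance).
TEST_CATALOGUE = {
    "csf_abeta42": {"impact": 0.90, "cost": 400},
    "amyloid_pet": {"impact": 0.95, "cost": 3000},
    "dat_scan":    {"impact": 0.80, "cost": 1200},
}

def recommend_tests(available: set[str]) -> list[str]:
    """Suggest tests not yet performed, ordered by impact-to-cost ratio."""
    missing = {t: v for t, v in TEST_CATALOGUE.items() if t not in available}
    return sorted(missing, key=lambda t: missing[t]["impact"] / missing[t]["cost"], reverse=True)

# recommend_tests({"csf_abeta42"})  # -> ["dat_scan", "amyloid_pet"]
```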
4. Computational Workflow
- Data Loading: Import data from storage (Drive or Bucket).
- Feature Engineering: Generate derived features from the raw data.
- Model Training:
  - Split data into training, validation, and test sets.
  - Train models with cross-validation to ensure robustness.
- Evaluation:
  - Metrics: Accuracy, F1-score, and AUIC for interpretability.
  - Compare against baseline models and domain benchmarks (a combined sketch follows this list).
 
 
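The training and evaluation steps can be combined as in the following scikit-learn sketch; AUIC is computed separately by the interpretability analysis, so only accuracy and F1 appear here:

```python
# Workflow sketch: held-out test split, cross-validated training, and evaluation.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_score, train_test_split

def run_workflow(X, y):
    # Hold out a test set; cross-validate on the remaining data for robustness.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )
    model = RandomForestClassifier(random_state=0)
    cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="f1_macro")
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "cv_f1_mean": cv_scores.mean(),
        "test_accuracy": accuracy_score(y_test, y_pred),
        "test_f1": f1_score(y_test, y_pred, average="macro"),
    }
```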
5. Validation
Internal Validation
- Test the system using simulated datasets and known clinical cases.
- Fine-tune models based on validation results.

External Validation
- Collaborate with research institutions and hospitals to test the system in real-world settings.
- Use anonymized patient data to ensure privacy compliance.
 
6. Collaborative Development
The project is open to contributions from researchers, clinicians, and developers. Key tools include:
- Jupyter Notebooks: For data analysis and pipeline development.
  - Example: probabilistic imputation (see the sketch after this list).
- Wiki Pages: For documenting methods and results.
- Drive and Bucket: For sharing code, data, and outputs.
- Related projects: For instance, "Beyond the hype: AI in dementia – from early risk detection to disease treatment".
 
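As a generic flavour of probabilistic imputation (not the actual notebook's code), scikit-learn's IterativeImputer can sample imputed values from a posterior:

```python
# Probabilistic imputation sketch using scikit-learn's IterativeImputer
# (generic illustration; not the notebook referenced above).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (enables the estimator)
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]])
imputer = IterativeImputer(sample_posterior=True, random_state=0)  # draw imputed values from the posterior
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```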
7. Tools and Technologies
- Programming Languages: Python for AI and data processing.
- Frameworks:
  - TensorFlow and PyTorch for machine learning.
  - Flask or FastAPI for backend services (a minimal service sketch follows this list).
- Visualization: Plotly and Matplotlib for interactive and static visualizations.
- EBRAINS Services:
  - Collaboratory Lab for running Notebooks.
  - Buckets for storing large datasets.
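For the backend, a minimal FastAPI sketch could expose a prediction endpoint; the route name and payload schema below are hypothetical:

```python
# Minimal FastAPI sketch for serving predictions (hypothetical endpoint and payload schema).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Neurodiagnoses API (sketch)")

class CaseFeatures(BaseModel):
    features: list[float]  # preprocessed feature vector for one case

@app.post("/predict")
def predict(case: CaseFeatures) -> dict:
    # In a real deployment the trained model would be loaded at startup;
    # here we return a placeholder response.
    return {"n_features": len(case.features), "diagnosis_probabilities": {}}

# Run with: uvicorn app:app --reload
```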