Wiki source code of Methodology
Version 1.1 by manuelmenendez on 2025/01/27 23:07
=== **Overview** ===

This section describes the step-by-step process used in the **Neurodiagnoses** project to develop a novel diagnostic framework for neurological diseases. The methodology integrates artificial intelligence (AI), biomedical ontologies, and computational neuroscience to create a structured, interpretable, and scalable diagnostic system.

----

=== **1. Data Integration** ===

==== **Data Sources** ====

* **Biomedical Ontologies** (see the loading sketch after this list):
** Human Phenotype Ontology (HPO) for phenotypic abnormalities.
** Gene Ontology (GO) for molecular and cellular processes.
* **Neuroimaging Datasets**:
** Examples: Alzheimer’s Disease Neuroimaging Initiative (ADNI), OpenNeuro.
* **Clinical and Biomarker Data**:
** Anonymized clinical reports, molecular biomarkers, and test results.

==== **Data Preprocessing** ====

1. **Standardization**: Normalize all data sources to a common format (a pandas/scikit-learn sketch of these three steps follows the list).
1. **Feature Selection**: Identify the features most relevant for diagnosis (e.g., biomarkers, imaging scores).
1. **Data Cleaning**: Handle missing values and remove duplicates.

----

=== **2. AI-Based Analysis** ===

==== **Model Development** ====

* **Embedding Models**: Use pre-trained biomedical language models such as BioBERT or BioLORD to embed text data (see the sketch after this list).
* **Classification Models**:
** Algorithms: Random Forest, Support Vector Machines (SVM), or neural networks.
** Purpose: Predict the likelihood of specific neurological conditions from the input data.

==== **Dimensionality Reduction and Interpretability** ====

* Leverage DEIBO (Data-driven Embedding Interpretation Based on Ontologies) to connect model dimensions to ontology concepts (the general idea is illustrated below).
* Evaluate interpretability with metrics such as the Area Under the Interpretability Curve (AUIC).

----

=== **3. Diagnostic Framework** ===

==== **Axes of Diagnosis** ====

The framework organizes diagnostic data into three axes (a minimal data-structure sketch follows the list):

1. **Etiology**: Genetic and environmental risk factors.
1. **Molecular Markers**: Biomarkers such as amyloid-beta, tau, and alpha-synuclein.
1. **Neuroanatomical Correlations**: Results from neuroimaging (e.g., MRI, PET).

==== **Recommendation System** ====

* Suggests additional tests or biomarkers when gaps are detected in the data (a toy version is sketched below).
* Prioritizes tests by clinical impact and cost-effectiveness.

----

=== **4. Computational Workflow** ===

1. **Data Loading**: Import data from storage (Drive or Bucket).
1. **Feature Engineering**: Generate derived features from the raw data.
1. **Model Training** (see the end-to-end sketch after this list):
1*. Split data into training, validation, and test sets.
1*. Train models with cross-validation to ensure robustness.
1. **Evaluation**:
1*. Metrics: accuracy, F1-score, and AUIC for interpretability.
1*. Compare against baseline models and domain benchmarks.

----

=== **5. Validation** ===

==== **Internal Validation** ====

* Test the system using simulated datasets and known clinical cases.
* Fine-tune models based on validation results.

==== **External Validation** ====

* Collaborate with research institutions and hospitals to test the system in real-world settings.
* Use anonymized patient data to ensure privacy compliance.

----

=== **6. Collaborative Development** ===

The project is open to contributions from researchers, clinicians, and developers. Key tools include:

* **Jupyter Notebooks**: For data analysis and pipeline development.
* **Wiki Pages**: For documenting methods and results.
* **Drive and Bucket**: For sharing code, data, and outputs.

----

=== **7. Tools and Technologies** ===

* **Programming Languages**: Python for AI and data processing.
* **Frameworks**:
** TensorFlow and PyTorch for machine learning.
** Flask or FastAPI for backend services (a minimal endpoint sketch follows this list).
* **Visualization**: Plotly and Matplotlib for interactive and static visualizations.
* **EBRAINS Services**:
** Collaboratory Lab for running notebooks.
** Buckets for storing large datasets.
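
For illustration, here is a minimal FastAPI service exposing a scoring endpoint. The route, payload fields, and placeholder response are hypothetical; a real deployment would load the trained model at startup and return its predictions.

{{code language="python"}}
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Neurodiagnoses API")  # hypothetical service name

class Profile(BaseModel):
    # The three diagnostic axes, mirroring the framework above.
    etiology: dict = {}
    molecular: dict = {}
    neuroanatomy: dict = {}

@app.post("/predict")
def predict(profile: Profile):
    # Placeholder response; a deployed service would call the classifier here.
    return {"condition_probabilities": {"AD": 0.0, "PD": 0.0}}

# Run locally with: uvicorn main:app --reload
{{/code}}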