Wiki source code of Methodology
Version 5.1 by manuelmenendez on 2025/01/29 19:11
=== **Overview** ===

This section describes the step-by-step process used in the **Neurodiagnoses** project to develop a novel diagnostic framework for neurological diseases. The methodology integrates artificial intelligence (AI), biomedical ontologies, and computational neuroscience to create a structured, interpretable, and scalable diagnostic system.

----

=== **1. Data Integration** ===

==== **Data Sources** ====

* **Biomedical Ontologies**:
** Human Phenotype Ontology (HPO) for phenotypic abnormalities.
** Gene Ontology (GO) for molecular and cellular processes.
* **Neuroimaging Datasets**:
** Examples: Alzheimer’s Disease Neuroimaging Initiative (ADNI), OpenNeuro.
* **Clinical and Biomarker Data**:
** Anonymized clinical reports, molecular biomarkers, and test results.

==== **Data Preprocessing** ====

1. **Standardization**: Normalize all data sources to a common format.
1. **Feature Selection**: Identify the features most relevant for diagnosis (e.g., biomarkers, imaging scores).
1. **Data Cleaning**: Handle missing values and remove duplicate records.
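
As an illustration, the three preprocessing steps can be sketched in Python with pandas and scikit-learn. The column names and values below are hypothetical placeholders, not project data:

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical tabular dataset: rows = subjects, columns = biomarkers/scores.
df = pd.DataFrame({
    "amyloid_beta": [1.2, 0.8, None, 1.1, 0.7, 1.3],
    "tau":          [0.5, 0.9, 0.6, 0.5, 1.0, 0.4],
    "mri_score":    [2.1, 3.4, 2.0, 2.2, 3.6, 1.9],
    "diagnosis":    [0, 1, 0, 0, 1, 0],
})

# Data cleaning: drop duplicate rows, impute missing values with the column median.
df = df.drop_duplicates()
df = df.fillna(df.median())

# Standardization: z-score each feature onto a common scale.
features = df.drop(columns="diagnosis")
standardized = (features - features.mean()) / features.std()

# Feature selection: keep the k features most associated with the label.
selector = SelectKBest(f_classif, k=2)
selected = selector.fit_transform(standardized, df["diagnosis"])
print(selected.shape)  # (6, 2)
```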

----

=== **2. AI-Based Analysis** ===

==== **Model Development** ====

* **Embedding Models**: Use pre-trained models like BioBERT or BioLORD for text data.
* **Classification Models**:
** Algorithms: Random Forest, Support Vector Machines (SVM), or neural networks.
** Purpose: Predict the likelihood of specific neurological conditions based on input data.
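
To make the classification step concrete, here is a minimal sketch using scikit-learn's Random Forest on synthetic placeholder features (not real patient data); an SVM or neural network would slot in the same way:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Synthetic placeholder data: 200 subjects x 4 biomarker features.
X = rng.normal(size=(200, 4))
# Toy label: condition driven by the first two features, for illustration only.
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# predict_proba returns per-class likelihoods, e.g. P(condition) per subject.
proba = clf.predict_proba(X[:5])
print(proba.shape)  # (5, 2)
```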

==== **Dimensionality Reduction and Interpretability** ====

* Leverage [[DEIBO>>https://drive.ebrains.eu/f/8d7157708cde4b258db0/]] (Data-driven Embedding Interpretation Based on Ontologies) to connect model dimensions to ontology concepts.
* Evaluate interpretability using metrics like the Area Under the Interpretability Curve (AUIC).
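
This is not DEIBO itself, but the general idea of connecting embedding dimensions to ontology concepts can be illustrated with random placeholder data: score how strongly each dimension correlates with a binary concept annotation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embeddings: 100 terms x 8 dimensions.
embeddings = rng.normal(size=(100, 8))
# Placeholder annotation: whether each term carries one ontology concept.
# Dimension 3 is made informative by construction.
concept = (embeddings[:, 3] > 0).astype(float)

# Pearson correlation between each embedding dimension and the concept indicator.
centered = embeddings - embeddings.mean(axis=0)
c_centered = concept - concept.mean()
corr = (centered * c_centered[:, None]).sum(axis=0) / (
    np.linalg.norm(centered, axis=0) * np.linalg.norm(c_centered)
)

# The dimension with the strongest correlation "explains" the concept.
best_dim = int(np.abs(corr).argmax())
print(best_dim)  # 3
```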

----

=== **3. Diagnostic Framework** ===

==== **Axes of Diagnosis** ====

The framework organizes diagnostic data into three axes:

1. **Etiology**: Genetic and environmental risk factors.
1. **Molecular Markers**: Biomarkers such as amyloid-beta, tau, and alpha-synuclein.
1. **Neuroanatomical Correlations**: Results from neuroimaging (e.g., MRI, PET).
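
The three axes map naturally onto a simple record type. A minimal sketch follows; the field names are illustrative, not a fixed project schema:

```python
from dataclasses import dataclass, field

@dataclass
class DiagnosticProfile:
    """One patient's findings organized along the three diagnostic axes."""
    etiology: dict = field(default_factory=dict)           # genetic/environmental risk factors
    molecular_markers: dict = field(default_factory=dict)  # e.g. amyloid-beta, tau levels
    neuroanatomy: dict = field(default_factory=dict)       # MRI/PET-derived measures

profile = DiagnosticProfile(
    etiology={"APOE4_carrier": True},
    molecular_markers={"amyloid_beta": 1.1, "tau": 0.6},
    neuroanatomy={"hippocampal_volume_mm3": 3100},
)
print(profile.molecular_markers["tau"])  # 0.6
```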

==== **Recommendation System** ====

* Suggests additional tests or biomarkers if gaps are detected in the data.
* Prioritizes tests based on clinical impact and cost-effectiveness.
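
A minimal rule-based sketch of such a recommender; the test catalog, impact, and cost figures are invented placeholders:

```python
# Hypothetical catalog: biomarker -> (test that measures it, impact score, cost score).
TEST_CATALOG = {
    "amyloid_beta": ("CSF amyloid assay", 0.9, 0.4),
    "tau": ("CSF tau assay", 0.8, 0.4),
    "hippocampal_volume": ("structural MRI", 0.7, 0.6),
}

def recommend_tests(available_markers):
    """Suggest tests for missing markers, ranked by impact per unit cost."""
    missing = [m for m in TEST_CATALOG if m not in available_markers]
    ranked = sorted(
        missing,
        key=lambda m: TEST_CATALOG[m][1] / TEST_CATALOG[m][2],
        reverse=True,
    )
    return [TEST_CATALOG[m][0] for m in ranked]

# Only tau is on file, so the two remaining tests are suggested, highest value first.
print(recommend_tests({"tau": 0.6}))
# ['CSF amyloid assay', 'structural MRI']
```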

----

=== **4. Computational Workflow** ===

1. **Data Loading**: Import data from storage (Drive or Bucket).
1. **Feature Engineering**: Generate derived features from the raw data.
1. **Model Training**:
1*. Split data into training, validation, and test sets.
1*. Train models with cross-validation to ensure robustness.
1. **Evaluation**:
1*. Metrics: accuracy, F1-score, and AUIC for interpretability.
1*. Compare against baseline models and domain benchmarks.
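
The training and evaluation steps can be sketched with scikit-learn. Synthetic placeholder data stands in for the loaded and engineered features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(1)

# Placeholder feature matrix and labels.
X = rng.normal(size=(300, 5))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

# Split off a held-out test set (a validation set can be split off the same way).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Cross-validated training for robustness.
model = RandomForestClassifier(n_estimators=100, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Final evaluation on the held-out test set.
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(round(accuracy_score(y_test, pred), 2), round(f1_score(y_test, pred), 2))
```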

----

=== **5. Validation** ===

==== **Internal Validation** ====

* Test the system using simulated datasets and known clinical cases.
* Fine-tune models based on validation results.

==== **External Validation** ====

* Collaborate with research institutions and hospitals to test the system in real-world settings.
* Use anonymized patient data to ensure privacy compliance.

----

=== **6. Collaborative Development** ===

The project is open to contributions from researchers, clinicians, and developers. Key tools include:

* **Jupyter Notebooks**: For data analysis and pipeline development.
** Example: [[probabilistic imputation>>https://drive.ebrains.eu/f/4f69ab52f7734ef48217/]]
* **Wiki Pages**: For documenting methods and results.
* **Drive and Bucket**: For sharing code, data, and outputs.
* **Collaboration with related projects**: For instance, [[//Beyond the hype: AI in dementia – from early risk detection to disease treatment//>>https://www.lethe-project.eu/beyond-the-hype-ai-in-dementia-from-early-risk-detection-to-disease-treatment/]]

----

=== **7. Tools and Technologies** ===

* **Programming Languages**: Python for AI and data processing.
* **Frameworks**:
** TensorFlow and PyTorch for machine learning.
** Flask or FastAPI for backend services.
* **Visualization**: Plotly and Matplotlib for interactive and static visualizations.
* **EBRAINS Services**:
** Collaboratory Lab for running Notebooks.
** Buckets for storing large datasets.