


Changes for page Methodology

Last modified by manuelmenendez on 2025/03/14 08:31

From version 4.1
edited by manuelmenendez
on 2025/01/27 23:46
Change comment: There is no comment for this version
To version 18.1
edited by manuelmenendez
on 2025/02/13 12:52
Change comment: There is no comment for this version


Page properties
Content
... ... @@ -1,107 +1,133 @@
1 -=== **Overview** ===
1 +== **Overview** ==
2 2  
3 -This section describes the step-by-step process used in the **Neurodiagnoses** project to develop a novel diagnostic framework for neurological diseases. The methodology integrates artificial intelligence (AI), biomedical ontologies, and computational neuroscience to create a structured, interpretable, and scalable diagnostic system.
3 +Neurodiagnoses develops a tridimensional diagnostic framework for CNS diseases, incorporating AI-powered annotation tools to improve interpretability, standardization, and clinical utility. The methodology integrates multi-modal data, including genetic, neuroimaging, neurophysiological, and biomarker datasets, and applies machine learning models to generate structured, explainable diagnostic outputs.
4 4  
5 5  ----
6 6  
7 -=== **1. Data Integration** ===
7 +== **How to Use External Databases in Neurodiagnoses** ==
8 8  
9 -==== **Data Sources** ====
9 +To enhance the accuracy of our diagnostic models, Neurodiagnoses integrates data from multiple biomedical and neurological research databases. If you are a researcher, follow these steps to access, prepare, and integrate data into the Neurodiagnoses framework.
10 10  
11 -* **Biomedical Ontologies**:
12 -** Human Phenotype Ontology (HPO) for phenotypic abnormalities.
13 -** Gene Ontology (GO) for molecular and cellular processes.
14 -* **Neuroimaging Datasets**:
15 -** Example: Alzheimer’s Disease Neuroimaging Initiative (ADNI), OpenNeuro.
16 -* **Clinical and Biomarker Data**:
17 -** Anonymized clinical reports, molecular biomarkers, and test results.
11 +=== **Potential Data Sources** ===
18 18  
19 -==== **Data Preprocessing** ====
13 +Neurodiagnoses maintains an updated list of potential biomedical databases relevant to neurodegenerative diseases.
20 20  
21 -1. **Standardization**: Ensure all data sources are normalized to a common format.
22 -1. **Feature Selection**: Identify relevant features for diagnosis (e.g., biomarkers, imaging scores).
23 -1. **Data Cleaning**: Handle missing values and remove duplicates.
15 +* Reference: [[List of Potential Databases>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/data/sources/list_of_potential_databases]]
24 24  
25 -----
17 +=== **1. Register for Access** ===
26 26  
27 -=== **2. AI-Based Analysis** ===
19 +Each external database requires individual registration and access approval. Follow the official guidelines of each database provider.
28 28  
29 -==== **Model Development** ====
21 +* Ensure that you have completed all ethical approvals and data access agreements before integrating datasets into Neurodiagnoses.
22 +* Some repositories require a Data Usage Agreement (DUA) before downloading sensitive medical data.
30 30  
31 -* **Embedding Models**: Use pre-trained models like BioBERT or BioLORD for text data.
32 -* **Classification Models**:
33 -** Algorithms: Random Forest, Support Vector Machines (SVM), or neural networks.
34 -** Purpose: Predict the likelihood of specific neurological conditions based on input data.
24 +=== **2. Download & Prepare Data** ===
35 35  
36 -==== **Dimensionality Reduction and Interpretability** ====
26 +Once access is granted, download datasets while complying with data usage policies. Ensure that the files meet Neurodiagnoses’ format requirements for smooth integration.
37 37  
38 -* Leverage [[DEIBO>>https://drive.ebrains.eu/f/8d7157708cde4b258db0/]] (Data-driven Embedding Interpretation Based on Ontologies) to connect model dimensions to ontology concepts.
39 -* Evaluate interpretability using metrics like the Area Under the Interpretability Curve (AUIC).
28 +==== **Supported File Formats** ====
40 40  
41 -----
30 +* Tabular Data: .csv, .tsv
31 +* Neuroimaging Data: .nii, .dcm
32 +* Genomic Data: .fasta, .vcf
33 +* Clinical Metadata: .json, .xml
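The format whitelist above can be sketched as a small dispatch table. This is an illustrative sketch only: the `classify_file` helper and the category labels are our own naming, not part of the Neurodiagnoses codebase.

```python
from pathlib import Path

# Illustrative mapping from the supported extensions listed above to
# broad data categories (category names are hypothetical, not project API).
SUPPORTED_FORMATS = {
    ".csv": "tabular", ".tsv": "tabular",
    ".nii": "neuroimaging", ".dcm": "neuroimaging",
    ".fasta": "genomic", ".vcf": "genomic",
    ".json": "clinical_metadata", ".xml": "clinical_metadata",
}

def classify_file(path: str) -> str:
    """Return the data category for a file, or raise if unsupported."""
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported file format: {ext or path}")
    return SUPPORTED_FORMATS[ext]
```

A check like this can run before upload so that unsupported files are rejected early rather than failing during integration.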
42 42  
43 -=== **3. Diagnostic Framework** ===
35 +==== **Mandatory Fields for Integration** ====
44 44  
45 -==== **Axes of Diagnosis** ====
37 +|=Field Name|=Description
38 +|Subject ID|Unique patient identifier
39 +|Diagnosis|Standardized disease classification
40 +|Biomarkers|CSF, plasma, or imaging biomarkers
41 +|Genetic Data|Whole-genome or exome sequencing
42 +|Neuroimaging Metadata|MRI/PET acquisition parameters
46 46  
47 -The framework organizes diagnostic data into three axes:
44 +=== **3. Upload Data to Neurodiagnoses** ===
48 48  
49 -1. **Etiology**: Genetic and environmental risk factors.
50 -1. **Molecular Markers**: Biomarkers such as amyloid-beta, tau, and alpha-synuclein.
51 -1. **Neuroanatomical Correlations**: Results from neuroimaging (e.g., MRI, PET).
46 +Once preprocessed, data can be uploaded to EBRAINS or GitHub.
52 52  
53 -==== **Recommendation System** ====
48 +* (((
49 +**Option 1: Upload to EBRAINS Bucket**
54 54  
55 -* Suggests additional tests or biomarkers if gaps are detected in the data.
56 -* Prioritizes tests based on clinical impact and cost-effectiveness.
51 +* Location: [[EBRAINS Neurodiagnoses Bucket>>url:https://wiki.ebrains.eu/bin/view/Collabs/neurodiagnoses/Bucket]]
52 +* Ensure correct metadata tagging before submission.
53 +)))
54 +* (((
55 +**Option 2: Contribute via GitHub Repository**
57 57  
58 -----
57 +* Location: [[GitHub Data Repository>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/tree/main/data]]
58 +* Create a new folder under /data/ and include a dataset description.
59 +)))
59 59  
60 -=== **4. Computational Workflow** ===
61 +//Note: For large datasets, please contact the project administrators before uploading.//
61 61  
62 -1. **Data Loading**: Import data from storage (Drive or Bucket).
63 -1. **Feature Engineering**: Generate derived features from the raw data.
64 -1. **Model Training**:
65 -1*. Split data into training, validation, and test sets.
66 -1*. Train models with cross-validation to ensure robustness.
67 -1. **Evaluation**:
68 -1*. Metrics: Accuracy, F1-Score, AUIC for interpretability.
69 -1*. Compare against baseline models and domain benchmarks.
63 +=== **4. Integrate Data into AI Models** ===
70 70  
65 +Once uploaded, datasets must be harmonized and formatted before AI model training.
66 +
67 +==== **Steps for Data Integration** ====
68 +
69 +* Open Jupyter Notebooks on EBRAINS to run preprocessing scripts.
70 +* Standardize neuroimaging and biomarker formats using harmonization tools.
71 +* Use machine learning models to handle missing data and feature extraction.
72 +* Train AI models with newly integrated patient cohorts.
73 +* Reference: [[Detailed instructions can be found in docs/data_processing.md>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/docs/data_processing.md]].
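As one concrete illustration of the missing-data step above, per-feature median imputation can be sketched as follows. This is a stdlib-only toy under our own assumptions; the project's actual preprocessing scripts are described in docs/data_processing.md and may use model-based imputation instead.

```python
import statistics

def impute_median(rows):
    """Fill None entries in a list of feature dicts with per-feature medians.

    Toy illustration of one harmonization step; a real pipeline would also
    handle site effects, units, and non-numeric features.
    """
    keys = {k for row in rows for k in row}
    medians = {}
    for k in keys:
        observed = [row[k] for row in rows if row.get(k) is not None]
        medians[k] = statistics.median(observed) if observed else None
    return [
        {k: (row.get(k) if row.get(k) is not None else medians[k])
         for k in keys}
        for row in rows
    ]
```

For example, a cohort where one subject lacks a CSF value and another lacks an imaging value would have both gaps filled with the respective cohort medians before model training.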
74 +
71 71  ----
72 72  
73 -=== **5. Validation** ===
77 +== **Database Sources Table** ==
74 74  
75 -==== **Internal Validation** ====
79 +=== **Documentation Locations** ===
76 76  
77 -* Test the system using simulated datasets and known clinical cases.
78 -* Fine-tune models based on validation results.
81 +* GitHub: [[docs/data_sources.md>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/docs/data_sources.md]]
82 +* EBRAINS Wiki: Collabs/neurodiagnoses/Data Sources
79 79  
80 -==== **External Validation** ====
84 +=== **Key Databases for Neurodiagnoses** ===
81 81  
82 -* Collaborate with research institutions and hospitals to test the system in real-world settings.
83 -* Use anonymized patient data to ensure privacy compliance.
86 +|=Database|=Focus Area|=Data Type|=Access Link
87 +|ADNI|Alzheimer's Disease|MRI, PET, CSF, cognitive tests|[[ADNI>>url:https://adni.loni.usc.edu/]]
88 +|PPMI|Parkinson’s Disease|Imaging, biospecimens|[[PPMI>>url:https://www.ppmi-info.org/]]
89 +|GP2|Genetic Data for PD|Whole-genome sequencing|[[GP2>>url:https://gp2.org/]]
90 +|Enroll-HD|Huntington’s Disease|Clinical, genetic, imaging|[[Enroll-HD>>url:https://enroll-hd.org/]]
91 +|GAAIN|Alzheimer's & Cognitive Decline|Multi-source data aggregation|[[GAAIN>>url:https://www.gaain.org/]]
92 +|UK Biobank|Population-wide studies|Genetic, imaging, health records|[[UK Biobank>>url:https://www.ukbiobank.ac.uk/]]
93 +|DPUK|Dementia & Aging|Imaging, genetics, lifestyle factors|[[DPUK>>url:https://www.dementiasplatform.uk/]]
94 +|PRION Registry|Prion Diseases|Clinical and genetic data|[[PRION Registry>>url:https://www.prionalliance.org/]]
95 +|DECIPHER|Rare Genetic Disorders|Genomic variants|[[DECIPHER>>url:https://www.deciphergenomics.org/]]
84 84  
97 +If you know of a relevant dataset, please propose it via [[GitHub Issues>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/issues]].
98 +
85 85  ----
86 86  
87 -=== **6. Collaborative Development** ===
101 +== **Collaboration & Partnerships** ==
88 88  
89 -The project is open to contributions from researchers, clinicians, and developers. Key tools include:
103 +=== **Documentation Locations** ===
90 90  
91 -* **Jupyter Notebooks**: For data analysis and pipeline development.
92 -** Example: [[probabilistic imputation>>https://drive.ebrains.eu/f/4f69ab52f7734ef48217/]]
93 -* **Wiki Pages**: For documenting methods and results.
94 -* **Drive and Bucket**: For sharing code, data, and outputs.
105 +* GitHub: [[docs/collaboration.md>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/docs/collaboration.md]]
106 +* EBRAINS Wiki: Collabs/neurodiagnoses/Collaborations
95 95  
108 +=== **Partnering with Data Providers** ===
109 +
110 +Beyond using existing datasets, Neurodiagnoses seeks partnerships with data repositories to:
111 +
112 +* Enable direct API-based data integration for real-time processing.
113 +* Co-develop harmonized AI-ready datasets with standardized annotations.
114 +* Secure funding opportunities through joint grant applications.
115 +
116 +=== **Interested in Partnering?** ===
117 +
118 +If you represent a research consortium or database provider, reach out to explore data-sharing agreements.
119 +
120 +* Contact: [[info@neurodiagnoses.com>>mailto:info@neurodiagnoses.com]]
121 +
96 96  ----
97 97  
98 -=== **7. Tools and Technologies** ===
124 +== **Final Notes** ==
99 99  
100 -* **Programming Languages**: Python for AI and data processing.
101 -* **Frameworks**:
102 -** TensorFlow and PyTorch for machine learning.
103 -** Flask or FastAPI for backend services.
104 -* **Visualization**: Plotly and Matplotlib for interactive and static visualizations.
105 -* **EBRAINS Services**:
106 -** Collaboratory Lab for running Notebooks.
107 -** Buckets for storing large datasets.
126 +Neurodiagnoses continuously expands its data ecosystem to support AI-driven clinical decision-making. Researchers and institutions are encouraged to contribute new datasets and methodologies.
127 +
128 +For additional technical documentation:
129 +
130 +* [[GitHub Repository>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses]]
131 +* [[EBRAINS Collaboration Page>>url:https://wiki.ebrains.eu/bin/view/Collabs/neurodiagnoses/]]
132 +
133 +If you experience issues integrating data, open a [[GitHub Issue>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/issues]] or consult the EBRAINS Neurodiagnoses Forum.