Changes for page Methodology
Last modified by manuelmenendez on 2025/03/14 08:31
From version 19.1
edited by manuelmenendez
on 2025/02/14 13:57
Change comment:
There is no comment for this version
To version 5.1
edited by manuelmenendez
on 2025/01/29 19:11
Change comment:
There is no comment for this version
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
- Content
@@ -1,207 +1,109 @@
-** # Neurodiagnoses AI: Multimodal AI for Neurodiagnostic Predictions**
+=== **Overview** ===
 
-## **Project Overview**
-Neurodiagnoses AI implements AI-driven diagnostic and prognostic models for central nervous system (CNS) disorders, adapting the Florey Dementia Index (FDI) methodology to a broader set of neurological conditions. The approach integrates **multimodal data sources** (EEG, neuroimaging, biomarkers, and genetics) and employs **machine learning models** to provide **explainable, real-time diagnostic insights**.##
+This section describes the step-by-step process used in the **Neurodiagnoses** project to develop a novel diagnostic framework for neurological diseases. The methodology integrates artificial intelligence (AI), biomedical ontologies, and computational neuroscience to create a structured, interpretable, and scalable diagnostic system.
 
-## **How to Use External Databases in Neurodiagnoses**
-To enhance diagnostic accuracy, Neurodiagnoses integrates data from multiple biomedical and neurological research databases. Researchers can follow these steps to access, prepare, and integrate data into the Neurodiagnoses framework.##
+----
 
-### **Potential Data Sources**
-Neurodiagnoses maintains an updated list of potential biomedical databases relevant to neurodegenerative diseases. ##
+=== **1. Data Integration** ===
 
-**Reference: List of Potential Databases**
-- **ADNI**: Alzheimer's Disease data ([ADNI](https://adni.loni.usc.edu))
-- **PPMI**: Parkinson’s Disease Imaging and biospecimens ([PPMI](https://www.ppmi-info.org))
-- **GP2**: Whole-genome sequencing for PD ([GP2](https://gp2.org))
-- **Enroll-HD**: Huntington’s Disease Clinical and genetic data ([Enroll-HD](https://www.enroll-hd.org))
-- **GAAIN**: Multi-source Alzheimer’s data aggregation ([GAAIN](https://gaain.org))
-- **UK Biobank**: Population-wide genetic, imaging, and health records ([UK Biobank](https://www.ukbiobank.ac.uk))
-- **DPUK**: Dementia and Aging data ([DPUK](https://www.dementiasplatform.uk))
-- **PRION Registry**: Prion Diseases clinical and genetic data ([PRION Registry](https://prionregistry.org))
-- **DECIPHER**: Rare genetic disorder genomic variants ([DECIPHER](https://decipher.sanger.ac.uk))
+==== **Data Sources** ====
 
-### **1. Register for Access**
-- Each external database requires **individual registration** and access approval.
-- Ensure compliance with **ethical approvals** and **data usage agreements** before integrating datasets into Neurodiagnoses.
-- Some repositories may require a **Data Usage Agreement (DUA)** for sensitive medical data.##
+* **Biomedical Ontologies**:
+** Human Phenotype Ontology (HPO) for phenotypic abnormalities.
+** Gene Ontology (GO) for molecular and cellular processes.
+* **Neuroimaging Datasets**:
+** Example: Alzheimer’s Disease Neuroimaging Initiative (ADNI), OpenNeuro.
+* **Clinical and Biomarker Data**:
+** Anonymized clinical reports, molecular biomarkers, and test results.
 
-### **2. Download & Prepare Data**
-- Download datasets while adhering to database usage policies.
-- Ensure files meet **Neurodiagnoses format requirements**:
- - **Tabular Data**: `.csv`, `.tsv`
- - **Neuroimaging Data**: `.nii`, `.dcm`
- - **Genomic Data**: `.fasta`, `.vcf`
- - **Clinical Metadata**: `.json`, `.xml`##
 
-- **Mandatory Fields for Integration**:
- - **Subject ID**: Unique patient identifier
- - **Diagnosis**: Standardized disease classification
- - **Biomarkers**: CSF, plasma, or imaging biomarkers
- - **Genetic Data**: Whole-genome or exome sequencing
- - **Neuroimaging Metadata**: MRI/PET acquisition parameters
+==== **Data Preprocessing** ====
 
-### **3. Upload Data to Neurodiagnoses**
-**Option 1: Upload to EBRAINS Bucket**
-- Location: **EBRAINS Neurodiagnoses Bucket**
-- Ensure correct **metadata tagging** before submission.##
+1. **Standardization**: Ensure all data sources are normalized to a common format.
+1. **Feature Selection**: Identify relevant features for diagnosis (e.g., biomarkers, imaging scores).
+1. **Data Cleaning**: Handle missing values and remove duplicates.
 
-**Option 2: Contribute via GitHub Repository**
-- Location: **GitHub Data Repository**
-- Create a new folder under `/data/` and include a **dataset description**.
-- For large datasets, contact project administrators before uploading.
+----
 
-### **4. Integrate Data into AI Models**
-- Open **Jupyter Notebooks** on EBRAINS to run **preprocessing scripts**.
-- Standardize **neuroimaging and biomarker formats** using harmonization tools.
-- Use **machine learning models** to handle missing data and feature extraction.
-- Train AI models with **newly integrated patient cohorts**.##
+=== **2. AI-Based Analysis** ===
 
-**Reference**: See `docs/data_processing.md` for detailed instructions.
+==== **Model Development** ====
 
-## **Collaboration & Partnerships**##
-# **Partnering with Data Providers**
-Neurodiagnoses seeks partnerships with data repositories to:
-- Enable **API-based data integration** for real-time processing.
-- Co-develop **harmonized AI-ready datasets** with standardized annotations.
-- Secure **funding opportunities** through joint grant applications.
+* **Embedding Models**: Use pre-trained models like BioBERT or BioLORD for text data.
+* **Classification Models**:
+** Algorithms: Random Forest, Support Vector Machines (SVM), or neural networks.
+** Purpose: Predict the likelihood of specific neurological conditions based on input data.
 
-**Interested in Partnering?**
-- If you represent a research consortium or database provider, reach out to explore data-sharing agreements.
-- **Contact**: info@neurodiagnoses.com
+==== **Dimensionality Reduction and Interpretability** ====
 
-## **Final Notes**
-Neurodiagnoses continuously expands its data ecosystem to support AI-driven clinical decision-making. Researchers and institutions are encouraged to contribute **new datasets and methodologies**.##
+* Leverage [[DEIBO>>https://drive.ebrains.eu/f/8d7157708cde4b258db0/]] (Data-driven Embedding Interpretation Based on Ontologies) to connect model dimensions to ontology concepts.
+* Evaluate interpretability using metrics like the Area Under the Interpretability Curve (AUIC).
 
-For additional technical documentation:
-- **GitHub Repository**: [Neurodiagnoses GitHub](https://github.com/neurodiagnoses)
-- **EBRAINS Collaboration Page**: [EBRAINS Neurodiagnoses](https://ebrains.eu/collabs/neurodiagnoses)
+----
 
-If you experience issues integrating data, **open a GitHub Issue** or consult the **EBRAINS Neurodiagnoses Forum**.
+=== **3. Diagnostic Framework** ===
 
-== **How to Use External Databases in Neurodiagnoses** ==
+==== **Axes of Diagnosis** ====
 
-To enhance the accuracy of our diagnostic models, Neurodiagnoses integrates data from multiple biomedical and neurological research databases. If you are a researcher, follow these steps to access, prepare, and integrate data into the Neurodiagnoses framework.
+The framework organizes diagnostic data into three axes:
 
-=== **Potential Data Sources** ===
+1. **Etiology**: Genetic and environmental risk factors.
+1. **Molecular Markers**: Biomarkers such as amyloid-beta, tau, and alpha-synuclein.
+1. **Neuroanatomical Correlations**: Results from neuroimaging (e.g., MRI, PET).
 
-Neurodiagnoses maintains an updated list of potential biomedical databases relevant to neurodegenerative diseases.
+==== **Recommendation System** ====
 
-* Reference: [[List of Potential Databases>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/data/sources/list_of_potential_databases]]
+* Suggests additional tests or biomarkers if gaps are detected in the data.
+* Prioritizes tests based on clinical impact and cost-effectiveness.
 
-=== **1. Register for Access** ===
+----
 
-Each external database requires individual registration and access approval. Follow the official guidelines of each database provider.
+=== **4. Computational Workflow** ===
 
-* Ensure that you have completed all ethical approvals and data access agreements before integrating datasets into Neurodiagnoses.
-* Some repositories require a Data Usage Agreement (DUA) before downloading sensitive medical data.
+1. **Data Loading**: Import data from storage (Drive or Bucket).
+1. **Feature Engineering**: Generate derived features from the raw data.
+1. **Model Training**:
+1*. Split data into training, validation, and test sets.
+1*. Train models with cross-validation to ensure robustness.
+1. **Evaluation**:
+1*. Metrics: Accuracy, F1-Score, AUIC for interpretability.
+1*. Compare against baseline models and domain benchmarks.
 
-=== **2. Download & Prepare Data** ===
-
-Once access is granted, download datasets while complying with data usage policies. Ensure that the files meet Neurodiagnoses’ format requirements for smooth integration.
-
-==== **Supported File Formats** ====
-
-* Tabular Data: .csv, .tsv
-* Neuroimaging Data: .nii, .dcm
-* Genomic Data: .fasta, .vcf
-* Clinical Metadata: .json, .xml
-
-==== **Mandatory Fields for Integration** ====
-
-|=Field Name|=Description
-|Subject ID|Unique patient identifier
-|Diagnosis|Standardized disease classification
-|Biomarkers|CSF, plasma, or imaging biomarkers
-|Genetic Data|Whole-genome or exome sequencing
-|Neuroimaging Metadata|MRI/PET acquisition parameters
-
-=== **3. Upload Data to Neurodiagnoses** ===
-
-Once preprocessed, data can be uploaded to EBRAINS or GitHub.
-
-* (((
-**Option 1: Upload to EBRAINS Bucket**
-
-* Location: [[EBRAINS Neurodiagnoses Bucket>>url:https://wiki.ebrains.eu/bin/view/Collabs/neurodiagnoses/Bucket]]
-* Ensure correct metadata tagging before submission.
-)))
-* (((
-**Option 2: Contribute via GitHub Repository**
-
-* Location: [[GitHub Data Repository>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/tree/main/data]]
-* Create a new folder under /data/ and include dataset description.
-)))
-
-//Note: For large datasets, please contact the project administrators before uploading.//
-
-=== **4. Integrate Data into AI Models** ===
-
-Once uploaded, datasets must be harmonized and formatted before AI model training.
-
-==== **Steps for Data Integration** ====
-
-* Open Jupyter Notebooks on EBRAINS to run preprocessing scripts.
-* Standardize neuroimaging and biomarker formats using harmonization tools.
-* Use machine learning models to handle missing data and feature extraction.
-* Train AI models with newly integrated patient cohorts.
-* Reference: [[Detailed instructions can be found in docs/data_processing.md>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/docs/data_processing.md]].
-
 ----
 
-== **Database Sources Table** ==
+=== **5. Validation** ===
 
-=== **Where to Insert This** ===
+==== **Internal Validation** ====
 
-* GitHub: [[docs/data_sources.md>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/docs/data_sources.md]]
-* EBRAINS Wiki: Collabs/neurodiagnoses/DataSources
+* Test the system using simulated datasets and known clinical cases.
+* Fine-tune models based on validation results.
 
-=== **Key Databases for Neurodiagnoses** ===
+==== **External Validation** ====
 
-|=Database|=Focus Area|=Data Type|=Access Link
-|ADNI|Alzheimer's Disease|MRI, PET, CSF, cognitive tests|ADNI
-|PPMI|Parkinson’s Disease|Imaging, biospecimens|[[PPMI>>url:https://www.ppmi-info.org/]]
-|GP2|Genetic Data for PD|Whole-genome sequencing|[[GP2>>url:https://gp2.org/]]
-|Enroll-HD|Huntington’s Disease|Clinical, genetic, imaging|[[Enroll-HD>>url:https://enroll-hd.org/]]
-|GAAIN|Alzheimer's & Cognitive Decline|Multi-source data aggregation|[[GAAIN>>url:https://www.gaain.org/]]
-|UK Biobank|Population-wide studies|Genetic, imaging, health records|[[UK Biobank>>url:https://www.ukbiobank.ac.uk/]]
-|DPUK|Dementia & Aging|Imaging, genetics, lifestyle factors|[[DPUK>>url:https://www.dementiasplatform.uk/]]
-|PRION Registry|Prion Diseases|Clinical and genetic data|[[PRION Registry>>url:https://www.prionalliance.org/]]
-|DECIPHER|Rare Genetic Disorders|Genomic variants|DECIPHER
+* Collaborate with research institutions and hospitals to test the system in real-world settings.
+* Use anonymized patient data to ensure privacy compliance.
 
-If you know a relevant dataset, submit a proposal in [[GitHub Issues>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/issues]].
-
 ----
 
-== **Collaboration & Partnerships** ==
+=== **6. Collaborative Development** ===
 
-=== **Where to Insert This** ===
+The project is open to contributions from researchers, clinicians, and developers. Key tools include:
 
-* GitHub: [[docs/collaboration.md>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/docs/collaboration.md]]
-* EBRAINS Wiki: Collabs/neurodiagnoses/Collaborations
+* **Jupyter Notebooks**: For data analysis and pipeline development.
+** Example: [[probabilistic imputation>>https://drive.ebrains.eu/f/4f69ab52f7734ef48217/]]
+* **Wiki Pages**: For documenting methods and results.
+* **Drive and Bucket**: For sharing code, data, and outputs.
+* **Collaboration with related projects: **For instance: [[//Beyond the hype: AI in dementia – from early risk detection to disease treatment//>>https://www.lethe-project.eu/beyond-the-hype-ai-in-dementia-from-early-risk-detection-to-disease-treatment/]]
 
-=== **Partnering with Data Providers** ===
-
-Beyond using existing datasets, Neurodiagnoses seeks partnerships with data repositories to:
-
-* Enable direct API-based data integration for real-time processing.
-* Co-develop harmonized AI-ready datasets with standardized annotations.
-* Secure funding opportunities through joint grant applications.
-
-=== **Interested in Partnering?** ===
-
-If you represent a research consortium or database provider, reach out to explore data-sharing agreements.
-
-* Contact: [[info@neurodiagnoses.com>>mailto:info@neurodiagnoses.com]]
-
 ----
 
-== **Final Notes** ==
+=== **7. Tools and Technologies** ===
 
-Neurodiagnoses continuously expands its data ecosystem to support AI-driven clinical decision-making. Researchers and institutions are encouraged to contribute new datasets and methodologies.
-
-For additional technical documentation:
-
-* [[GitHub Repository>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses]]
-* [[EBRAINS Collaboration Page>>url:https://wiki.ebrains.eu/bin/view/Collabs/neurodiagnoses/]]
-
-If you experience issues integrating data, open a [[GitHub Issue>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/issues]] or consult the EBRAINS Neurodiagnoses Forum.
+* **Programming Languages**: Python for AI and data processing.
+* **Frameworks**:
+** TensorFlow and PyTorch for machine learning.
+** Flask or FastAPI for backend services.
+* **Visualization**: Plotly and Matplotlib for interactive and static visualizations.
+* **EBRAINS Services**:
+** Collaboratory Lab for running Notebooks.
+** Buckets for storing large datasets.
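
The "Model Training" and "Evaluation" steps listed under "4. Computational Workflow" in the version 5.1 content can be sketched roughly as follows. This is only an illustration in plain Python with no third-party dependencies: the function names (`split_indices`, `k_fold`, `accuracy`, `f1_score`), the split fractions, and the seed are assumptions made for this example and are not taken from the Neurodiagnoses codebase. AUIC is not covered here, since the page does not define how it is computed.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def split_indices(n: int, val_frac: float = 0.25, test_frac: float = 0.25,
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle 0..n-1 and cut it into train / validation / test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded so the split is reproducible
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


def k_fold(indices: Sequence[int], k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, held_out) index lists for k-fold cross-validation."""
    folds = [list(indices[i::k]) for i in range(k)]
    for i, held_out in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, held_out


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for one positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

In a notebook, one would typically hold out the test indices once with `split_indices`, run `k_fold` over the remaining indices to tune a model, and report `accuracy` and `f1_score` on the untouched test set. A library such as scikit-learn offers stratified variants of these splits, which may be preferable when diagnostic classes are imbalanced.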