Changes for page Methodology
Last modified by manuelmenendez on 2025/03/14 08:31
From version 18.1
edited by manuelmenendez
on 2025/02/13 12:52
Change comment:
There is no comment for this version
To version 1.1
edited by manuelmenendez
on 2025/01/27 23:07
Change comment:
There is no comment for this version
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
Page properties
Content
@@ -1,133 +1,106 @@
-== **Overview** ==
+=== **Overview** ===

-Neurodiagnoses develops a tridimensional diagnostic framework for CNS diseases, incorporating AI-powered annotation tools to improve interpretability, standardization, and clinical utility. The methodology integrates multi-modal data, including genetic, neuroimaging, neurophysiological, and biomarker datasets, and applies machine learning models to generate structured, explainable diagnostic outputs.
+This section describes the step-by-step process used in the **Neurodiagnoses** project to develop a novel diagnostic framework for neurological diseases. The methodology integrates artificial intelligence (AI), biomedical ontologies, and computational neuroscience to create a structured, interpretable, and scalable diagnostic system.

 ----

-== **How to Use External Databases in Neurodiagnoses** ==
+=== **1. Data Integration** ===

-To enhance the accuracy of our diagnostic models, Neurodiagnoses integrates data from multiple biomedical and neurological research databases. If you are a researcher, follow these steps to access, prepare, and integrate data into the Neurodiagnoses framework.
+==== **Data Sources** ====

-=== **Potential Data Sources** ===
+* **Biomedical Ontologies**:
+** Human Phenotype Ontology (HPO) for phenotypic abnormalities.
+** Gene Ontology (GO) for molecular and cellular processes.
+* **Neuroimaging Datasets**:
+** Example: Alzheimer’s Disease Neuroimaging Initiative (ADNI), OpenNeuro.
+* **Clinical and Biomarker Data**:
+** Anonymized clinical reports, molecular biomarkers, and test results.

-Neurodiagnoses maintains an updated list of potential biomedical databases relevant to neurodegenerative diseases.
+==== **Data Preprocessing** ====

-* Reference: [[List of Potential Databases>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/data/sources/list_of_potential_databases]]
+1. **Standardization**: Ensure all data sources are normalized to a common format.
+1. **Feature Selection**: Identify relevant features for diagnosis (e.g., biomarkers, imaging scores).
+1. **Data Cleaning**: Handle missing values and remove duplicates.

-=== **1. Register for Access** ===
+----

-Each external database requires individual registration and access approval. Follow the official guidelines of each database provider.
+=== **2. AI-Based Analysis** ===

-* Ensure that you have completed all ethical approvals and data access agreements before integrating datasets into Neurodiagnoses.
-* Some repositories require a Data Usage Agreement (DUA) before downloading sensitive medical data.
+==== **Model Development** ====

-=== **2. Download & Prepare Data** ===
+* **Embedding Models**: Use pre-trained models like BioBERT or BioLORD for text data.
+* **Classification Models**:
+** Algorithms: Random Forest, Support Vector Machines (SVM), or neural networks.
+** Purpose: Predict the likelihood of specific neurological conditions based on input data.

-Once access is granted, download datasets while complying with data usage policies. Ensure that the files meet Neurodiagnoses’ format requirements for smooth integration.
+==== **Dimensionality Reduction and Interpretability** ====

-==== **Supported File Formats** ====
+* Leverage DEIBO (Data-driven Embedding Interpretation Based on Ontologies) to connect model dimensions to ontology concepts.
+* Evaluate interpretability using metrics like the Area Under the Interpretability Curve (AUIC).

-* Tabular Data: .csv, .tsv
-* Neuroimaging Data: .nii, .dcm
-* Genomic Data: .fasta, .vcf
-* Clinical Metadata: .json, .xml
+----

-==== **Mandatory Fields for Integration** ====
+=== **3. Diagnostic Framework** ===

-|=Field Name|=Description
-|Subject ID|Unique patient identifier
-|Diagnosis|Standardized disease classification
-|Biomarkers|CSF, plasma, or imaging biomarkers
-|Genetic Data|Whole-genome or exome sequencing
-|Neuroimaging Metadata|MRI/PET acquisition parameters
+==== **Axes of Diagnosis** ====

-=== **3. Upload Data to Neurodiagnoses** ===
+The framework organizes diagnostic data into three axes:

-Once preprocessed, data can be uploaded to EBRAINS or GitHub.
+1. **Etiology**: Genetic and environmental risk factors.
+1. **Molecular Markers**: Biomarkers such as amyloid-beta, tau, and alpha-synuclein.
+1. **Neuroanatomical Correlations**: Results from neuroimaging (e.g., MRI, PET).

-* (((
-**Option 1: Upload to EBRAINS Bucket**
+==== **Recommendation System** ====

-* Location: [[EBRAINS Neurodiagnoses Bucket>>url:https://wiki.ebrains.eu/bin/view/Collabs/neurodiagnoses/Bucket]]
-* Ensure correct metadata tagging before submission.
-)))
-* (((
-**Option 2: Contribute via GitHub Repository**
+* Suggests additional tests or biomarkers if gaps are detected in the data.
+* Prioritizes tests based on clinical impact and cost-effectiveness.

-* Location: [[GitHub Data Repository>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/tree/main/data]]
-* Create a new folder under /data/ and include dataset description.
-)))
+----

-//Note: For large datasets, please contact the project administrators before uploading.//
+=== **4. Computational Workflow** ===

-=== **4. Integrate Data into AI Models** ===
+1. **Data Loading**: Import data from storage (Drive or Bucket).
+1. **Feature Engineering**: Generate derived features from the raw data.
+1. **Model Training**:
+1*. Split data into training, validation, and test sets.
+1*. Train models with cross-validation to ensure robustness.
+1. **Evaluation**:
+1*. Metrics: Accuracy, F1-Score, AUIC for interpretability.
+1*. Compare against baseline models and domain benchmarks.

-Once uploaded, datasets must be harmonized and formatted before AI model training.
-
-==== **Steps for Data Integration** ====
-
-* Open Jupyter Notebooks on EBRAINS to run preprocessing scripts.
-* Standardize neuroimaging and biomarker formats using harmonization tools.
-* Use machine learning models to handle missing data and feature extraction.
-* Train AI models with newly integrated patient cohorts.
-* Reference: [[Detailed instructions can be found in docs/data_processing.md>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/docs/data_processing.md]].
-
 ----

-== **Database Sources Table** ==
+=== **5. Validation** ===

-=== **Where to Insert This** ===
+==== **Internal Validation** ====

-* GitHub: [[docs/data_sources.md>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/docs/data_sources.md]]
-* EBRAINS Wiki: Collabs/neurodiagnoses/DataSources
+* Test the system using simulated datasets and known clinical cases.
+* Fine-tune models based on validation results.

-=== **Key Databases for Neurodiagnoses** ===
+==== **External Validation** ====

-|=Database|=Focus Area|=Data Type|=Access Link
-|ADNI|Alzheimer's Disease|MRI, PET, CSF, cognitive tests|ADNI
-|PPMI|Parkinson’s Disease|Imaging, biospecimens|[[PPMI>>url:https://www.ppmi-info.org/]]
-|GP2|Genetic Data for PD|Whole-genome sequencing|[[GP2>>url:https://gp2.org/]]
-|Enroll-HD|Huntington’s Disease|Clinical, genetic, imaging|[[Enroll-HD>>url:https://enroll-hd.org/]]
-|GAAIN|Alzheimer's & Cognitive Decline|Multi-source data aggregation|[[GAAIN>>url:https://www.gaain.org/]]
-|UK Biobank|Population-wide studies|Genetic, imaging, health records|[[UK Biobank>>url:https://www.ukbiobank.ac.uk/]]
-|DPUK|Dementia & Aging|Imaging, genetics, lifestyle factors|[[DPUK>>url:https://www.dementiasplatform.uk/]]
-|PRION Registry|Prion Diseases|Clinical and genetic data|[[PRION Registry>>url:https://www.prionalliance.org/]]
-|DECIPHER|Rare Genetic Disorders|Genomic variants|DECIPHER
+* Collaborate with research institutions and hospitals to test the system in real-world settings.
+* Use anonymized patient data to ensure privacy compliance.

-If you know a relevant dataset, submit a proposal in [[GitHub Issues>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/issues]].
-
 ----

-== **Collaboration & Partnerships** ==
+=== **6. Collaborative Development** ===

-=== **Where to Insert This** ===
+The project is open to contributions from researchers, clinicians, and developers. Key tools include:

-* GitHub: [[docs/collaboration.md>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/docs/collaboration.md]]
-* EBRAINS Wiki: Collabs/neurodiagnoses/Collaborations
+* **Jupyter Notebooks**: For data analysis and pipeline development.
+* **Wiki Pages**: For documenting methods and results.
+* **Drive and Bucket**: For sharing code, data, and outputs.

-=== **Partnering with Data Providers** ===
-
-Beyond using existing datasets, Neurodiagnoses seeks partnerships with data repositories to:
-
-* Enable direct API-based data integration for real-time processing.
-* Co-develop harmonized AI-ready datasets with standardized annotations.
-* Secure funding opportunities through joint grant applications.
-
-=== **Interested in Partnering?** ===
-
-If you represent a research consortium or database provider, reach out to explore data-sharing agreements.
-
-* Contact: [[info@neurodiagnoses.com>>mailto:info@neurodiagnoses.com]]
-
 ----

-== **Final Notes** ==
+=== **7. Tools and Technologies** ===

-Neurodiagnoses continuously expands its data ecosystem to support AI-driven clinical decision-making. Researchers and institutions are encouraged to contribute new datasets and methodologies.
-
-For additional technical documentation:
-
-* [[GitHub Repository>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses]]
-* [[EBRAINS Collaboration Page>>url:https://wiki.ebrains.eu/bin/view/Collabs/neurodiagnoses/]]
-
-If you experience issues integrating data, open a [[GitHub Issue>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/issues]] or consult the EBRAINS Neurodiagnoses Forum.
+* **Programming Languages**: Python for AI and data processing.
+* **Frameworks**:
+** TensorFlow and PyTorch for machine learning.
+** Flask or FastAPI for backend services.
+* **Visualization**: Plotly and Matplotlib for interactive and static visualizations.
+* **EBRAINS Services**:
+** Collaboratory Lab for running Notebooks.
+** Buckets for storing large datasets.
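The "Supported File Formats" and "Mandatory Fields for Integration" requirements in version 18.1 lend themselves to an automated pre-upload check. The Python sketch below is illustrative only: `data_type_for` and `missing_mandatory_fields` are hypothetical helpers, not part of the Neurodiagnoses repository, and the sketch assumes that tabular files use the field names from the table verbatim as column headers. It inspects only the final file extension, so a compressed file such as `scan.nii.gz` would be reported as unsupported.

```python
import csv
import io
from pathlib import Path
from typing import List, Optional

# Extensions from "Supported File Formats" (version 18.1), grouped by data type.
SUPPORTED_FORMATS = {
    "Tabular Data": {".csv", ".tsv"},
    "Neuroimaging Data": {".nii", ".dcm"},
    "Genomic Data": {".fasta", ".vcf"},
    "Clinical Metadata": {".json", ".xml"},
}

# Fields from "Mandatory Fields for Integration".
# ASSUMPTION: tabular files use these names verbatim as column headers.
MANDATORY_FIELDS = ["Subject ID", "Diagnosis", "Biomarkers", "Genetic Data", "Neuroimaging Metadata"]


def data_type_for(path: str) -> Optional[str]:
    """Hypothetical helper: map a file's final extension to its data type, or None if unsupported."""
    suffix = Path(path).suffix.lower()
    for data_type, extensions in SUPPORTED_FORMATS.items():
        if suffix in extensions:
            return data_type
    return None


def missing_mandatory_fields(text: str, delimiter: str = ",") -> List[str]:
    """Hypothetical helper: list mandatory fields absent from the header row of CSV/TSV text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # An empty file has no header row, so every mandatory field is reported missing.
    header = [column.strip() for column in next(reader, [])]
    return [field for field in MANDATORY_FIELDS if field not in header]
```

For example, `missing_mandatory_fields("Subject ID,Diagnosis,Biomarkers\nS001,AD,CSF\n")` returns `['Genetic Data', 'Neuroimaging Metadata']`; a TSV file would be checked by passing `delimiter="\t"`.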
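The "Computational Workflow" in version 1.1 (split data into training, validation, and test sets; train with cross-validation; report accuracy and F1-Score; compare against baseline models) can be illustrated with a dependency-free Python sketch. It is not the project's pipeline: the 70/15/15 split, the fold count, and the majority-class baseline standing in for the Random Forest, SVM, or neural-network models named in the page are assumptions chosen for brevity. AUIC is not computed, and `f1_score` scores a single positive class (binary F1).

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def split_indices(n: int, fractions: Tuple[float, float] = (0.7, 0.15),
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle indices 0..n-1 and cut them into training, validation, and test lists.

    `fractions` gives the training and validation shares; the remainder is the test set.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(n * fractions[0])
    n_val = int(n * fractions[1])
    return indices[:n_train], indices[n_train:n_train + n_val], indices[n_train + n_val:]


def k_folds(indices: Sequence[int], k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, held_out) pairs in which every index is held out exactly once."""
    folds = [list(indices[i::k]) for i in range(k)]
    pairs = []
    for i, held_out in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        pairs.append((train, held_out))
    return pairs


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of matching predictions; y_true must be non-empty."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence, y_pred: Sequence, positive) -> float:
    """Binary F1 for one positive label: 2TP / (2TP + FP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def majority_baseline(train_labels: Sequence) -> Callable[[object], object]:
    """Baseline model: always predict the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: most_common


if __name__ == "__main__":
    # Toy labels only; real labels would come from the integrated cohorts.
    labels = ["AD"] * 12 + ["control"] * 8
    train, val, test = split_indices(len(labels))
    scores = []
    for fold_train, held_out in k_folds(train + val, k=3):
        model = majority_baseline([labels[i] for i in fold_train])
        predictions = [model(None) for _ in held_out]
        scores.append(accuracy([labels[i] for i in held_out], predictions))
    print("mean cross-validation accuracy:", sum(scores) / len(scores))
```

Any real classifier that maps features to a label can replace `majority_baseline` in the loop, which also gives the "compare against baseline models" step a concrete reference score.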