Changes for page Methodology
Last modified by manuelmenendez on 2025/03/14 08:31
From version 18.1
edited by manuelmenendez
on 2025/02/13 12:52
Change comment:
There is no comment for this version
To version 5.1
edited by manuelmenendez
on 2025/01/29 19:11
Change comment:
There is no comment for this version
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
- Content
@@ -1,133 +1,109 @@
-== **Overview** ==
+=== **Overview** ===

-Neurodiagnoses develops a tridimensional diagnostic framework for CNS diseases, incorporating AI-powered annotation tools to improve interpretability, standardization, and clinical utility. The methodology integrates multi-modal data, including genetic, neuroimaging, neurophysiological, and biomarker datasets, and applies machine learning models to generate structured, explainable diagnostic outputs.
+This section describes the step-by-step process used in the **Neurodiagnoses** project to develop a novel diagnostic framework for neurological diseases. The methodology integrates artificial intelligence (AI), biomedical ontologies, and computational neuroscience to create a structured, interpretable, and scalable diagnostic system.

 ----

-== **How to Use External Databases in Neurodiagnoses** ==
+=== **1. Data Integration** ===

-To enhance the accuracy of our diagnostic models, Neurodiagnoses integrates data from multiple biomedical and neurological research databases. If you are a researcher, follow these steps to access, prepare, and integrate data into the Neurodiagnoses framework.
+==== **Data Sources** ====

-=== **Potential Data Sources** ===
+* **Biomedical Ontologies**:
+** Human Phenotype Ontology (HPO) for phenotypic abnormalities.
+** Gene Ontology (GO) for molecular and cellular processes.
+* **Neuroimaging Datasets**:
+** Example: Alzheimer’s Disease Neuroimaging Initiative (ADNI), OpenNeuro.
+* **Clinical and Biomarker Data**:
+** Anonymized clinical reports, molecular biomarkers, and test results.

-Neurodiagnoses maintains an updated list of potential biomedical databases relevant to neurodegenerative diseases.

-* Reference: [[List of Potential Databases>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/data/sources/list_of_potential_databases]]
+==== **Data Preprocessing** ====

-=== **1. Register for Access** ===
+1. **Standardization**: Ensure all data sources are normalized to a common format.
+1. **Feature Selection**: Identify relevant features for diagnosis (e.g., biomarkers, imaging scores).
+1. **Data Cleaning**: Handle missing values and remove duplicates.

-Each external database requires individual registration and access approval. Follow the official guidelines of each database provider.
+----

-* Ensure that you have completed all ethical approvals and data access agreements before integrating datasets into Neurodiagnoses.
-* Some repositories require a Data Usage Agreement (DUA) before downloading sensitive medical data.
+=== **2. AI-Based Analysis** ===

-=== **2. Download & Prepare Data** ===
+==== **Model Development** ====

-Once access is granted, download datasets while complying with data usage policies. Ensure that the files meet Neurodiagnoses’ format requirements for smooth integration.
+* **Embedding Models**: Use pre-trained models like BioBERT or BioLORD for text data.
+* **Classification Models**:
+** Algorithms: Random Forest, Support Vector Machines (SVM), or neural networks.
+** Purpose: Predict the likelihood of specific neurological conditions based on input data.

-==== **Supported File Formats** ====
+==== **Dimensionality Reduction and Interpretability** ====

-* Tabular Data: .csv, .tsv
-* Neuroimaging Data: .nii, .dcm
-* Genomic Data: .fasta, .vcf
-* Clinical Metadata: .json, .xml
+* Leverage [[DEIBO>>https://drive.ebrains.eu/f/8d7157708cde4b258db0/]] (Data-driven Embedding Interpretation Based on Ontologies) to connect model dimensions to ontology concepts.
+* Evaluate interpretability using metrics like the Area Under the Interpretability Curve (AUIC).

-==== **Mandatory Fields for Integration** ====
+----

-|=Field Name|=Description
-|Subject ID|Unique patient identifier
-|Diagnosis|Standardized disease classification
-|Biomarkers|CSF, plasma, or imaging biomarkers
-|Genetic Data|Whole-genome or exome sequencing
-|Neuroimaging Metadata|MRI/PET acquisition parameters
+=== **3. Diagnostic Framework** ===

-=== **3. Upload Data to Neurodiagnoses** ===
+==== **Axes of Diagnosis** ====

-Once preprocessed, data can be uploaded to EBRAINS or GitHub.
+The framework organizes diagnostic data into three axes:

-* (((
-**Option 1: Upload to EBRAINS Bucket**
+1. **Etiology**: Genetic and environmental risk factors.
+1. **Molecular Markers**: Biomarkers such as amyloid-beta, tau, and alpha-synuclein.
+1. **Neuroanatomical Correlations**: Results from neuroimaging (e.g., MRI, PET).

-* Location: [[EBRAINS Neurodiagnoses Bucket>>url:https://wiki.ebrains.eu/bin/view/Collabs/neurodiagnoses/Bucket]]
-* Ensure correct metadata tagging before submission.
-)))
-* (((
-**Option 2: Contribute via GitHub Repository**
+==== **Recommendation System** ====

-* Location: [[GitHub Data Repository>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/tree/main/data]]
-* Create a new folder under /data/ and include dataset description.
-)))
+* Suggests additional tests or biomarkers if gaps are detected in the data.
+* Prioritizes tests based on clinical impact and cost-effectiveness.

-//Note: For large datasets, please contact the project administrators before uploading.//
+----

-=== **4. Integrate Data into AI Models** ===
+=== **4. Computational Workflow** ===

-Once uploaded, datasets must be harmonized and formatted before AI model training.
+1. **Data Loading**: Import data from storage (Drive or Bucket).
+1. **Feature Engineering**: Generate derived features from the raw data.
+1. **Model Training**:
+1*. Split data into training, validation, and test sets.
+1*. Train models with cross-validation to ensure robustness.
+1. **Evaluation**:
+1*. Metrics: Accuracy, F1-Score, AUIC for interpretability.
+1*. Compare against baseline models and domain benchmarks.

-==== **Steps for Data Integration** ====
-
-* Open Jupyter Notebooks on EBRAINS to run preprocessing scripts.
-* Standardize neuroimaging and biomarker formats using harmonization tools.
-* Use machine learning models to handle missing data and feature extraction.
-* Train AI models with newly integrated patient cohorts.
-* Reference: [[Detailed instructions can be found in docs/data_processing.md>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/docs/data_processing.md]].
-
 ----

-== **Database Sources Table** ==
+=== **5. Validation** ===

-=== **Where to Insert This** ===
+==== **Internal Validation** ====

-* GitHub: [[docs/data_sources.md>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/docs/data_sources.md]]
-* EBRAINS Wiki: Collabs/neurodiagnoses/DataSources
+* Test the system using simulated datasets and known clinical cases.
+* Fine-tune models based on validation results.

-=== **Key Databases for Neurodiagnoses** ===
+==== **External Validation** ====

-|=Database|=Focus Area|=Data Type|=Access Link
-|ADNI|Alzheimer's Disease|MRI, PET, CSF, cognitive tests|ADNI
-|PPMI|Parkinson’s Disease|Imaging, biospecimens|[[PPMI>>url:https://www.ppmi-info.org/]]
-|GP2|Genetic Data for PD|Whole-genome sequencing|[[GP2>>url:https://gp2.org/]]
-|Enroll-HD|Huntington’s Disease|Clinical, genetic, imaging|[[Enroll-HD>>url:https://enroll-hd.org/]]
-|GAAIN|Alzheimer's & Cognitive Decline|Multi-source data aggregation|[[GAAIN>>url:https://www.gaain.org/]]
-|UK Biobank|Population-wide studies|Genetic, imaging, health records|[[UK Biobank>>url:https://www.ukbiobank.ac.uk/]]
-|DPUK|Dementia & Aging|Imaging, genetics, lifestyle factors|[[DPUK>>url:https://www.dementiasplatform.uk/]]
-|PRION Registry|Prion Diseases|Clinical and genetic data|[[PRION Registry>>url:https://www.prionalliance.org/]]
-|DECIPHER|Rare Genetic Disorders|Genomic variants|DECIPHER
+* Collaborate with research institutions and hospitals to test the system in real-world settings.
+* Use anonymized patient data to ensure privacy compliance.

-If you know a relevant dataset, submit a proposal in [[GitHub Issues>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/issues]].
-
 ----

-== **Collaboration & Partnerships** ==
+=== **6. Collaborative Development** ===

-=== **Where to Insert This** ===
+The project is open to contributions from researchers, clinicians, and developers. Key tools include:

-* GitHub: [[docs/collaboration.md>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/docs/collaboration.md]]
-* EBRAINS Wiki: Collabs/neurodiagnoses/Collaborations
+* **Jupyter Notebooks**: For data analysis and pipeline development.
+** Example: [[probabilistic imputation>>https://drive.ebrains.eu/f/4f69ab52f7734ef48217/]]
+* **Wiki Pages**: For documenting methods and results.
+* **Drive and Bucket**: For sharing code, data, and outputs.
+* **Collaboration with related projects: **For instance: [[//Beyond the hype: AI in dementia – from early risk detection to disease treatment//>>https://www.lethe-project.eu/beyond-the-hype-ai-in-dementia-from-early-risk-detection-to-disease-treatment/]]

-=== **Partnering with Data Providers** ===
-
-Beyond using existing datasets, Neurodiagnoses seeks partnerships with data repositories to:
-
-* Enable direct API-based data integration for real-time processing.
-* Co-develop harmonized AI-ready datasets with standardized annotations.
-* Secure funding opportunities through joint grant applications.
-
-=== **Interested in Partnering?** ===
-
-If you represent a research consortium or database provider, reach out to explore data-sharing agreements.
-
-* Contact: [[info@neurodiagnoses.com>>mailto:info@neurodiagnoses.com]]
-
 ----

-== **Final Notes** ==
+=== **7. Tools and Technologies** ===

-Neurodiagnoses continuously expands its data ecosystem to support AI-driven clinical decision-making. Researchers and institutions are encouraged to contribute new datasets and methodologies.
-
-For additional technical documentation:
-
-* [[GitHub Repository>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses]]
-* [[EBRAINS Collaboration Page>>url:https://wiki.ebrains.eu/bin/view/Collabs/neurodiagnoses/]]
-
-If you experience issues integrating data, open a [[GitHub Issue>>url:https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/issues]] or consult the EBRAINS Neurodiagnoses Forum.
+* **Programming Languages**: Python for AI and data processing.
+* **Frameworks**:
+** TensorFlow and PyTorch for machine learning.
+** Flask or FastAPI for backend services.
+* **Visualization**: Plotly and Matplotlib for interactive and static visualizations.
+* **EBRAINS Services**:
+** Collaboratory Lab for running Notebooks.
+** Buckets for storing large datasets.
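
The "Data Preprocessing" steps named in the diff (standardization, handling missing values, removing duplicates) could look roughly like the sketch below. It is an illustration only, not the project's pipeline: the record layout, the snake_case field names, the `REQUIRED_FIELDS` tuple (loosely modeled on the "Mandatory Fields for Integration" table), and the helper names `standardize` and `clean_records` are all assumptions. Missing values are handled here by dropping incomplete records, which is the simplest choice; imputation would be an alternative.

```python
from typing import Dict, Iterable, List

# Hypothetical required fields, loosely based on the "Mandatory Fields" table.
REQUIRED_FIELDS = ("subject_id", "diagnosis")


def standardize(record: Dict) -> Dict:
    """Normalize keys to snake_case and strip whitespace from string values."""
    clean = {}
    for key, value in record.items():
        norm_key = key.strip().lower().replace(" ", "_")
        if isinstance(value, str):
            value = value.strip()
        clean[norm_key] = value
    return clean


def clean_records(records: Iterable[Dict]) -> List[Dict]:
    """Standardize records, drop incomplete ones, and keep one record per subject."""
    seen = set()
    result = []
    for raw in records:
        rec = standardize(raw)
        # Assumption: a record missing any required field is dropped, not imputed.
        if any(rec.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        # Keep only the first record seen for each subject.
        if rec["subject_id"] in seen:
            continue
        seen.add(rec["subject_id"])
        result.append(rec)
    return result


raw_records = [
    {"Subject ID": "S001", "Diagnosis": "AD ", "Biomarkers": "CSF tau"},
    {"Subject ID": "S001", "Diagnosis": "AD", "Biomarkers": "CSF tau"},    # duplicate subject
    {"Subject ID": "S002", "Diagnosis": "", "Biomarkers": "plasma NfL"},   # missing diagnosis
    {"Subject ID": "S003", "Diagnosis": "PD", "Biomarkers": None},         # optional field missing
]
cleaned = clean_records(raw_records)
print([r["subject_id"] for r in cleaned])
```

Keeping the first record per subject is a deliberate simplification; a real cohort with repeated visits would need a visit or date key as part of the duplicate check.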