Methodology - HBP Wiki

==== **Overview** ====

This project develops a **tridimensional diagnostic framework** for **CNS diseases**, incorporating **AI-powered annotation tools** to improve **interpretability, standardization, and clinical utility**. The methodology integrates **multi-modal data**, including **genetic, neuroimaging, neurophysiological, and biomarker datasets**, and applies **machine learning models** to generate **structured, explainable diagnostic outputs**.


=== **Workflow** ===

1. (((
**We use GitHub to [[store and develop AI models, scripts, and annotation pipelines>>https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/discussions]].**

* Create a **GitHub repository** for AI scripts and models.
* Use **GitHub Projects** to manage research milestones.
)))
1. (((
**We use EBRAINS for data and collaboration.**

* Store **biomarker and neuroimaging data** in **EBRAINS Buckets**.
* Run **Jupyter Notebooks** in the **EBRAINS Lab** to test AI models.
* Use the **EBRAINS Wiki** for structured documentation and research discussion.
)))

----

=== **1. Data Integration** ===

==== Overview ====

Neurodiagnoses integrates clinical data via the **EBRAINS Medical Informatics Platform (MIP)**. MIP federates decentralized clinical data, allowing Neurodiagnoses to securely access and process sensitive information for AI-based diagnostics.

==== How It Works ====

1. (((
**Authentication & API Access:**

* Users must have an **EBRAINS account**.
* Neurodiagnoses uses **secure API endpoints** to fetch clinical data (e.g., from the **Federation for Dementia**).
)))
1. (((
**Data Mapping & Harmonization:**

* Retrieved data is **normalized** and converted to standard formats (.csv, .json).
* Data from **multiple sources** is harmonized to ensure consistency for AI processing.
)))
1. (((
**Security & Compliance:**

* All data access is **logged and monitored**.
* Whenever possible, data remains on the **MIP servers**, with **federated learning techniques** bringing the analysis to the data.
* Access is granted only after signing a **Data Usage Agreement (DUA)**.

)))
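
The mapping and harmonization steps above can be illustrated in plain Python. This is a minimal sketch, not the project's mip_integration.py: the source names, the column aliases in the hypothetical `COLUMN_ALIASES` table, and the target schema are placeholders chosen to show how records from different sources are renamed to one schema, coerced to numbers, and exported as .csv and .json.

```python
import csv
import io
import json

# Hypothetical mapping from source-specific column names to a shared schema.
COLUMN_ALIASES = {
    "subject_id": "subject_id",
    "patient": "subject_id",
    "age_years": "age",
    "age": "age",
    "mmse_total": "mmse",
    "MMSE": "mmse",
    "nfl_pg_ml": "nfl_pg_ml",
    "NfL (ng/L)": "nfl_pg_ml",  # 1 ng/L equals 1 pg/mL, so no unit conversion is needed
}
TARGET_COLUMNS = ["subject_id", "source", "age", "mmse", "nfl_pg_ml"]


def harmonize_record(record, source):
    """Rename known columns to the shared schema and coerce numeric fields to float."""
    out = {column: None for column in TARGET_COLUMNS}
    out["source"] = source
    for key, value in record.items():
        target = COLUMN_ALIASES.get(key)
        if target is None or value in ("", None):
            continue  # unknown columns and empty cells are dropped
        out[target] = value if target == "subject_id" else float(value)
    return out


def harmonize(sources):
    """Merge records from several named sources into one list of harmonized rows."""
    return [harmonize_record(r, name) for name, records in sources.items() for r in records]


def to_csv(rows):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=TARGET_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty cells
    return buffer.getvalue()


def to_json(rows):
    return json.dumps(rows, indent=2)


# Two hypothetical sources with different column conventions.
sources = {
    "site_a": [{"subject_id": "A01", "age_years": "71", "mmse_total": "24"}],
    "site_b": [{"patient": "B07", "age": "68", "MMSE": "27", "NfL (ng/L)": "15.2"}],
}
rows = harmonize(sources)
print(to_csv(rows))
```

Keeping the alias table separate from the conversion logic means a new source only needs new entries in the mapping, not new code.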

==== Implementation Steps ====

1. Clone the [[Neurodiagnoses repository>>https://github.com/Fundacion-de-Neurociencias/neurodiagnoses]].
1. Configure your **EBRAINS API credentials** in ##mip_integration.py##.
1. Run the script to **download and harmonize clinical data**.
1. Process the data for **AI model training**.

For more detailed instructions, please refer to the **[[MIP Documentation>>url:https://mip.ebrains.eu/]]**.
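
Step 2 places credentials in mip_integration.py. One common pattern, sketched here under assumptions, is to read an access token from an environment variable instead of hard-coding it. The variable name `EBRAINS_TOKEN`, the base URL `MIP_BASE_URL`, and the endpoint path are hypothetical placeholders, not the documented MIP API; the sketch only builds a bearer-authenticated request and does not send it.

```python
import os
import urllib.parse
import urllib.request

# Hypothetical values: replace with the endpoint documented for your MIP federation.
MIP_BASE_URL = "https://mip.example.org/api"
TOKEN_ENV_VAR = "EBRAINS_TOKEN"


def load_token(env=os.environ):
    """Read the EBRAINS access token from the environment, failing loudly if it is absent."""
    token = env.get(TOKEN_ENV_VAR)
    if not token:
        raise RuntimeError(f"Set {TOKEN_ENV_VAR} before running the integration script.")
    return token


def build_request(path, params, token):
    """Build (but do not send) a GET request carrying a bearer token."""
    query = urllib.parse.urlencode(params)
    url = f"{MIP_BASE_URL}/{path.lstrip('/')}?{query}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


# Example: build a request for a hypothetical "datasets" endpoint.
request = build_request("/datasets", {"pathology": "dementia"}, "example-token")
print(request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

Keeping the token out of the script also keeps it out of the GitHub repository.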

----

==== **Data Processing & Integration with Clinica.Run** ====

===== Overview =====

Neurodiagnoses now supports **Clinica.Run**, an open-source neuroimaging platform designed for **multimodal data processing and reproducible neuroscience workflows**.

===== How It Works =====

1. (((
**Neuroimaging Preprocessing:**

* MRI, PET, and EEG data are preprocessed using **Clinica.Run pipelines**.
* The pipelines support **longitudinal and cross-sectional analyses**.
)))
1. (((
**Automated Biomarker Extraction:**

* Standardized extraction of **volumetric, metabolic, and functional biomarkers**.
* Integration with the machine learning models in Neurodiagnoses.
)))
1. (((
**Data Security & Compliance:**

* Clinica.Run operates in **compliance with GDPR and HIPAA**.
* Neuroimaging data remains **within the original storage environment**.
)))

===== Implementation Steps =====

1. Install the **Clinica.Run** dependencies.
1. Configure your **Clinica.Run pipeline** in ##clinica_run_config.json##.
1. Run the pipeline for **preprocessing and biomarker extraction**.
1. Use the processed neuroimaging data for **AI-driven diagnostics** in Neurodiagnoses.

For further information, refer to **[[Clinica.Run Documentation>>url:https://clinica.run/]]**.
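
Clinica pipelines are launched from its command-line interface as `clinica run <pipeline> <bids_directory> <caps_directory>`, and some pipelines take further positional arguments. The sketch below assumes a hypothetical layout for clinica_run_config.json (keys `pipeline`, `bids_dir`, `caps_dir`, `extra_args`) and only turns it into an argument list. The required arguments of each pipeline should be checked in the Clinica documentation before the command is executed.

```python
import json
import shlex

# Hypothetical config layout; the real keys of clinica_run_config.json are project-defined.
EXAMPLE_CONFIG = """
{
  "pipeline": "t1-linear",
  "bids_dir": "/data/bids",
  "caps_dir": "/data/caps",
  "extra_args": []
}
"""
REQUIRED_KEYS = ("pipeline", "bids_dir", "caps_dir")


def build_clinica_command(config):
    """Translate a config dict into a `clinica run` argument list."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"missing config keys: {missing}")
    return [
        "clinica", "run", config["pipeline"],
        config["bids_dir"], config["caps_dir"],
        *config.get("extra_args", []),  # pipeline-specific positional arguments
    ]


config = json.loads(EXAMPLE_CONFIG)
command = build_clinica_command(config)
print(shlex.join(command))
# To execute: subprocess.run(command, check=True)
```

Building an argument list rather than a shell string avoids quoting problems with paths that contain spaces.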


==== **Data Sources** ====

[[List of potential database sources>>https://github.com/Fundacion-de-Neurociencias/neurodiagnoses/blob/main/data/sources/list_of_potential_databases]]

**Biomedical Ontologies & Databases:**

* **Human Phenotype Ontology (HPO)** for symptom annotation.
* **Gene Ontology (GO)** for molecular and cellular processes.

**Dimensionality Reduction and Interpretability:**

* **Evaluate interpretability** using metrics such as the **Area Under the Interpretability Curve (AUIC)**.
* **Leverage [[DEIBO>>https://github.com/Mellandd/DEIBO]] (Data-driven Embedding Interpretation Based on Ontologies)** to connect model dimensions to ontology concepts.

**Neuroimaging & EEG/MEG Data:**

* **MRI volumetric measures** for brain atrophy tracking.
* **EEG functional connectivity patterns** (AI-Mind).
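
Atrophy tracking from MRI volumetric measures is commonly summarized as an annualized percentage change between two scans: rate = (V2 - V1) / V1 / Δt × 100, with Δt in years. The helper below is a minimal sketch of that calculation. The function name and the example volumes are illustrative, and it assumes the volumes were already normalized upstream (for example, for intracranial volume).

```python
from datetime import date


def annualized_percent_change(volume_baseline, volume_followup, date_baseline, date_followup):
    """Annualized % volume change between two scans; negative values indicate atrophy."""
    if volume_baseline <= 0:
        raise ValueError("baseline volume must be positive")
    years = (date_followup - date_baseline).days / 365.25
    if years <= 0:
        raise ValueError("follow-up scan must be after the baseline scan")
    return (volume_followup - volume_baseline) / volume_baseline / years * 100.0


# Illustrative hippocampal volumes (mm^3) measured two years apart.
rate = annualized_percent_change(3500.0, 3360.0, date(2020, 1, 1), date(2022, 1, 1))
print(f"{rate:.2f} %/year")
```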

**Clinical & Biomarker Data:**

* **CSF biomarkers** (Amyloid-beta, Tau, Neurofilament Light).
* **Sleep monitoring and actigraphy data** (ADIS).

**Federated Learning Integration:**

* **Secure multi-center data harmonization** (PROMINENT).
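
Federated learning keeps patient data at each center and shares only model updates. A widely used aggregation rule, federated averaging (FedAvg), weights each site's parameters by its number of local samples. The sketch below shows only that aggregation step on plain lists. The site updates and sample counts are hypothetical, and this is not a description of PROMINENT's actual protocol.

```python
def federated_average(site_updates):
    """Aggregate per-site parameter vectors, weighting each site by its sample count.

    site_updates: list of (parameters, n_samples) tuples; parameters is a list of floats.
    """
    if not site_updates:
        raise ValueError("need at least one site update")
    n_params = len(site_updates[0][0])
    total = sum(n for _, n in site_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged = [0.0] * n_params
    for params, n in site_updates:
        if len(params) != n_params:
            raise ValueError("all sites must share the same parameter shape")
        weight = n / total
        for i, value in enumerate(params):
            averaged[i] += weight * value
    return averaged


# Hypothetical updates from three centers; raw patient data never leaves the sites.
updates = [([0.2, 1.0], 100), ([0.4, 0.0], 300), ([0.0, 2.0], 100)]
global_params = federated_average(updates)
print(global_params)
```

In a full training loop the averaged parameters would be sent back to each site for the next round of local training.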

----

==== **Annotation System for Multi-Modal Data** ====

To ensure **structured integration of diverse datasets**, **Neurodiagnoses** will implement an **AI-driven annotation system** that will:

* **Assign standardized metadata tags** to diagnostic features.
* **Provide contextual explanations** for AI-based classifications.
* **Track temporal disease progression annotations** to identify long-term trends.
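
A minimal data structure for such annotations could pair each diagnostic feature with ontology tags, a short explanation, and an observation date, so that successive annotations for one subject can be ordered to follow progression over time. The `Annotation` class and `progression` helper below are hypothetical. The HPO identifier is shown for illustration and should be checked against the current HPO release.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Annotation:
    """One AI-generated annotation of a diagnostic feature (hypothetical schema)."""
    subject_id: str
    feature: str
    value: float
    observed_on: date
    ontology_tags: List[str] = field(default_factory=list)  # e.g. HPO term IDs
    explanation: str = ""


def progression(annotations, subject_id, feature):
    """Return (date, value) pairs for one subject and feature, oldest first."""
    series = [a for a in annotations if a.subject_id == subject_id and a.feature == feature]
    return [(a.observed_on, a.value) for a in sorted(series, key=lambda a: a.observed_on)]


# HP:0002354 is used here as the HPO term for memory impairment (verify before use).
notes = [
    Annotation("S01", "mmse", 24.0, date(2023, 6, 1), ["HP:0002354"], "Score lower than at the previous visit"),
    Annotation("S01", "mmse", 27.0, date(2022, 5, 1), ["HP:0002354"]),
    Annotation("S02", "mmse", 29.0, date(2023, 1, 15)),
]
trend = progression(notes, "S01", "mmse")
print(trend)
```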

----

=== **2. AI-Based Analysis** ===

==== **Machine Learning & Deep Learning Models** ====

**Risk Prediction Models:**

* **LETHE’s cognitive risk prediction model** is integrated into the annotation framework.

**Biomarker Classification & Probabilistic Imputation:**

* A **KNN imputer** and **Bayesian models** are used to handle **missing biomarker data**.
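
scikit-learn provides a `KNNImputer` class (in `sklearn.impute`) that fills each missing value with the mean of that feature over the k nearest rows, using a NaN-aware Euclidean distance. To show the mechanics without dependencies, the sketch below reimplements the core idea in plain Python. It is an illustration, not a replacement for the library class, and the biomarker values are invented.

```python
import math


def nan_euclidean(a, b):
    """Euclidean distance over features observed in both rows, scaled up for missing ones."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest donor rows."""
    imputed = [list(row) for row in rows]  # copy so the input is not modified
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows that observe column j and share at least one feature.
            donors = [
                (nan_euclidean(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = sorted(d for d in donors if math.isfinite(d[0]))
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap if no donor exists
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Illustrative rows: [CSF amyloid-beta 42, p-tau, NfL]; None marks a missing value.
biomarkers = [
    [600.0, 20.0, 10.0],
    [610.0, 22.0, None],
    [400.0, 60.0, 30.0],
    [590.0, 21.0, 12.0],
]
filled = knn_impute(biomarkers, k=2)
print(filled[1])
```

Unlike this deterministic fill, the Bayesian models mentioned above would return a distribution for each missing value rather than a single number.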

**Neuroimaging Feature Extraction:**

* **MRI & EEG data** are annotated with **neuroanatomical feature labels**.

==== **AI-Powered Annotation System** ====

* Uses **SHAP-based interpretability tools** to explain model decisions.
* Generates **automated clinical annotations** in structured reports.
* Links findings to **standardized medical ontologies** (e.g., **SNOMED, HPO**).
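
SHAP attributes a prediction to input features using Shapley values from cooperative game theory. For a handful of features they can be computed exactly by enumerating all feature subsets, which the `shap` library approximates efficiently for larger models. The sketch below uses a single baseline vector to stand in for "absent" features (one common, simplified choice). The toy risk model, its weights, and the feature order are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for predict(x), replacing absent features with baseline values."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their observed values; the rest use the baseline.
        point = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(point)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(features):
    # Hypothetical linear score over [p-tau, amyloid-beta, NfL]; not a validated model.
    ptau, abeta, nfl = features
    return 0.05 * ptau - 0.002 * abeta + 0.03 * nfl


x = [60.0, 450.0, 25.0]
baseline = [20.0, 700.0, 10.0]
phi = shapley_values(toy_risk, x, baseline)
print(phi)
```

The values always sum to `predict(x) - predict(baseline)`, which is what makes them usable as per-feature explanations in a structured report. The enumeration grows as 2^n, so it only suits small feature sets.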

----

=== **3. Diagnostic Framework & Clinical Decision Support** ===

==== **Tridimensional Diagnostic Axes** ====

**Axis 1: Etiology (Pathogenic Mechanisms)**

* Classification based on **genetic markers, cellular pathways, and environmental risk factors**.
* **AI-assisted annotation** provides **causal interpretations** for clinical use.

**Axis 2: Molecular Markers & Biomarkers**

* **Integration of CSF, blood, and neuroimaging biomarkers**.
* **Structured annotation** highlights **biological pathways linked to diagnosis**.

**Axis 3: Neuroanatomoclinical Correlations**

* **MRI and EEG data** provide anatomical and functional insights.
* **AI-generated progression maps** annotate **brain structure-function relationships**.
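
One way to carry the three axes through the pipeline is a single record per diagnostic assessment, with one field per axis and the supporting evidence attached to each finding. The structure below is a hypothetical sketch; the class names, field names, and example findings are illustrative rather than the project's schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AxisFinding:
    label: str  # e.g. a pathway, biomarker profile, or anatomical pattern
    evidence: List[str] = field(default_factory=list)  # data sources supporting the label
    confidence: float = 0.0  # model confidence in [0, 1]


@dataclass
class TridimensionalDiagnosis:
    subject_id: str
    etiology: List[AxisFinding] = field(default_factory=list)  # Axis 1
    molecular_markers: List[AxisFinding] = field(default_factory=list)  # Axis 2
    neuroanatomoclinical: List[AxisFinding] = field(default_factory=list)  # Axis 3

    def summary(self) -> Dict[str, List[str]]:
        """Labels per axis, highest confidence first."""
        def ranked(findings):
            return [f.label for f in sorted(findings, key=lambda f: f.confidence, reverse=True)]

        return {
            "etiology": ranked(self.etiology),
            "molecular_markers": ranked(self.molecular_markers),
            "neuroanatomoclinical": ranked(self.neuroanatomoclinical),
        }


dx = TridimensionalDiagnosis(
    subject_id="S01",
    etiology=[AxisFinding("APOE e4 carrier", ["genotyping"], 0.9)],
    molecular_markers=[
        AxisFinding("elevated NfL", ["blood"], 0.6),
        AxisFinding("amyloid-positive CSF profile", ["CSF"], 0.8),
    ],
    neuroanatomoclinical=[AxisFinding("medial temporal atrophy", ["MRI"], 0.7)],
)
print(dx.summary())
```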

----

=== **4. Computational Workflow & Annotation Pipelines** ===

==== **Data Processing Steps** ====

**Data Ingestion:**

* **Harmonized datasets** are stored in an **EBRAINS Bucket**.
* **Preprocessing pipelines** clean and standardize the data.

**Feature Engineering:**

* **AI models** extract **clinically relevant patterns** from **EEG, MRI, and biomarkers**.

**AI-Generated Annotations:**

* **Automated tagging** of diagnostic features in **structured reports**.
* **Explainability modules (SHAP, LIME)** make predictions transparent.

**Clinical Decision Support Integration:**

* **AI-annotated findings** are fed into **interactive dashboards**.
* **Clinicians can adjust, validate, and modify annotations**.
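
Clinician adjustment and validation can be modeled by keeping the AI-proposed label, the clinician's final label, and a review status side by side, so that dashboards can show both and agreement can be audited later. The `ReviewedAnnotation` class below is a hypothetical sketch of that idea, not an existing Neurodiagnoses component.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ReviewedAnnotation:
    feature: str
    ai_label: str
    final_label: Optional[str] = None
    status: str = "pending"  # pending -> validated | modified
    history: List[Tuple[str, str]] = field(default_factory=list)  # (reviewer, action)

    def validate(self, reviewer: str) -> None:
        """Clinician accepts the AI label as-is."""
        self.final_label = self.ai_label
        self.status = "validated"
        self.history.append((reviewer, "validated"))

    def modify(self, reviewer: str, new_label: str) -> None:
        """Clinician replaces the AI label; the original is kept for auditing."""
        self.final_label = new_label
        self.status = "modified"
        self.history.append((reviewer, f"modified to {new_label!r}"))


note = ReviewedAnnotation("hippocampal volume", "below age-expected range")
note.modify("clinician_01", "borderline, repeat MRI in 12 months")
print(note.status, note.final_label)
```

Because `ai_label` is never overwritten, the pairs of AI and clinician labels can later feed the agreement analysis in the validation study.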

----

=== **5. Validation & Real-World Testing** ===

==== **Prospective Clinical Study** ====

* **Multi-center validation** of AI-based **annotations & risk stratifications**.
* **Benchmarking against clinician-based diagnoses**.
* **Real-world testing** of AI-powered **structured reporting**.
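
Agreement between AI output and clinician diagnoses is often summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e). The function below is a minimal sketch for two raters over categorical labels. The example labels are invented, and the study's actual agreement metrics are not specified here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal frequency per label.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


ai = ["AD", "AD", "FTD", "DLB", "AD", "FTD"]
clinician = ["AD", "FTD", "FTD", "DLB", "AD", "AD"]
kappa = cohens_kappa(ai, clinician)
print(round(kappa, 3))
```

Here raw agreement is 4/6, but chance agreement is 14/36, so kappa is noticeably lower than the raw percentage.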

==== **Quality Assurance & Explainability** ====

* **Annotations are linked to structured knowledge graphs** for improved transparency.
* An **interactive annotation editor** allows clinicians to validate AI outputs.

----

=== **6. Collaborative Development** ===

The project is **open to contributions** from **researchers, clinicians, and developers**.

**Key tools include:**

* **Jupyter Notebooks**: For data analysis and pipeline development.
** Example: **probabilistic imputation**
* **Wiki Pages**: For documenting methods and results.
* **Drive and Bucket**: For sharing code, data, and outputs.
* **Collaboration with related projects**:
** Example: **Beyond the hype: AI in dementia – from early risk detection to disease treatment**

----

=== **7. Tools and Technologies** ===

==== **Programming Languages:** ====

* **Python** for AI and data processing.

==== **Frameworks:** ====

* **TensorFlow** and **PyTorch** for machine learning.
* **Flask** or **FastAPI** for backend services.

==== **Visualization:** ====

* **Plotly** and **Matplotlib** for interactive and static visualizations.

==== **EBRAINS Services:** ====

* **Collaboratory Lab** for running Jupyter Notebooks.
* **Buckets** for storing large datasets.

----

=== **Why This Matters** ===

* The annotation system ensures that AI-generated insights are structured, interpretable, and clinically meaningful.
* It enables real-time tracking of disease progression across the three diagnostic axes.
* It facilitates integration with electronic health records and decision-support tools, supporting the adoption of AI in clinical workflows.
