Changes for page Data Curation

Last modified by abonard on 2025/06/03 10:55

From version 6.1
edited by ingrreit
on 2023/03/24 17:08
Change comment: Uploaded new attachment "image-20230324170807-1.png", version {1}
To version 31.1
edited by ingrreit
on 2023/03/26 07:15
Change comment: There is no comment for this version

Summary

Details

Page properties
Content
... ... @@ -1,35 +1,267 @@
1 -(% class="jumbotron" %)
1 +== Publishing data, models and software via EBRAINS ==
2 +
3 +The aim of this collab is to provide you with all the information you need to publish your experimental data, simulations, computational models, and software via EBRAINS. Have you already published your data somewhere else? You can increase the exposure and impact of your shared dataset by also listing it on EBRAINS.
4 +
5 +{{box title="**Contents**"}}
6 +{{toc depth="3" start="2"/}}
7 +{{/box}}
8 +
9 +
10 +(% style="text-align: center;" %)
11 +**Get started! **
12 +
13 +(% style="text-align: center;" %)
14 +**[[REQUEST CURATION>>https://nettskjema.no/a/277393#/]] **
15 +
16 +(% style="text-align: center;" %)
17 + Search existing data, models and software in [[the EBRAINS Knowledge Graph Search>>https://kg.ebrains.eu/search/?facet_type[0]=Dataset]]
18 +
19 +
20 +(% style="color:#e74c3c" %)[EDIT required]
21 +
22 +(% style="color:#e74c3c" %)Sharing your data, models or code (research products) via EBRAINS makes them discoverable among the other research products available in the [[EBRAINS Knowledge Graph>>https://kg.ebrains.eu/]]. This is made possible by a highly flexible metadata framework that describes neuroscience data in detail. EBRAINS is gradually implementing interconnected tools and analysis workflows developed in the Human Brain Project (HBP) to further enhance the value of adding your dataset to the database.
23 +
24 +----
25 +
26 +== **The EBRAINS curation process** ==
27 +
28 +In EBRAINS, multimodal and heterogeneous neuroscience data, models and software are categorised and described in a standardised manner so that they can be effectively searched, compared, and analysed. This effort is referred to as curation. 
29 +
30 +>The EBRAINS curation process involves organising and annotating neuroscientific data to make the data discoverable and reusable.
31 +
32 +Behind this process is the EBRAINS Curation team. Our mandate is to support you in sharing your data in line with the [[**FAIR principles**>>https://www.go-fair.org/fair-principles/]], whether you choose to describe only the key aspects of your data, or can invest in adding more detailed metadata.
33 +
34 +(% class="box floatinginfobox" %)
2 2  (((
3 -(% class="container" %)
36 +We strongly recommend that you start preparing for data sharing as early as possible. With a structured data repository and adequate notes on how the data were acquired, you greatly reduce the effort required to publish your data. The time it takes to share data on EBRAINS depends heavily on the engagement of the researcher and on how well the data and metadata are prepared beforehand. **[[Contact us to prepare for sharing>>mailto:curation-support@ebrains.eu]]. **
37 +)))
38 +
39 +=== ===
40 +
41 +=== ===
42 +
43 +=== Step by step - Experimental data ===
44 +
45 +
46 +[[image:image-20230326054341-1.png]]
47 +
48 +==== **1. Provide some general information about your dataset** ====
49 +
50 +The [[Curation request form>>https://nettskjema.no/a/277393#/]] collects preliminary information about your data, allowing us to assess whether the dataset fits within the scope of EBRAINS. The submission generates a curation ID allowing us to track the case.
51 +
52 +The [[Ethics and Regulatory compliance form>>https://nettskjema.no/a/224765]] collects the necessary information needed for us to evaluate whether we can safely and legally share the data on the EBRAINS platforms.
53 +
54 +
55 +==== **2. Upload data ** ====
56 +
57 +EBRAINS offers secure, long-term storage at the [[CSCS Swiss National Supercomputing Centre>>url:https://www.cscs.ch/]], currently with no upper limit on storage capacity. The data must be consistently structured prior to upload. 
58 +
59 +For smaller datasets with a manageable number of files, we recommend using the **Collab-Bucket solution (drag-and-drop)**. A Collab Bucket must first be assigned to a dataset, which happens when the dataset is accepted for sharing.
60 +
61 +For larger datasets or datasets with a large number of files, we recommend using a **programmatic approach**. The [[Python script>>https://github.com/eapapp/ebrains-data-storage/tree/main/data-proxy]] is interactive and does not require any additional programming.
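
Before choosing between drag-and-drop and the programmatic route, it can help to take stock of what is in the local dataset folder. The following is a minimal sketch, not part of the EBRAINS upload script: the function names `build_manifest` and `summarise` are hypothetical, and it only uses the Python standard library to list every file with its size and a SHA-256 checksum.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path, chunk_size: int = 1 << 20) -> list[dict]:
    """Return one record per file under root: relative path, size, SHA-256."""
    records = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in chunks so large files do not need to fit in memory.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        records.append(
            {
                "path": path.relative_to(root).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": digest.hexdigest(),
            }
        )
    return records


def summarise(records: list[dict]) -> dict:
    """Total file count and size, useful when choosing an upload route."""
    return {
        "files": len(records),
        "total_bytes": sum(r["bytes"] for r in records),
    }
```

Calling `summarise(build_manifest(Path("my_dataset")))` on a local folder (the folder name is a placeholder) gives a quick count of files and bytes, and the checksums can be compared against the uploaded copies afterwards.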
62 +
63 +
64 +If a data collection is already uploaded elsewhere, we may link to the already existing repository.
65 +
66 +
67 +==== **3. Submit metadata** ====
68 +
69 +Easily submit openMINDS-compatible metadata via our [[metadata wizard>>https://ebrains-metadata-wizard.apps.hbp.eu/]]. This form covers all the required metadata for sharing data via EBRAINS. When you're ready to 'Submit', the metadata and all uploaded files will be sent to the Curation team.
70 +
71 +Power users interested in exploring the full span of the openMINDS framework can check out the [[openMINDS GitHub>>https://github.com/HumanBrainProject/openMINDS]] to learn more about how to gather metadata programmatically. A stable version of the openMINDS package can be found on [[PyPI>>https://pypi.org/project/openMINDS/]]. We accept openMINDS metadata as JSON-LD (share these files with us via curation-support@ebrains.eu). Additional documentation of openMINDS metadata submodules and schemas can be found on [[the openMINDS GitHub Wiki>>https://humanbrainproject.github.io/openMINDS/]].
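
To give a feel for what a JSON-LD metadata file looks like, here is a minimal sketch that uses only the Python standard library rather than the openMINDS package. The vocabulary IRI, the type name `Dataset`, the identifier, and the property names are placeholders chosen for illustration; the actual types and properties must be taken from the openMINDS schemas.

```python
import json


def make_jsonld(identifier: str, type_name: str, properties: dict,
                vocab: str = "https://example.org/vocab/") -> dict:
    """Assemble a flat JSON-LD node with @context, @id and @type."""
    node = {
        "@context": {"@vocab": vocab},
        "@id": identifier,
        "@type": vocab + type_name,
    }
    # Skip empty values so the file only contains metadata that was provided.
    node.update({k: v for k, v in properties.items() if v not in (None, "", [])})
    return node


def to_jsonld_text(node: dict) -> str:
    """Serialise with stable key order so files diff cleanly between versions."""
    return json.dumps(node, indent=2, sort_keys=True, ensure_ascii=False)


example = make_jsonld(
    identifier="https://example.org/datasets/my-dataset",  # hypothetical identifier
    type_name="Dataset",                                    # hypothetical type name
    properties={"fullName": "My example dataset", "description": ""},
)
print(to_jsonld_text(example))
```

The empty `description` is dropped from the output, which keeps the file limited to metadata that was actually filled in.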
72 +
73 +
74 +==== **4. Write a Data Descriptor ** ====
75 +
76 +The Data Descriptor is a document that helps others interpret and reuse your data (and helps prevent misuse), and is critical for achieving a basic level of FAIRness. The document will be uploaded to the data repository and shared as a PDF. 
77 +
78 +[[The template>>https://drive.ebrains.eu/f/a2e07c95b1a54090bbbc/?dl=1]] guides you through the process of writing one. Check out previous examples in the KG Search, e.g. the Data Descriptor for a dataset containing histology images of the rat brain stained for an anterograde tracer (see [[an example>>https://doi.org/10.25493/2MX9-3XF]]).
79 +
80 +
81 +Journal publications that sufficiently describe the shared data, such as those published in [[Nature Scientific Data>>http://www.nature.com/sdata/about]], [[Elsevier Data in Brief>>http://www.journals.elsevier.com/data-in-brief/]], [[BMC Data note>>https://bmcresnotes.biomedcentral.com/submission-guidelines/preparing-your-manuscript/data-note]] and more, can replace the EBRAINS Data Descriptor.
82 +
83 +
84 +==== **5. Preview and publish ** ====
85 +
86 +A Curator will assemble a dataset in the EBRAINS Knowledge Graph that combines the data, metadata and data descriptor. Once it is ready, the data provider will receive a private URL for previewing the dataset prior to release. We need official approval from the data custodian{{footnote}}The Data Custodian is responsible for the content and quality of the Data and metadata, and is the person to be contacted by EBRAINS CS in case of any misconduct related to the Data. It is the obligation of a Data Custodian to keep EBRAINS informed about changes in the contact information of the authors of the Datasets provided by them ([[EBRAINS Data Provision Protocol - version 1.1>>https://strapi-prod.sos-ch-dk-2.exo.io/EBRAINS_Data_Provision_Protocol_dfe0dcb104.pdf]]).{{/footnote}} to release the dataset. Once the dataset is released, a [[DataCite DOI>>https://datacite.org/]] will be generated for it. If the identical data collection has already received a DOI elsewhere, we recommend reusing that DOI.
87 +
88 +
89 +----
90 +
91 +==== **Sharing human data ** ====
92 +
93 +We must ensure data shared on EBRAINS comply with [[GDPR>>https://gdpr-info.eu/]] and [[EU directives>>https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A32010L0063]]. The information we need to assess this is collected via our [[Ethics and Regulatory Compliance Survey>>https://nettskjema.no/a/224765]].
94 +
95 +(% class="box floatinginfobox" %)
4 4  (((
5 -= My Collab's Extended Title =
97 +For **Human subject data**, the data must be //either//
6 6  
7 -My collab's subtitle
99 +- Post-mortem data
100 +
101 +- Aggregated data
102 +
103 +- Pseudonymized subject data with a legal basis for sharing (e.g. Informed Consent)
104 +
105 +(% class="small" %)//If you have human data that do not classify as any of the above, please get in touch and we will clarify the available options. //
8 8  )))
9 -)))
10 10  
11 -(% class="row" %)
12 -(((
13 -(% class="col-xs-12 col-sm-8" %)
14 -(((
15 -= What can I find here? =
108 +Pseudonymized data is shared via the Human Data Gateway (HDG) due to GDPR regulations. The HDG adds an authentication layer to the data.
16 16  
17 -* Notice how the table of contents on the right
18 -* is automatically updated
19 -* to hold this page's headers
110 +**Data users** must request access to the data (via their EBRAINS account) and will receive access provided they actively accept the [[EBRAINS Access Policy>>https://ebrains.eu/terms#access-policy]], the [[EBRAINS General Terms of Use>>https://ebrains.eu/terms#general-terms-of-use]], and the [[EBRAINS Data Use Agreement>>https://ebrains.eu/terms#data-use-agreement]]. Account holders must also accept that information about their requests for, and access to, specific data under the HDG is tracked and stored.
111 +\\**Data owners** must be aware that sharing under the HDG affects the legal responsibilities for the data. They must agree to joint control of the data (see the [[Data Provision Protocol v1>>url:https://strapi-prod.sos-ch-dk-2.exo.io/EBRAINS_Data_Provision_Protocol_dfe0dcb104.pdf]], section 1.4 - 1.5) and the Data Protection Officers of the responsible institutions must have accepted that the data can be shared under HDG.
112 +\\**Human Data Gateway, Background**
113 +HDG was introduced in February 2021 and developed across multiple teams in the HBP. The initiative to create the service and the initial design originated from EBRAINS Curation in close collaboration with the Data compliance team and the HBP Data Governance Working Group. HDG is a response to the needs of multiple data providers who are bringing data of human origin to EBRAINS. HDG covers the sharing of a limited range of data of human origin, i.e., data without direct identifiers and with very few indirect identifiers (strongly pseudonymized, de-identified). It is an extension of the existing services and does not replace the future EBRAINS Service for sensitive data (planned for 2024) which is outside the domain of the current EBRAINS Data and Knowledge services.
20 20  
21 -= Who has access? =
22 22  
23 -Describe the audience of this collab.
116 +----
117 +
118 +=== Step by Step - Models ===
119 +
120 +[place-holder-process-diagram]
121 +
122 +==== **1. model step 1 ** ====
123 +
124 +Text
125 +
126 +
127 +==== **2. model step 2** ====
128 +
129 +Text
130 +
131 +
132 +----
133 +
134 +=== Step by Step - Code ===
135 +
136 +[place-holder-process-diagram]
137 +
138 +==== **1. code step 1 ** ====
139 +
140 +Text
141 +
142 +
143 +==== **2. code step 2** ====
144 +
145 +Text
146 +
147 +----
148 +
149 +=== Output: what you get when you have completed the curation process ===
150 +
151 +Curated data, models and software are made available in [[the EBRAINS Knowledge Graph>>https://kg.ebrains.eu/]]. This makes the data and metadata discoverable in the [[Knowledge Graph Search>>url:https://search.kg.ebrains.eu/]] and programmatically via the [[Knowledge Graph API>>url:https://docs.kg.ebrains.eu/8387ccd27a186dea3dd0b949dc528842/api_endpoints.html]]. The data, models and software are integrated in the EBRAINS Knowledge Graph by interoperable metadata schemas as defined in [[openMINDS>>url:https://github.com/HumanBrainProject/openMINDS/wiki]].
152 +
153 +Data and models are linked to and discoverable via the species-specific [[EBRAINS Interactive Atlas Viewer>>url:https://ebrains.eu/services/atlases/brain-atlases]] by using interoperable metadata schemas as defined in [[SANDS>>url:https://github.com/HumanBrainProject/SANDS/wiki]].
154 +
155 +----
156 +
157 +== **Resources for researchers looking to share data** ==
158 +
159 +Below you can find some resources that can come in handy if you are looking to share data via EBRAINS, or in general.
160 +
161 +----
162 +
163 +=== **Why should I share data?** ===
164 +
165 +By sharing your data via EBRAINS, you gain access to the following benefits:
166 +
167 +[[image:image-20230324170841-3.png]]
168 +
169 +
170 +
171 +We support you in following the FAIR guiding principles for data management and stewardship{{footnote}}Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18 {{/footnote}}. Publishing data, models or code via EBRAINS will provide you with a citable [[DataCite DOI>>https://www.doi.org/the-identifier/resources/handbook/]] for your research product.
172 +
173 +
174 +----
175 +
176 +=== **What can I share on EBRAINS?** ===
177 +
178 +(% class="wikigeneratedid" id="H" %)
179 +[[image:image-20230324170829-2.png]]
180 +
181 +
182 +----
183 +
184 +=== **Useful information about sharing of experimental data on EBRAINS ** ===
185 +
186 +
187 +|(% style="width:593px" %)(((
188 +[[[[image:image-20230324171114-2.png]]>>https://drive.ebrains.eu/f/dfd374b9b43a458192e9/]]
189 +)))|(% style="width:1240px" %)(((
190 +[[[[image:image-20230324171109-1.png]]>>https://drive.ebrains.eu/f/c1ccb78be52e4bdba7cf/]]
24 24  )))
192 +|(% style="width:593px" %)//Collection of useful information for researchers looking to share experimental data on EBRAINS.//|(% style="width:1240px" %)//The EBRAINS data descriptor//
25 25  
194 +----
26 26  
27 -(% class="col-xs-12 col-sm-4" %)
196 +=== **Introduction to data organisation ** ===
197 +
198 +Have you ever experienced not being able to find a file that you were sure you had somewhere? We have prepared a [[collection of guidelines>>https://drive.ebrains.eu/smart-link/25299f04-c4e5-4028-8f5f-3b8208f9a532/]] and [[advice>>https://drive.ebrains.eu/lib/f5cf4964-f095-49bd-8c34-e4ffda05a497/file/DataOrganisation.zip]] on how to organise files and folders to ensure consistency and reproducibility in the future.
199 +
200 +* Why is data organisation important?
201 +* How to organise my data repository?
202 +* What is a Data Descriptor and why do I need one?
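
As an illustration of what a consistency check on a data repository could look like, the sketch below flags file and folder names that break a simple naming rule and counts the file extensions in use. The rule itself (lowercase letters, digits, dots, hyphens and underscores) and the function name `check_names` are assumptions made for this example, not an EBRAINS requirement; the linked guidelines remain the reference.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical convention: lowercase letters, digits, dots, hyphens, underscores.
SAFE_NAME = re.compile(r"^[a-z0-9][a-z0-9._-]*$")


def check_names(relative_paths: list[str]) -> dict:
    """Flag paths with a non-conforming component and count file extensions."""
    offending = []
    extensions = Counter()
    for rel in relative_paths:
        path = PurePosixPath(rel)
        # A path is flagged if any folder or file name breaks the convention.
        if any(not SAFE_NAME.match(part) for part in path.parts):
            offending.append(rel)
        # Extensions are compared case-insensitively; files without one are grouped.
        extensions[path.suffix.lower() or "(none)"] += 1
    return {"offending": offending, "extensions": dict(extensions)}
```

A mixed bag of extensions or a long list of flagged names is usually a sign that the folder structure needs another pass before upload.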
203 +
204 +----
205 +
206 +=== **Integrate your data in the EBRAINS atlas services** ===
207 +
208 +EBRAINS supports viewers for a variety of data types, and is continuously looking to improve its services for visualising data. For 2D histology image data registered to an EBRAINS-supported brain atlas, the data and the overlying atlas plates can be uploaded to the LocaliZoom viewer. See, for example, the [[LocaliZoom links available for this dataset>>https://doi.org/10.25493/T686-7BX]].
209 +
210 +To learn more about how to integrate your data into atlases, check out the [[Atlas services>>https://ebrains.eu/services/atlases#Integratedatatoanatlas]] on ebrains.eu.
211 +
212 +----
213 +
214 +=== **Common concerns - and answers ** ===
215 +
28 28  (((
29 -{{box title="**Contents**"}}
30 -{{toc/}}
31 -{{/box}}
217 +==== ====
32 32  
33 -
219 +>
220 +
221 +(((
222 +>The curation process is time-consuming and difficult
34 34  )))
35 35  )))
225 +
226 +(% class="wikigeneratedid" id="HHowcanIshareA0models3F" %)
227 +Publishing your data naturally takes some time and effort, but we will support you as much as possible. With regular communication, we are able to finish basic curation - from the initial contact to dataset release - within two weeks.
228 +
229 +>Sharing my data is not beneficial for me - only for others
230 +
231 +
232 +When you publish your data via EBRAINS, we provide comprehensive data management support and safe long-term storage - all free of charge. Additionally, your data can be cited, just like a scientific journal article. Sharing your data may even lead to new funding opportunities. Many funders specifically support projects that are part of the “Open Science” initiative.
233 +
234 +>My data is too insignificant to share
235 +
236 +
237 +There is no such thing as insignificant data. Data that is considered insignificant for one topic may have great significance for another. By making “insignificant” data publicly available, you allow other researchers to find something interesting that was off-topic for your own purposes.
238 +
239 +>My data can easily be misused if I share it with the world
240 +
241 +
242 +Your data will be covered by a Creative Commons license of your choice. There are a variety of licenses available, enabling you to prevent use for specific purposes, e.g. commercial use.
243 +
244 +>I don't think I'm allowed to share my data
245 +
246 +
247 +Many institutions are still very careful about what can be shared and how, but the situation is constantly evolving. As a researcher providing data, you will be asked to fill out an ethics compliance survey, which serves to ensure that data published through the EBRAINS platform has been collected according to EU regulations. We are working on solutions for sharing anonymised human data that comply with GDPR standards to protect the identity of research subjects.
248 +
249 +>I can't share my data before my paper is published
250 +
251 +
252 +If you do not want to share your data before publishing the results in an article, you can publish your dataset with an embargo status. This makes it possible to find information about the data without making the data itself available, and gives you a citable DOI.
253 +
254 +>If I share my data before I publish the associated paper, I will lose my competitive edge
255 +
256 +
257 +Publishing your data does not mean that others can use it however they want. Use of your data will require citation, and by choosing an appropriate Creative Commons licence you decide what others are allowed to do with it. If you still feel worried, you can publish your data under embargo, and in this way delay the date of data release, but still make it possible for others to find the information about the data.
258 +
259 +
260 +----
261 +
262 +== Contact ==
263 +
264 +[[curation-support@ebrains.eu>>mailto:curation-support@ebrains.eu]]
265 +
266 +
267 +{{putFootnotes/}}
image-20230324170829-2.png
Author
... ... @@ -1,0 +1,1 @@
1 +XWiki.ingrreit
Size
... ... @@ -1,0 +1,1 @@
1 +292.7 KB
Content
image-20230324170841-3.png
Author
... ... @@ -1,0 +1,1 @@
1 +XWiki.ingrreit
Size
... ... @@ -1,0 +1,1 @@
1 +107.1 KB
Content
image-20230324170858-4.png
Author
... ... @@ -1,0 +1,1 @@
1 +XWiki.ingrreit
Size
... ... @@ -1,0 +1,1 @@
1 +141.7 KB
Content
image-20230324171109-1.png
Author
... ... @@ -1,0 +1,1 @@
1 +XWiki.ingrreit
Size
... ... @@ -1,0 +1,1 @@
1 +49.1 KB
Content
image-20230324171114-2.png
Author
... ... @@ -1,0 +1,1 @@
1 +XWiki.ingrreit
Size
... ... @@ -1,0 +1,1 @@
1 +75.8 KB
Content
image-20230326053826-1.png
Author
... ... @@ -1,0 +1,1 @@
1 +XWiki.ingrreit
Size
... ... @@ -1,0 +1,1 @@
1 +97.9 KB
Content
image-20230326053845-1.png
Author
... ... @@ -1,0 +1,1 @@
1 +XWiki.ingrreit
Size
... ... @@ -1,0 +1,1 @@
1 +97.9 KB
Content
image-20230326054341-1.png
Author
... ... @@ -1,0 +1,1 @@
1 +XWiki.ingrreit
Size
... ... @@ -1,0 +1,1 @@
1 +64.1 KB
Content