Last modified by eapapp on 2023/07/04 16:46

Publishing neuroscience data, models and software via EBRAINS

The aim of this collab is to provide you with detailed information about publishing data, simulations, computational models, and software via EBRAINS. If you want a quick overview of the sharing process, see https://ebrains.eu/service/share-data.

Contents

Information to get started
The EBRAINS curation process
The openMINDS metadata framework
Add practical value to your shared data, model or software
- Showcase shared data, models or software in other services
General benefits of sharing data
Frequently asked questions
The curation team: meet the curators
Contact
Affiliated laboratories
References

Information to get started

REQUEST CURATION to share data, simulations, computational models, and software, or to add a new version of an existing one.

Have you already published your data somewhere else? You can increase the exposure and impact of your shared dataset by also listing it on EBRAINS.

Search existing data, models and software in the EBRAINS Knowledge Graph

EBRAINS accepts data from all modalities and from all species, as well as models, software, web services and metadata models (collectively referred to as research products) for sharing. You'll find detailed information about how to share each research product below.

We strongly recommend that you start preparing for data sharing as early as possible. With a structured data repository and adequate notes on how the data was acquired, you greatly reduce the effort required to publish your data. The time it takes to share data on EBRAINS depends heavily on the engagement of the researcher and on how well the data and metadata are prepared beforehand. Contact us for personalised guidance on how to prepare for sharing.

Particular needs? Contact us! The workflows for sharing can be modified for researchers or research groups aiming to frequently publish larger numbers of their research products through EBRAINS. Please contact the curation service team in such cases. Reach us at curation-support@ebrains.eu

The EBRAINS curation process

In EBRAINS, multimodal and heterogeneous neuroscience data, models and software are categorised and described in a standardised manner so that they can be effectively searched, compared, and analysed. This effort is referred to as curation.

The EBRAINS curation process involves organising and annotating neuroscientific data to make the data discoverable and reusable.

Behind this process is the EBRAINS Curation team. Our mandate is to support you in sharing your data in line with the FAIR principles, whether you choose to describe only the key aspects of your data, or can invest in adding more detailed metadata.

Curated data, models and software are made available in the EBRAINS Knowledge Graph. This makes the data and metadata discoverable in the Knowledge Graph Search and the Knowledge Graph API. The data, models and software are integrated in the EBRAINS Knowledge Graph by interoperable metadata schemas defined in openMINDS. Data and models are linked to and discoverable via the species-specific EBRAINS siibra atlas viewer by using interoperable metadata schemas as defined in SANDS.

The curation process differs for data, models and software. Below, we therefore explain the sharing process for each research product separately.

Step by step - Data

1. Provide some general information about your dataset

Fill in the Curation request form.

This form collects preliminary information about your data, allowing us to assess whether the dataset fits within the scope of EBRAINS. The submission generates a curation ID allowing us to track the case.

Fill in the Ethics and Regulatory compliance form.

This form collects the information we need to evaluate whether we can ethically and legally share the data via EBRAINS.

See below for information about the ethical and legal aspects concerning sharing of human subject data.

2. Upload data

Ensure data is structured consistently prior to upload.

We require organized data, but not data organized according to a standard of our own. This is to support the broadest degree of sharing possible. We do, however, require that the data is organized in a consistent and precise manner. Please see our guidelines on data organization for further guidance.

Upload data to EBRAINS Storage, either using a drag-and-drop solution (opt. 1) or an interactive Python script (opt. 2).

Opt. 1. For smaller datasets with a reasonable number of files, we recommend using the Collab Bucket solution (drag-and-drop). A Collab Bucket must first be assigned to a dataset, which happens when the dataset is accepted for sharing.

Opt. 2. For larger datasets or datasets with a large number of files, we recommend using a programmatic approach. The Python script is interactive and does not require any additional programming.
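
To help judge which upload option fits a dataset, it can be useful to take a quick inventory of the local data folder first. The following is a minimal sketch using only the Python standard library. It is not the EBRAINS upload script, and the function names and thresholds (1000 files, 10 GiB) are hypothetical values chosen for illustration, not limits stated by EBRAINS.

```python
from collections import Counter
from pathlib import Path


def summarize_dataset(root):
    """Count files, total bytes, and files per extension under `root`."""
    root = Path(root)
    n_files = 0
    total_bytes = 0
    by_extension = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            n_files += 1
            total_bytes += path.stat().st_size
            # Files without a suffix are grouped under "(none)".
            by_extension[path.suffix.lower() or "(none)"] += 1
    return {
        "n_files": n_files,
        "total_bytes": total_bytes,
        "by_extension": dict(by_extension),
    }


def suggest_upload_option(summary, max_files=1000, max_bytes=10 * 1024**3):
    """Return 1 (drag-and-drop) or 2 (script).

    The default thresholds are hypothetical, not EBRAINS limits.
    """
    if summary["n_files"] <= max_files and summary["total_bytes"] <= max_bytes:
        return 1
    return 2
```

For example, `suggest_upload_option(summarize_dataset("path/to/my_dataset"))` returns 1 or 2 for a local folder. The per-extension counts can also reveal stray or inconsistently named files before upload.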

EBRAINS offers secure, long-term storage at CSCS Swiss National Supercomputing Centre, with currently no upper limit of storage capacity.

If a data collection is already uploaded elsewhere, we may link to the already existing repository.

3. Submit metadata

Submit metadata using the EBRAINS Metadata Wizard (opt. 1), or directly via the Knowledge Graph (opt. 2).

Opt. 1. Manually submit the minimal required metadata via the EBRAINS Metadata Wizard. The minimal required metadata covers extended bibliographic information necessary to publish your dataset on EBRAINS. The submitted information, including uploaded files, will be sent to the Curation team automatically.

Opt. 2. To go beyond the minimal required metadata, you can interact directly with the Knowledge Graph (KG) in your private space. Within the private space, you can upload metadata and interact with them. You can also connect your metadata to existing publicly accessible entries. Access to your private space is granted upon the initiation of the curation process. You can access your private space via:

Knowledge Graph Editor: This User Interface allows you to manually enter metadata into your KG space and validate metadata that are programmatically uploaded. The Editor contains a basic set of openMINDS metadata templates, but can be extended to the full openMINDS metadata model on request. Access is granted once the request is accepted.
Fairgraph: This is the recommended software tool for programmatic interaction with the KG. It allows you to programmatically upload openMINDS compliant metadata into your KG space and interact with existing metadata.
KG Core Python SDK: This Python package gives you full freedom in interacting with the KG. It allows you to upload any JSON-LD with metadata into your private space. Note that, for dataset publications in EBRAINS, the JSON-LD metadata files have to comply with openMINDS.

Datasets published through the EBRAINS Knowledge Graph have to be registered using openMINDS compliant metadata delivered as JSON-LD files. See this summary table for an overview of the minimally required openMINDS properties for publishing on EBRAINS.
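
To give a rough idea of what a JSON-LD metadata file looks like, here is a minimal sketch that builds one with the Python standard library only. It does not use fairgraph or the KG Core Python SDK. The namespace URLs, the property names (fullName, shortName, description) and the list of required keys are assumptions based on openMINDS version 3 and are chosen for illustration. Check the summary table and the openMINDS documentation for the properties that are actually required.

```python
import json

# Assumed openMINDS v3 namespaces; verify against the openMINDS documentation.
OPENMINDS_VOCAB = "https://openminds.ebrains.eu/vocab/"
OPENMINDS_CORE = "https://openminds.ebrains.eu/core/"


def make_dataset_jsonld(local_id, full_name, short_name, description):
    """Build a minimal JSON-LD dictionary describing a dataset."""
    return {
        "@context": {"@vocab": OPENMINDS_VOCAB},
        "@id": local_id,
        "@type": OPENMINDS_CORE + "Dataset",
        "fullName": full_name,
        "shortName": short_name,
        "description": description,
    }


def missing_properties(document, required=("fullName", "shortName", "description")):
    """List keys from `required` that are absent or empty.

    The default `required` tuple is hypothetical, not the official minimum.
    """
    return [key for key in required if not document.get(key)]


if __name__ == "__main__":
    doc = make_dataset_jsonld(
        "https://example.org/dataset/demo",  # hypothetical identifier
        "Demo dataset for illustration",
        "demo-dataset",
        "A placeholder description.",
    )
    print(json.dumps(doc, indent=2))
```

A simple check such as `missing_properties(doc)` before submission can catch empty fields early. It does not replace validation against the openMINDS schemas.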

4. Write a Data Descriptor

Write a data descriptor by filling in this template.

The Data Descriptor is a document that helps others interpret and reuse your data (and helps prevent misuse), and it is critical for achieving a basic level of FAIR. The document will be uploaded to the data repository and shared as a PDF.

See our infographic about the data descriptor for inspiration or guidance.

Check out previous examples in the KG Search. See e.g., the data descriptor for the dataset "Anterogradely labeled axonal projections from the orbitofrontal cortex in rat".

Journal publications sufficiently describing the shared data, such as made available through Nature Scientific Data, Elsevier Data in Brief, BMC Data note and more, can replace the EBRAINS Data Descriptor.

Download our infographic about the EBRAINS Data Descriptor

5. Preview and publish

Preview and approve the release of your dataset.

Once a Curator has assembled the dataset in the EBRAINS Knowledge Graph, combining the data, metadata and data descriptor, the data provider will receive a private URL for previewing the dataset prior to release. We need official approval from the data custodian¹ to release the dataset. Once released, a DataCite DOI will be generated for the dataset. If the identical data collection has received a DOI elsewhere, we recommend re-using the already issued DOI.

Step by step - Models

1. Start early

It is not necessary to wait until you are ready to publish to register your model with EBRAINS.

By registering a model early in your project, you can take advantage of EBRAINS tools
to keep track of simulations and to share them with your collaborators.

2. Create/choose a Collab workspace

We use EBRAINS Collaboratory "collab" workspaces to help manage the model curation process.

In particular, we use collab membership (the "Team") to control who can view or edit your model metadata prior to publication.

It is up to you whether you create a new collab for each model, or reuse an existing collab
(it is no problem to have multiple models associated with a single collab).

Collabs are also useful for storing simulation results, adding documentation for your model,
and/or providing tutorials in Jupyter notebooks.

3. Upload code

We recommend storing model code and/or configuration files in an online Git repository, for example on GitHub.
This repository should be public when you publish the model, but a private repository can be used for model development.

Alternatively, you can upload code to the Collab Drive or Bucket storage.

4. Submit metadata

We recommend submitting metadata using the Model Catalog app, installed in your collab.

To install it:

click the "+ Create" button
in the "Create Page" form, add a title, such as "Model Catalog", and select "Community App", then click "Create"
scroll down until you find the "Model Catalog" app, click "Select", then "Save & View"

You will then see a table of all the models and validation tests associated with this collab.
If this is your first time using the app, the table will probably be empty.
To add your model, click "+", fill in the form, then click "Add model".

As development of your model proceeds, you can easily register new versions of the code,
and new parameterizations, by clicking "Add new version".

If you prefer not to use the app, you can instead fill in the Curation request form,
and you will be contacted by e-mail with further instructions.

5. Provide a reference dataset

Once you're ready to publish your model entry in the EBRAINS Knowledge Graph,
we encourage you to provide a dataset containing the simulation results produced by your model,
following the process under "Step by step - Data" above.

These reference data will be linked to the model, and will be helpful to anyone trying to
reuse your model.

We will soon introduce a "Reproducible" badge for all models that include a reference dataset,
and whose simulation results can be reproduced by an EBRAINS curator.

6. Request publication, preview and publish

Until you request your model entry to be published in the EBRAINS Knowledge Graph,
only members of the collab will be able to view the model entry, in the Model Catalog app
or using the Model Validation Python client.

After publication, the model will appear in the EBRAINS public search results, and will receive a DOI.

To request publication, contact EBRAINS support, providing the collab name and the model name or ID.

Curators will then perform a number of checks:

Does the model description provide sufficient context to understand the purpose and use of the model?
Does the code repository contain a licence file, explaining the conditions for reusing the code?
Does the model have a clearly defined version identifier (e.g. v1.0)? For models in a Git repository, the version identifier should match the name of a tag or release.
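
Two of the checks above (licence file and version identifier) can be approximated locally before you request publication. The following is a minimal sketch, not the curators' actual procedure. The function names and the list of accepted licence file names are assumptions made for illustration.

```python
from pathlib import Path

# Assumed common licence file names (compared without file extension).
LICENCE_NAMES = ("LICENSE", "LICENCE", "COPYING")


def find_licence_files(repo_dir):
    """Return names of top-level files whose stem looks like a licence file."""
    return sorted(
        p.name
        for p in Path(repo_dir).iterdir()
        if p.is_file() and p.stem.upper() in LICENCE_NAMES
    )


def version_matches_tag(version, tags):
    """Return True if the version identifier equals one of the Git tag names."""
    return version in set(tags)
```

The list of tags can be obtained by running `git tag --list` in the repository. With tags `v0.9` and `v1.0`, the version identifier `v1.0` matches, whereas `1.0` does not, because the comparison is exact.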

The curators will also take a snapshot of your model code.

For models in public Git repositories, we archive a copy of the repository in Software Heritage.
For models in a collab Bucket or Drive, we make a read-only copy of the code in a public container in the EBRAINS repository.

Once this is done, you will be invited to review a preview of how the model entry will appear in the KG Search,
and will have the opportunity to request modifications prior to approval and publication.

Step by step - Software

Software curation at a glance

Fill in the request form. You'll be contacted by a curator with further instructions. In these instructions, you will find the links to your software and software version entries in the Knowledge Graph.
Enter the metadata of your software in your private space of the Knowledge Graph Editor by using the links provided by the curator. Please provide metadata for your software that is as complete as possible. This makes it easier for users to find and use your software. If you have questions, feel free to contact curation support. After you have finished editing your entries, please let the curators know by replying to your ticket. We curate your metadata and get back to you if necessary.
After a quality check, we integrate and publish the information to the Knowledge Graph. Your software is then searchable and usable for the neuroscience community.

If you want to add a new version to an already curated software, please request this via the curation request form.

For more information, visit our Guide to Software Curation in the EBRAINS Knowledge Graph or see our infographic on the right.

Sharing human subject data

Human subject data that can be shared on EBRAINS:

- Post-mortem data
- Aggregated data
- Strongly pseudonymized or de-identified subject data
with a legal basis for sharing (e.g. Informed Consent)

If you have human data that does not qualify as any of the above,
please get in touch and we will clarify the available options.

Human subject data shared on EBRAINS must comply with GDPR and EU directives. The information we need to assess this is collected via our Ethics and Regulatory Compliance Survey.

Post-mortem and aggregated human data can be shared openly, provided that direct identifiers in the metadata are removed. Strongly pseudonymized and de-identified data can be shared via the Human Data Gateway (HDG).
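
The sharing routes described above can be summarised as a small decision function. This is only an illustrative sketch of the rules as stated on this page. The category labels, parameter names and return values are hypothetical, and the actual assessment is made by EBRAINS on the basis of the Ethics and Regulatory Compliance Survey.

```python
OPEN_CATEGORIES = {"post-mortem", "aggregated"}
HDG_CATEGORIES = {"strongly pseudonymized", "de-identified"}


def sharing_route(category, direct_identifiers_removed=True, has_legal_basis=False):
    """Map a human data category to a likely sharing route.

    Labels and return values are hypothetical; this is not a legal assessment.
    """
    category = category.strip().lower()
    if category in OPEN_CATEGORIES:
        # Open sharing requires direct identifiers to be removed from metadata.
        if direct_identifiers_removed:
            return "open"
        return "remove direct identifiers first"
    if category in HDG_CATEGORIES and has_legal_basis:
        # Shared behind the Human Data Gateway authentication layer.
        return "hdg"
    # Anything else should be discussed with the curation team.
    return "contact curation"
```

Any category not covered by the lists on this page falls through to "contact curation", which mirrors the advice to get in touch when your data does not qualify as any of the listed types.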

The Human Data Gateway (HDG) was introduced in February 2021 as a response to the needs of multiple data providers who are bringing human subject data to EBRAINS. HDG covers the sharing of strongly pseudonymized or de-identified data, a limited range of human subject data without direct identifiers and with very few indirect identifiers.

The HDG adds an authentication layer on top of the data. This means that data users must request access to the data (via their EBRAINS account) and will receive access provided they actively accept the EBRAINS Access Policy, the EBRAINS General Terms of Use, and the EBRAINS Data Use Agreement. Account holders also have to accept that information about their request and access to specific data under HDG is being tracked and stored. Data owners must be aware that sharing under the HDG affects the legal responsibilities for the data. They must agree to joint control of the data (see the Data Provision Protocol v1, section 1.4 - 1.5) and the Data Protection Officers of the responsible institutions must have accepted that the data can be shared under HDG.

The HDG is an extension of the existing services and does not replace the future EBRAINS Service for sensitive data (planned for 2024) which is outside the domain of the current EBRAINS Data and Knowledge services.

The openMINDS metadata framework

openMINDS (open Metadata Initiative for Neuroscience Data Structures) is a community-driven, open-source metadata framework for graph database systems, such as the EBRAINS Knowledge Graph. It is composed of linked metadata models, libraries of serviceable metadata instances, and supportive tooling (openMINDS Python, openMINDS Matlab). For exploring the openMINDS schemas, go to the HTML documentation. For a full overview of the framework, go to the openMINDS collab or the GitHub repository.

For feedback, requests, or contributions, please get in touch with the openMINDS development team via

the support-email: openminds@ebrains.eu
the GitHub issue tracker
the INCF NeuroStars openMINDS Community Forum

Add practical value to your shared data, model or software

Showcase shared data, models or software in other services

Below is a list of additional services that data, models or software shared via EBRAINS can benefit from. EBRAINS is continuously looking to increase the number of interoperable services.

Viewer for 2D images
	Integrate image data with the Mio viewer: The EBRAINS Multi-Image OpenSeadragon viewer provides an intuitive way of navigating high-resolution 2D image series. It has browser-based classic pan and zoom capabilities. A collection can be displayed as a filmstrip (Filmstrip Mode) or as a table (Collection Mode) with an adjustable number of rows and columns. See Mio viewer links available for this dataset as an example. The MioViewer user manual is found here.
Viewer for sequential atlas-registered 2D images with annotation options
	Integrate atlas-registered 2D image data with the LocaliZoom viewer: The EBRAINS LocaliZoom serial section viewer displays series of registered 2D section images with atlas overlay, allowing users to zoom into high-resolution images and see information about the brain regions. See the LocaliZoom links available for this dataset as an example. The LocaliZoom user manual is found here.
Interactive 3D atlas viewer with options for data visualization
	Upload your data to the siibra-explorer: The siibra-explorer is used for visualizing volumetric brain data in all the brain atlases provided by EBRAINS (Human, Monkey, Rat and Mouse). The siibra-explorer viewer uses siibra-api to enable navigation of brain region hierarchies, maps in different coordinate spaces, and linked regional data features. Furthermore, it is connected with the siibra toolsuite, which provides several analytical workflows. To learn more about how to register your data to atlases, read about the Atlas services on ebrains.eu.
Use your research product in an interactive publication
	Add your data, models or software to a Live paper. Read more about Live papers on ebrains.eu.

Add a tutorial or learning resource

More information will follow

Create a workflow

More information will follow

General benefits of sharing data

By sharing your data via EBRAINS, you gain access to the following benefits:

We support you in better following the FAIR guiding principles for data management and stewardship². Publishing data, models or code via EBRAINS will provide you with a citeable DataCite DOI for your research product.

Frequently asked questions

Is the curation process time consuming and difficult?

No, if communication is on a regular basis, we are able to finish curation within two weeks. Publishing your data naturally takes some effort but we will support you as much as possible.

Is sharing my data also beneficial for me or only for others?

When you publish your data via EBRAINS, we provide comprehensive data management support and safe long-term storage, all free of charge. Additionally, your data can be cited, just like a scientific journal article. Sharing your data may even lead to new funding opportunities. Many funders specifically support projects that are part of the “Open Science” initiative.

Can my data be too insignificant to share?

No, there is no such thing as insignificant data. Data that is considered insignificant for a given topic may have great significance for another. By making “insignificant” data publicly available, other researchers may find something interesting that was off-topic for your own purposes.

Can my data be easily misused if I share it?

No, your data will be covered by a Creative Commons license of your choice. There are a variety of licenses available, enabling you to prevent use for specific purposes, e.g. commercial use.

Can I share my data before my paper is published?

Yes. If you do not want to make your data available before publishing the results in an article, you can publish your dataset with an embargo status. This will make it possible to find information about the data without making the data itself available, and give you a citeable DOI.

Can I lose my competitive edge if I share my data before I publish the associated paper?

No, publishing your data does not mean that others can use it however they want. Use of your data will require citation, and by choosing an appropriate Creative Commons licence you decide what others are allowed to do with it. If you still feel worried, you can publish your data under embargo, and in this way delay the date of data release, but still make it possible for others to find the information about the data.

The curation team: meet the curators

The EBRAINS curators help researchers publish their research using the EBRAINS Research Infrastructure. A curator’s job is similar to that of an editor of a scientific journal: checking that the data is organized, understandable, accessible and sufficiently described.

The curators in EBRAINS are located in Oslo, Jülich, Trier and Paris.

Located in Norway

Archana Golla

Curation Scientist
Neuroscience (PhD)
Behavioral neuroscience and microscopy

Camilla H. Blixhavn

Curation Scientist,
PhD Student
Neuroscience (M. Sc.)
Neuroanatomy and data integration

Ingrid Reiten

Curation Scientist,
PhD Student
Neuroscience (M. Sc.)
Neuroanatomy and structural connectivity

Sophia Pieschnik

Curation Scientist
Neurocognitive Psychology (M. Sc.)
Neuroimaging

Heidi Kleven

Curation Scientist,
PhD Student
Neuroscience (M. Sc.)
Neuroanatomy and brain atlases

Located in Germany

Jan Gündling

Curation Scientist,
PhD Student
Sensors and Cognitive Psychology (M. Sc.)
Human-Computer Interaction

Lyuba Zehl

Knowledge Systems Engineer
Dr. rer. nat. (Systems Neuroscience)
Standard development, data & knowledge management, interdisciplinary communication, data analysis

Contact

curation-support@ebrains.eu

Affiliated laboratories

Institute of Basic Medical Sciences, University of Oslo, Norway (PIs: Jan G. Bjaalie, Trygve B. Leergaard)

Institute of Neuroscience and Medicine (INM-1), Research Centre Jülich, Germany (PI: Timo Dickscheid)

References

¹ The Data Custodian is responsible for the content and quality of the Data and metadata, and is the person to be contacted by EBRAINS CS in case of any misconduct related to the Data. It is the obligation of a Data Custodian to keep EBRAINS informed about changes in the contact information of the authors of the Datasets provided by them (EBRAINS Data Provision Protocol - version 1.1).
² Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18