


Introduction

This page describes how to access data via the Human Data Gateway (HDG). The HDG service was set up to control access to sensitive human data that is shared in strongly pseudonymized form.

What datasets are available?

Datasets released in Knowledge Graph (KG) v2 or v3 are available. Use the KG's controlled-access filter to list the datasets that are accessed via the HDG.

Access via the UI

Accessing these datasets involves the following steps:

  1. A user finds a dataset in the Knowledge Graph and logs in to the KG (these two steps can happen in either order).
  2. The user requests access to the dataset from the Get Data section of the dataset card.
  3. The HDG service verifies that the user is authenticated.
  4. An e-mail is then sent to the user with the Data Use Agreement and a link.
  5. By clicking the link in the e-mail, the user accepts the terms and proves they have access to the e-mail address. The same link can be used multiple times within a predefined period (currently 24 hours).
  6. Clicking the link directs the user to the HDG service, which:
    1. verifies that the user is authenticated,
    2. logs the consent to the DUA and the access to the data, and
    3. redirects the user to the data-proxy UI, which provides access to the objects in the dataset.
  7. The user can browse the dataset contents.
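The rule in step 5, that one e-mail link stays usable for repeated clicks within a fixed window, can be illustrated with a small check. This is a hypothetical sketch of the rule as described here, not the HDG's actual implementation; the function name `link_is_usable` and the choice to treat exactly 24 hours as still valid are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Validity window stated on this page (currently 24 hours).
LINK_VALIDITY = timedelta(hours=24)


def link_is_usable(issued_at: datetime, now: datetime,
                   validity: timedelta = LINK_VALIDITY) -> bool:
    """Hypothetical check: can a link issued at `issued_at` still be used at `now`?

    The link may be clicked any number of times inside the window, so only
    the elapsed time matters, not how often it was used. The upper bound is
    treated as inclusive (an assumption).
    """
    elapsed = now - issued_at
    return timedelta(0) <= elapsed <= validity


# Usage
issued = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(link_is_usable(issued, issued + timedelta(hours=3)))   # True
print(link_is_usable(issued, issued + timedelta(hours=25)))  # False
```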

Programmatic access

Programmatic access to the HDG service was designed mainly to let other services consume controlled data on behalf of the end user. For example, when a user browses a controlled brain scan in an image viewer, the viewer accesses the HDG-controlled data programmatically and the end user validates the access via e-mail.

Access to datasets shared via the Human Data Gateway

Authenticated users can browse Knowledge Graph datasets through the data proxy using the datasets API: https://data-proxy.ebrains.eu/api/docs#/datasets

To get access to datasets protected by the HDG, use the datasets/{dataset_id} endpoint.

  1. First perform a POST /datasets/{dataset_id} request to start the HDG flow.
  2. An e-mail is then sent to the user with the Data Use Agreement and a link. By clicking the link, users accept the DUA and prove they have access to the e-mail.
  3. Once the access has been validated, the user is redirected by default to the Knowledge Graph page of the dataset. This redirection can be customized for third-party integration.

Note: For publicly available datasets, the datasets/{dataset_id} endpoint can be called directly with GET, without the e-mail validation round.
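The GET and POST calls on datasets/{dataset_id} can be sketched with Python's standard library. This is a minimal sketch under stated assumptions: the base URL `BASE_URL` is a guess (this page only links the API docs, and the real path may include a version prefix), the user token is assumed to be sent as a bearer token, and the helper names `build_dataset_request` and `send` are hypothetical. Building the request separately from sending it keeps the URL and headers easy to inspect.

```python
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: base path of the data-proxy datasets API. Check
# https://data-proxy.ebrains.eu/api/docs#/datasets for the exact prefix.
BASE_URL = "https://data-proxy.ebrains.eu/api"


def build_dataset_request(method: str, dataset_id: str, token: str,
                          redirect_uri: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) a request for /datasets/{dataset_id}.

    GET reads a dataset (public, or protected once access is granted);
    POST starts the HDG e-mail flow for a protected dataset.
    """
    if method not in ("GET", "POST"):
        raise ValueError("method must be 'GET' or 'POST'")
    url = f"{BASE_URL}/datasets/{urllib.parse.quote(dataset_id, safe='')}"
    if method == "POST" and redirect_uri is not None:
        # Optional: where the user lands after clicking the e-mail link.
        url += "?" + urllib.parse.urlencode({"redirect_uri": redirect_uri})
    # Assumption: the user token is sent as an OAuth2 bearer token.
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, method=method, headers=headers)


def send(req: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses such as 401.
    """
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, `send(build_dataset_request("GET", dataset_id, token))` reads a public dataset, while passing `"POST"` (optionally with `redirect_uri`) starts the HDG e-mail flow for a protected one.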

Example of programmatic integration for a third party

  1. A user wants to access a KG dataset in your third-party application.
  2. The third-party application calls GET /datasets/{dataset_id} with the user token to get information about the dataset or to download it.
    1. The dataset is not protected --> a successful response is sent.
    2. The dataset is protected --> an HTTP 401 response is sent back. Its JSON body includes a can_request_access field; if the field is true, access to the dataset can be requested using POST /datasets/{dataset_id}.
  3. The third-party application asks the user whether they want to request access and, if so, sends POST /datasets/{dataset_id}?redirect_uri=https://mythirdpartyapp.eu/{dataset_id}.
  4. If the request is successful, an e-mail is sent to the user.
  5. Once the user has clicked on the e-mail link, access is granted and the user is redirected to https://mythirdpartyapp.eu/{dataset_id}.
  6. GET /datasets/{dataset_id} now returns a successful response.
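The branching in steps 2 and 3 can be written as a small pure function that a third-party application calls after its GET request. This is a hypothetical sketch: only the 401 status and the `can_request_access` field come from this page; the `NextStep` names and the handling of malformed or unexpected responses are assumptions.

```python
import json
from enum import Enum


class NextStep(Enum):
    USE_DATA = "use_data"            # 2xx: dataset info or download available
    OFFER_ACCESS_REQUEST = "offer"   # 401 with can_request_access == true
    NO_ACCESS = "no_access"          # 401 without a requestable access path
    ERROR = "error"                  # any other status (assumed handling)


def next_step(status: int, body: str) -> NextStep:
    """Decide what to do after GET /datasets/{dataset_id} (hypothetical helper)."""
    if 200 <= status < 300:
        return NextStep.USE_DATA
    if status == 401:
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            # Assumption: an unparseable 401 body means access cannot be requested.
            return NextStep.NO_ACCESS
        if isinstance(payload, dict) and payload.get("can_request_access") is True:
            # Ask the user, then POST /datasets/{dataset_id}?redirect_uri=...
            return NextStep.OFFER_ACCESS_REQUEST
        return NextStep.NO_ACCESS
    return NextStep.ERROR
```

Keeping the decision separate from the HTTP call makes it easy to test. With `urllib`, a 401 surfaces as `urllib.error.HTTPError`, whose `.code` and `.read().decode()` supply the status and body to pass in.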
