Data Proxy & Human Data Gateway

Last modified by alexisdurieux on 2022/03/25 08:38

Data Proxy & HDG

The Data Proxy is an application that allows authenticated EBRAINS users to access Swift object storage without a Fenix user account.
It serves two main use cases:

It provides an optional, dedicated Swift container to every collab, called the Collab Bucket.
It allows users to access and visualize Knowledge Graph datasets. For datasets containing more sensitive human data that has been strongly pseudonymized (e.g. defaced brain scans), it provides an additional access layer: the Human Data Gateway.

The Data Proxy core is the component that acts as a proxy to the object storage (Swift).

Object Storage

The documentation of Swift object storage can be found here:

https://docs.openstack.org/swift/pike/admin/objectstorage-intro.html

Authentication

The Data Proxy authenticates its users with the EBRAINS (Collaboratory) IAM service.

Before the Data Proxy was available, EBRAINS users had to request a Fenix user account to access object storage on the Fenix infrastructure. With the Data Proxy, a user only needs an EBRAINS account to access object storage resources (which still reside on the same Fenix object storage infrastructure). Data stored this way is held in the name of the Data Proxy service account on Fenix, and the Data Proxy keeps track of who has access to which data.
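The service-account model described above can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory `AccessRegistry` standing in for the Data Proxy's own access records; the service account name `AUTH_dataproxy`, the bucket name and the user names are made up for illustration. Only the `/v1/{account}/{container}/{object}` path layout comes from the Swift API; this is not the Data Proxy's actual implementation.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class AccessRegistry:
    """Hypothetical proxy-side record: bucket name -> EBRAINS users allowed to access it."""
    service_account: str
    readers: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, bucket: str, username: str) -> None:
        # Record that this EBRAINS user may access the bucket.
        self.readers.setdefault(bucket, set()).add(username)

    def swift_path(self, username: str, bucket: str, object_name: str) -> str:
        # The access decision is made by the proxy, not by Swift.
        if username not in self.readers.get(bucket, set()):
            raise PermissionError(f"{username} has no access to bucket {bucket}")
        # Every object lives under the single service account, whichever user asked.
        # quote() keeps '/' in object names so pseudo-folders survive.
        return (
            f"/v1/{quote(self.service_account, safe='')}"
            f"/{quote(bucket, safe='')}/{quote(object_name)}"
        )
```

Whatever EBRAINS user makes the request, the resulting Swift path always points into the same service account; the per-user decision happens entirely in the proxy's registry.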

Permissions

We use the Collaboratory authorization system to manage permissions in the Data Proxy.

A Swift object container can be associated with each collab. Object containers are also called "buckets" to avoid confusion with other kinds of containers (e.g. Docker containers). An EBRAINS user can perform the following actions on a bucket, depending on the user's permissions in the associated collab (as defined by the collab's Team).

Team permissions of a collab	Available actions on that collab's bucket
Viewer	Read
Editor	Create, Read, Update, Delete
Admin	Create, Read, Update, Delete
Not a collab member	Read access only if the collab is public
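The permission table above can be expressed as a small lookup. This is a sketch that assumes the table is the complete rule set; the `Role` enum and the `allowed_actions` and `can` helpers are hypothetical names, not part of the Data Proxy or Collaboratory API.

```python
from enum import Enum
from typing import Optional


class Role(Enum):
    """Team permission levels of a collab."""
    VIEWER = "viewer"
    EDITOR = "editor"
    ADMIN = "admin"


_CRUD = frozenset({"create", "read", "update", "delete"})

# Actions a collab member may perform on that collab's bucket.
_MEMBER_ACTIONS = {
    Role.VIEWER: frozenset({"read"}),
    Role.EDITOR: _CRUD,
    Role.ADMIN: _CRUD,
}


def allowed_actions(role: Optional[Role], collab_is_public: bool) -> frozenset:
    """Return the bucket actions available; role is None for non-members."""
    if role is None:
        # Non-members may only read, and only when the collab is public.
        return frozenset({"read"}) if collab_is_public else frozenset()
    return _MEMBER_ACTIONS[role]


def can(role: Optional[Role], action: str, collab_is_public: bool = False) -> bool:
    """True if a user with this role may perform the action on the bucket."""
    return action in allowed_actions(role, collab_is_public)
```

For example, `can(Role.VIEWER, "delete")` is `False`, while `can(None, "read", collab_is_public=True)` is `True`.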

A collab's bucket is accessible through the "Bucket" navigation element of that collab.

Collaboratory bucket vs drive

A collab offers two main locations to store files: a drive and a bucket. The drive offers more advanced features, such as recognition of file formats (Office, Markdown, PDF) with a dedicated application for each, simplified version control, and smart links. The bucket, on the other hand, offers larger storage capacity and higher bandwidth. The bucket is recommended for datasets (brain scans, EEG, derived data) and videos (including for streaming).

API

The API is self-documented using Swagger UI. The documentation is available at https://data-proxy.ebrains.eu/api/docs and on the API Documentation wiki page of this collab.
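As an illustration of calling the API, the following Python sketch builds a request that lists the objects in a collab bucket, passing the user's EBRAINS access token as an OAuth 2.0 Bearer token. The endpoint path `/v1/buckets/{bucket_name}` and the `prefix` and `limit` query parameters are assumptions here; check them against the Swagger UI before relying on them. The helper `build_list_request` is hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://data-proxy.ebrains.eu/api"


def build_list_request(bucket: str, token: str,
                       prefix: Optional[str] = None,
                       limit: Optional[int] = None) -> Request:
    """Build (but do not send) a GET request listing objects in a bucket."""
    # Assumed endpoint layout; verify it in the Swagger UI at /api/docs.
    url = f"{BASE_URL}/v1/buckets/{quote(bucket, safe='')}"
    params = {}
    if prefix is not None:
        params["prefix"] = prefix  # assumed: only list objects under this prefix
    if limit is not None:
        params["limit"] = limit    # assumed: maximum number of entries returned
    if params:
        url += "?" + urlencode(params)
    # The EBRAINS IAM access token is sent as a Bearer token.
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")
```

The request can then be sent with `urllib.request.urlopen(req)` and the response body decoded with `json.load`; obtaining the access token from the EBRAINS IAM service is outside the scope of this sketch.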