JupyterHub Deployment Proposal

Last modified by adietz on 2019/07/18 16:38

Draft

This document addresses an initial release of JupyterLab within the Collaboratory 2.0. To deliver something quickly, this release will be limited in scope. Long-term planning can also be discussed here.

This page describes the proposed JupyterHub setup that will be offered as part of the "scientific release" of the Collaboratory 2.0 planned for Q3 2019.

The initial release should allow users to run notebooks in the Collaboratory 2.0 environment. The following features should be available from the start:

  • integration with the drive, providing access to the notebooks and data stored there that the user is permitted to access;
  • integration with the user's IAM account and OAuth tokens;
  • example notebooks to showcase some possibilities;
  • notebook versioning (limited);
  • preinstalled scientific and neuroscience libraries in a notebook container image (libraries TBD).

The goal is to allow users to create new notebooks on the Collaboratory 2.0 platform and to migrate some, but not all, notebooks from the old Collaboratory to the new platform. Migrating notebooks might require some changes, especially with regard to storage, but also with regard to the libraries used to access certain services and to how the services themselves are accessed.

The new JupyterHub deployment will not be fully compatible with the old environment. The storage model will change significantly. Certain libraries will no longer be available. Python 2 support needs to be evaluated (see the note at the bottom).

Architecture

  • Based on Kubernetes
  • Seafile storage mounted in user containers
  • Possibility of mounting large documents read-only (RO) in certain images
  • Possibility of selecting from different images
  • Role-based access to resources (quotas based on users' access level)
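
As an illustration of how image selection and role-based quotas could fit together, the sketch below builds a list of spawn profiles in the shape expected by KubeSpawner's profile_list option. The access levels, limits, and image names are hypothetical placeholders, not decisions made in this proposal.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Quota:
    """Resource limits applied to a user's notebook container."""
    cpu_limit: float
    mem_limit: str


# Hypothetical access levels and limits; the real values are still to be defined.
QUOTAS: Dict[str, Quota] = {
    "basic": Quota(cpu_limit=1.0, mem_limit="2G"),
    "extended": Quota(cpu_limit=4.0, mem_limit="8G"),
}

# Hypothetical image names; the actual images and registry are TBD.
IMAGES: Dict[str, str] = {
    "Default scientific stack": "example.org/collab/notebook:latest",
    "Neuroscience stack": "example.org/collab/notebook-neuro:latest",
}


def build_profile_list(access_level: str) -> List[Dict[str, Any]]:
    """Return one profile per image, capped by the quota of the access level.

    Unknown access levels fall back to the "basic" quota.
    """
    quota = QUOTAS.get(access_level, QUOTAS["basic"])
    profiles = []
    for index, (name, image) in enumerate(IMAGES.items()):
        profiles.append({
            "display_name": name,
            "default": index == 0,  # the first image is preselected
            "kubespawner_override": {
                "image": image,
                "cpu_limit": quota.cpu_limit,
                "mem_limit": quota.mem_limit,
            },
        })
    return profiles
```

KubeSpawner also accepts a callable for profile_list, so a function like this could in principle be driven by the access level of the authenticated user; how that level is obtained from IAM is left open here.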

Components

  • JupyterHub: 1.x
    • Spawner: KubeSpawner
    • Authenticator: OAuthenticator
  • Notebook image:
    • Main notebook image: IPython: 7.x
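
A minimal sketch of how these components might be wired together in jupyterhub_config.py is shown below. It is a configuration fragment that JupyterHub loads itself, not a standalone script. The URLs, environment variable names, and image name are assumptions for illustration, and the exact trait names can differ between KubeSpawner and OAuthenticator releases.

```python
# jupyterhub_config.py (fragment; loaded by JupyterHub, not run directly)
import os

c = get_config()  # noqa: F821  (injected by JupyterHub when it loads this file)

# Spawn each user's notebook server as a Kubernetes pod.
c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"

# Hypothetical image name; the actual image is TBD.
c.KubeSpawner.image = "example.org/collab/notebook:latest"

# Authenticate against the IAM service via a generic OAuth2 provider.
c.JupyterHub.authenticator_class = "oauthenticator.generic.GenericOAuthenticator"

# Hypothetical environment variable names and endpoints.
c.GenericOAuthenticator.client_id = os.environ["IAM_CLIENT_ID"]
c.GenericOAuthenticator.client_secret = os.environ["IAM_CLIENT_SECRET"]
c.GenericOAuthenticator.oauth_callback_url = "https://hub.example.org/hub/oauth_callback"
c.GenericOAuthenticator.token_url = "https://iam.example.org/token"
c.GenericOAuthenticator.userdata_url = "https://iam.example.org/userinfo"

# Persist the user's OAuth tokens so they can be passed on to the notebook
# server (requires JUPYTERHUB_CRYPT_KEY to be set in the hub environment).
c.Authenticator.enable_auth_state = True
```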

Features

  • TBD: Requirements management (how to manage Python libraries and other dependencies).
  • TBD: Versioning of notebooks. Initially, this will only be through the versioning available in Seafile.
  • TBD: Workflows

Notes

Python 2 support

IPython dropped Python 2 support as of version 6.0, so the main image will no longer support Python 2.

Python 2 is deprecated and will reach end of life on January 1st, 2020.

Many maintainers of widely used packages (NumPy, pandas, IPython, etc.) have pledged to drop support for Python 2.

What does HBP need in terms of Python 2 support, and for how long must it remain available?
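
One way to help answer this question would be to count how many existing notebooks still declare a Python 2 kernel. The sketch below reads the metadata that nbformat 4 notebooks store (language_info and kernelspec) and reports the Python major version. It assumes the notebooks are available as .ipynb files under a local directory; the function names are illustrative only.

```python
import json
from pathlib import Path
from typing import Any, Iterator, Tuple


def notebook_python_major(nb: Any) -> str:
    """Return "2", "3", or "unknown" based on a notebook's metadata."""
    if not isinstance(nb, dict):
        return "unknown"
    meta = nb.get("metadata") or {}

    # Prefer language_info, which is written by the kernel that last ran.
    lang = meta.get("language_info") or {}
    if lang.get("name") == "python":
        version = str(lang.get("version", ""))
        if version[:1] in ("2", "3"):
            return version[0]

    # Fall back to the default kernel names "python2" and "python3".
    kernel = (meta.get("kernelspec") or {}).get("name", "")
    if kernel in ("python2", "python3"):
        return kernel[-1]
    return "unknown"


def scan(root: Path) -> Iterator[Tuple[Path, str]]:
    """Yield (path, result) for every .ipynb file below root."""
    for path in sorted(root.rglob("*.ipynb")):
        try:
            nb = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # Covers unreadable files, invalid JSON, and decoding errors.
            yield path, "unreadable"
            continue
        yield path, notebook_python_major(nb)


if __name__ == "__main__":
    for nb_path, result in scan(Path(".")):
        print(f"{result}\t{nb_path}")
```

Notebooks that use custom kernel names or lack metadata are reported as "unknown" and would need manual inspection.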