Dockerised Data Science Stack (Part 3)

2020-08-27 | 2 minutes read

Tags: Data Science, Docker, OSS, JupyterHub, JupyterLab, code-server

This is the last of a three-part series on how to set up a self-hosted, internet-facing, dockerised Data Science Stack.

Reverse proxy: Træfik
Git repository server: GitLab
- Including Mattermost
Data Science Hub: JupyterHub
- IDE: JupyterLab + code-server
- Languages: Julia, Python and R

See https://gitlab.b-data.ch/docker/deployments/jupyter for instructions on getting JupyterHub up and running.

The default configuration uses an R-based JupyterLab image. If you would rather work with Julia, set the environment variable DOCKER_JUPYTERLAB_IMAGE to glcr.b-data.ch/jupyterlab/julia/ver.
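How the variable reaches the spawner is defined in the linked repository and is not reproduced here. As a rough sketch only, assuming the hub's jupyterhub_config.py reads the image name from the environment (a common pattern with DockerSpawner, not confirmed for this deployment), the lookup could look like the following. The helper name resolve_image and the fallback image are hypothetical.

```python
import os

# Name of the environment variable mentioned in the post.
ENV_VAR = "DOCKER_JUPYTERLAB_IMAGE"


def resolve_image(env, default):
    """Return the image named in `env`, or `default` if the variable is unset or blank.

    `env` is any mapping of variable names to strings, e.g. os.environ.
    """
    value = env.get(ENV_VAR, "").strip()
    return value or default


if __name__ == "__main__":
    # Hypothetical fallback; replace it with the image your deployment ships with.
    fallback = "example.org/jupyterlab/r:latest"
    image = resolve_image(os.environ, fallback)
    # Inside a real jupyterhub_config.py this value would typically be assigned
    # to the spawner, e.g. c.DockerSpawner.image = image (assumption).
    print(image)
```

With DOCKER_JUPYTERLAB_IMAGE set to glcr.b-data.ch/jupyterlab/julia/ver, the script prints that image name; with the variable unset, it prints the fallback.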

This deployment should also work with the Jupyter Docker Stacks, a set of ready-to-run Docker images containing Jupyter applications and interactive computing tools.

About code-server

Run VS Code on any machine anywhere and access it in the browser.

Highlights

Code everywhere

Code on your Chromebook, tablet, and laptop with a consistent development environment.

Develop on a Linux machine and pick up from any device with a web browser.

Server-powered

Take advantage of large cloud servers to speed up tests, compilations, downloads, and more.

Preserve battery life when you’re on the go as all intensive tasks run on your server.

Make use of a spare computer you have lying around and turn it into a full development environment.

— cdr/code-server: VS Code in the browser

JupyterLab: Jupyter’s Next-Generation Notebook Interface

JupyterLab is a web-based interactive development environment for Jupyter notebooks, code, and data. JupyterLab is flexible: configure and arrange the user interface to support a wide range of workflows in data science, scientific computing, and machine learning. JupyterLab is extensible and modular: write plugins that add new components and integrate with existing ones.

— Project Jupyter | Home

What is JupyterHub?

JupyterHub brings the power of notebooks to groups of users. It gives users access to computational environments and resources without burdening the users with installation and maintenance tasks. Users - including students, researchers, and data scientists - can get their work done in their own workspaces on shared resources which can be managed efficiently by system administrators.

JupyterHub runs in the cloud or on your own hardware, and makes it possible to serve a pre-configured data science environment to any user in the world. It is customizable and scalable, and is suitable for small and large teams, academic courses, and large-scale infrastructure.

— Project Jupyter | JupyterHub