How I Spent My Summer: Making Jupyter Tools – and Groundbreaking Science – Even Better
August 23, 2021
By Josh Geden, Intern, Data Science and Technology Department
Each summer, Berkeley Lab hosts dozens of college students through the Computing Sciences Summer Program, and this year I had the opportunity to work with the National Energy Research Scientific Computing Center (NERSC) and the Computational Research Division (CRD) on a variety of projects related to Jupyter. This effort was part of a collaboration between NERSC and CRD to enhance Jupyter tools at NERSC.
The core feature of Jupyter is the notebook, an interactive computational environment that combines text, code, data analytics, and visualization into a single document that can be easily reproduced and shared. These notebooks can be used to provide a more user-friendly way to interact with the Cori supercomputer, and are very popular among scientists who utilize NERSC resources. Recent metrics indicate that more than 30 percent of user interaction with Cori now goes through NERSC’s Jupyter deployment.
The beauty of Jupyter, from a developer's point of view, is its modular design and browser-based user interface (UI). Frontend extensions can be developed as simple TypeScript objects, and integration with the main Jupyter installation is handled through the included Jupyter package manager. For extensions that require a backend, developers follow the same steps to create the frontend and then add a simple web application built with the Tornado Python library. And because Jupyter runs in the browser, anyone with an understanding of simple HTML and CSS can develop user-facing components with ease. Jupyter also uses Bootstrap in the frontend, which allows developers to reuse common class names to maintain a cohesive UI. This helps extensions blend in and feel like a true part of the Jupyter interface.
At the beginning of my internship, I focused on developing two JupyterLab extensions. JupyterLab is the most recent notebook interface, and if you log on to https://jupyter.nersc.gov, it will be the default option for how notebooks are displayed. The first extension I developed adds a new tab to the help menu that provides links to external documentation. An admin can configure these links by including a simple JSON entry in a Jupyter configuration file, which sets the links for every user. The second extension I developed adds the ability to have announcements within JupyterLab. This extension periodically polls an external API for announcements, and if the API returns a non-empty JSON object, it displays a button on the JupyterLab status bar. Users can then click on this button to open a window with the announcement text.
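The polling rule behind the announcement extension can be sketched in a few lines. This is only an illustration of the logic described above, written in Python for readability; the real extension runs in the browser. The function name `poll_once`, the injected `fetch` callable, and the `"announcement"` key are all assumptions made for this sketch, not names taken from the extension itself.

```python
import json
from typing import Callable


def poll_once(fetch: Callable[[], str]) -> dict:
    """Run one polling cycle and describe the resulting status-bar state.

    `fetch` stands in for the HTTP request to the announcements API and
    returns the raw response body. The "announcement" key is a
    hypothetical field name chosen for this example.
    """
    payload = json.loads(fetch())
    # Show the button only when the API returned a non-empty JSON object.
    show = isinstance(payload, dict) and len(payload) > 0
    text = payload.get("announcement", "") if show else ""
    return {"show_button": show, "text": text}


if __name__ == "__main__":
    # An empty object means there is nothing to announce.
    print(poll_once(lambda: "{}"))
    # A non-empty object makes the button appear.
    print(poll_once(lambda: '{"announcement": "Maintenance at 5pm"}'))
```

Passing the request in as a callable keeps the decision logic separate from the network call, so the rule can be checked without a live API.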
The next project I focused on was something we call the JupyterHub Entrypoint Service. My first two projects were a good warmup for the Entrypoint Service, as it proved to be my largest project and the main focus of my internship. A common issue that users at NERSC were running into was ensuring that package dependencies and versioning were consistent across multiple notebooks and user profiles. This is particularly important for certain multi-user projects such as the Advanced Light Source and the Rubin Observatory Legacy Survey of Space and Time Dark Energy Science Collaboration, both of which have very specific package requirements that every user has to use for their analysis. The good news was that this problem is easily solved by using virtual package environments such as conda environments or even container images. The bad news was that there wasn’t an easy way for users to launch their Jupyter notebooks in these environments through JupyterHub.
JupyterHub is a part of the Jupyter environment that enables spawning Jupyter notebooks on remote servers. When you go to https://jupyter.nersc.gov, the page where you land is the hub. At NERSC we use the hub to control which type of node on Cori a notebook gets launched on and to modify the settings of that launch. My Entrypoint Service is an extension of JupyterHub that adds a registry of custom environments that users wish to launch their notebooks in. Users can view a simple UI that allows them to add custom environments such as conda environments and custom startup scripts by path, and it works with NERSC’s Shifter API to provide users with a list of available Shifter images. Users can then choose one of these environments as their custom entry point. These settings are then available to the hub through an API. Once the user has an entry point selected, they can go back to the main page on the hub and launch their environment. At this point, the hub queries the Entrypoint Service API and, if an entry point is set, modifies the command that controls how the notebook launches.
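To make that last step concrete, here is a minimal sketch of how a launch command might be adjusted once an entry point has been looked up. It is not the Entrypoint Service's actual code: the function name `build_launch_command`, the dictionary layout (`type`, `path`, `image`), and the choice to wrap the command with `conda run -p` or `shifter --image=` are all assumptions made for illustration.

```python
from typing import List, Optional


def build_launch_command(default_cmd: List[str],
                         entrypoint: Optional[dict]) -> List[str]:
    """Return the command used to start the notebook server.

    `entrypoint` is the (hypothetical) record the hub would get back
    from the Entrypoint Service API, or None when nothing is selected.
    """
    if not entrypoint:
        # No entry point selected: launch exactly as before.
        return list(default_cmd)

    kind = entrypoint.get("type")
    if kind == "conda":
        # Run the server inside the conda environment at the given prefix.
        return ["conda", "run", "-p", entrypoint["path"], *default_cmd]
    if kind == "script":
        # Hand the default command to the user's startup script.
        return [entrypoint["path"], *default_cmd]
    if kind == "shifter":
        # Run the server inside the selected Shifter image.
        return ["shifter", f"--image={entrypoint['image']}", *default_cmd]

    # Unknown types fall back to the default launch.
    return list(default_cmd)


if __name__ == "__main__":
    default = ["jupyterhub-singleuser"]
    print(build_launch_command(default, None))
    print(build_launch_command(default, {"type": "conda", "path": "/path/to/env"}))
```

Falling back to the default command whenever no entry point is set, or its type is not recognized, means a missing or malformed registry entry never blocks a user from launching a notebook.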
The emphasis of my internship on programming may be a little surprising to other interns. When attending intern events and the concluding poster session, every other intern I met had research-type projects. But to support these researchers, NERSC and Berkeley Lab need a dedicated team of developers who implement the features necessary for scientific study.
I think that will be the biggest takeaway from my internship: that science is a team sport. While I may not be as gifted at physics or chemistry, I was still able to support the researchers in these fields by developing tools that improve the usability of NERSC’s supercomputing resources, which in turn makes their jobs of conducting groundbreaking research a little bit easier.
Josh Geden currently attends Duke University and spent this summer working as an intern in NERSC’s Data & Analytics Services group and CRD’s Data Science and Technology Department under the mentorship of Shreyas Cholia and Rollin Thomas.