Using DataHub as a common computational environment

While it’s fine in principle to use your laptop, it would be hard for us to diagnose all the different problems that might arise with so many participants, so we recommend using a common computational environment during the workshop. We’ll use the Berkeley DataHub for this. You can think of DataHub as providing user-specific virtual machines with identical software for all of us to use.

The exception is if you already have experience using your laptop with the shell/terminal, Git (including pushing to GitHub), and Python. In that case, working on your laptop is fine.

Click here to access your own machine/server with Python, git, and terminal access on the campus DataHub (also used for various data science, CS, and statistics courses). (Do NOT go directly to datahub.berkeley.edu the first time you are accessing DataHub. If you do, you won’t have the repository materials available in your virtual machine.)

That will get you onto your server, with the course materials from the berkeley-scf/compute-skills-2024 repository available there.

Check your environment

To check that things look good:

  1. Click on Terminal (in the 3rd row of the Launcher tab, under Other). (If you have trouble finding the Terminal, go to File -> New -> Terminal.)
  2. Check that your working directory is ~/compute-skills-2024 (by looking at the terminal prompt or running pwd).
  3. Run ls to see the files in the repository.
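The same checks can also be run from a notebook cell or a script. The following is only a minimal sketch: the function name check_environment and the check for git on the PATH are illustrative additions, and it assumes the materials live in ~/compute-skills-2024 as described above.

```python
import shutil
from pathlib import Path

# Assumption: the repository materials are in ~/compute-skills-2024,
# as described in the steps above.
EXPECTED_DIR = Path.home() / "compute-skills-2024"


def check_environment(expected_dir=EXPECTED_DIR):
    """Return a dict of simple True/False checks on the current session."""
    expected = Path(expected_dir)
    return {
        # Same idea as running `pwd` and comparing by eye.
        "in_expected_dir": Path.cwd().resolve() == expected.resolve(),
        # Same idea as running `ls` and seeing the repository files.
        "repo_dir_exists": expected.is_dir(),
        # Extra check (not in the steps above): is git available?
        "git_on_path": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'CHECK THIS'}")
```

If any line prints CHECK THIS, go back through the three steps above in the terminal.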

Editing files

You can combine text and code in a notebook, which you can start with:

  • New Launcher -> Notebook -> Python 3 (ipykernel), or
  • File -> New -> Notebook

To run the code in a cell, a shortcut is Ctrl + Enter/Return.

To create a script file containing just code, do File -> New -> Python file.

Stopping your server

You can stop your server via File -> Hub Control Panel -> Stop My Server.

If you only log out, any ongoing computations will still run in the background.

Accessing GitHub repositories from DataHub

(These instructions are needed only for the main sessions of the workshop, Thursday and Friday.)

Please don’t do the steps in this section until after we do the exercise where you create your newton-practice repository.

Authenticating with GitHub can be a bit tricky, particularly when using DataHub (i.e., JupyterHub).

Accessing GitHub from your laptop rather than DataHub

If you’re on your laptop and already set up to use GitHub, you don’t need to do the steps in this section. If you’re using your laptop and are not yet set up to use GitHub, these instructions should work there too, once you install gh-scoped-creds via pip install. Whether they work may also depend on the version of Git you are using; e.g., version 2.34.1 is too old but 2.39 should be fine.

That said, if you haven’t already used Git on your laptop and pushed changes up to GitHub, it’s best if you work in DataHub so that we don’t have to troubleshoot laptop issues for multiple participants.
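If you do work on your laptop, the Git version caveat above can be checked with a short script. This is a rough sketch only: it treats 2.39 as the threshold because that is the version mentioned above as working, and it makes no claim about versions between 2.34.1 and 2.39. The function names are hypothetical.

```python
import re
import shutil
import subprocess

# Assumption: 2.39 is used as the threshold because the text above says it
# "should be fine"; versions between 2.34.1 and 2.39 are untested here.
KNOWN_GOOD = (2, 39)


def parse_git_version(text):
    """Extract (major, minor, patch) from text like 'git version 2.39.2'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    # A missing patch number is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def is_known_good(version):
    """True if the version is at least the 2.39 release mentioned above."""
    return tuple(version[:2]) >= KNOWN_GOOD


def installed_git_version():
    """Return the installed git version tuple, or None if git is not found."""
    if shutil.which("git") is None:
        return None
    result = subprocess.run(["git", "--version"], capture_output=True, text=True)
    return parse_git_version(result.stdout) if result.returncode == 0 else None


if __name__ == "__main__":
    version = installed_git_version()
    if version is None:
        print("git not found")
    else:
        print(version, "ok" if is_known_good(version) else "may be too old")
```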

We’ll use a tool (an ‘app’) that helps us with authentication, provided in the gh_scoped_creds Python package.

You must do all three steps in order to be able to interact with your GitHub repository from DataHub.

1. Configure git

First configure git in the terminal:

git config --global user.name "Your Name"
git config --global user.email "your_name@berkeley.edu"

git config --global color.ui "auto"

git config --global pull.rebase false

That will modify your ~/.gitconfig file.
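To confirm that the settings were saved, you can read them back with git config --global --get. The sketch below wraps that command in Python; the helper names read_global_setting and missing_settings are hypothetical, and the list of keys simply mirrors the four commands above.

```python
import shutil
import subprocess

# The four settings configured by the commands above.
EXPECTED_KEYS = ["user.name", "user.email", "color.ui", "pull.rebase"]


def read_global_setting(key):
    """Return the global git setting for key, or None if unset or git is missing."""
    if shutil.which("git") is None:
        return None
    result = subprocess.run(
        ["git", "config", "--global", "--get", key],
        capture_output=True,
        text=True,
    )
    # git exits with a non-zero status when the key is not set.
    return result.stdout.strip() if result.returncode == 0 else None


def missing_settings(settings):
    """List the expected keys whose value is None or empty."""
    return [key for key in EXPECTED_KEYS if not settings.get(key)]


if __name__ == "__main__":
    current = {key: read_global_setting(key) for key in EXPECTED_KEYS}
    print("missing:", missing_settings(current) or "none")
```

If anything is reported as missing, rerun the corresponding git config command.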

2. Give DataHub access to your GitHub account

Then run this in the terminal:

gh-scoped-creds

Or you could instead run this in IPython or Jupyter Notebook:

import gh_scoped_creds
%ghscopedcreds

When you do that, you’ll see a link and a code. Go to the link in a browser (on your laptop, outside of DataHub), log in to GitHub (or click Continue if already logged in), input the code, and click Authorize Berkeley-DataHub-Git-Access. This will grant access to GitHub from DataHub (i.e., JupyterHub) for 8 hours (or until you stop your JupyterHub server).

3. Give access to the specific GitHub repository you are working with

Then (only the first time you are doing all this) go to the URL that is printed out in your terminal (which should be https://github.com/apps/berkeley-datahub-git-access) and click on Install. If you have access to multiple GitHub organizations, choose your personal org, namely your GitHub username. Choose Only select repositories and select the repository you are using for the workshop, namely newton-practice. Then click Install.

Warning

If you run git push and get a permission denied error, get a 403 error, or are asked by Git for a password, something has gone wrong with one of the three steps above.
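As a rough illustration of the symptoms in this warning, the sketch below scans the text printed by git push for them. It is a crude keyword match, not a diagnosis: the function name is hypothetical, the sample messages are illustrative only, and the exact wording depends on your version of Git.

```python
# Symptoms named in the warning above, matched case-insensitively.
AUTH_SYMPTOMS = ("denied", "403", "password")


def looks_like_auth_failure(git_output):
    """True if git's output contains one of the symptoms from the warning."""
    text = git_output.lower()
    return any(symptom in text for symptom in AUTH_SYMPTOMS)


# Illustrative messages only; real output varies between Git versions.
samples = [
    "remote: Permission to someuser/newton-practice.git denied to someuser.",
    "fatal: unable to access 'https://github.com/...': The requested URL returned error: 403",
    "Password for 'https://github.com':",
    "Everything up-to-date",
]

if __name__ == "__main__":
    for message in samples:
        verdict = "recheck the three steps" if looks_like_auth_failure(message) else "no auth symptom"
        print(f"{verdict}: {message}")
```

A match only tells you to revisit the three steps; it does not say which one failed.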