R bootcamp, Module 0: Recruit processing

August 2022, UC Berkeley

Chris Paciorek

A few administrative things

Where to find stuff

This website and the associated GitHub site (https://github.com/berkeley-scf/r-bootcamp-fall-2022) are the main sites for the bootcamp. Together they provide information on logistics and software installation, and serve as the master repository for the module materials.

We have an Ed Discussion site for discussion and answering questions online during (and before) the bootcamp.

If you have an administrative question before or after the bootcamp, email r-bootcamp@lists.berkeley.edu.

Wireless

The campus WiFi is now eduroam, not AirBears. Follow these instructions to set up your eduroam account. If you need wireless access as a guest (i.e., you don’t have a CalNet ID), connect to ‘CalVisitor’.

How we’ll operate

The bootcamp will be organized in modules, each of which will combine a lecture/demo presentation with a concluding breakout session in which you’ll work on a variety of problems at different levels of difficulty. The idea is for each person to find problems that challenge them but are not too hard. Solutions to the breakout problems will be presented before the start of the next module.

Many of the modules will use a common dataset as an example on which to carry out various operations. We’ll focus on a dataset of demographic/economic information (population, GDP per capita, life expectancy) for many of the countries in the world every five years, provided by the Gapminder project. (Note that this is almost the full population of countries; I’ll fit some statistical models, but the interpretation is tricky because we are not working with a sample from a well-defined population.)

Getting help

Your counseloRs are: Alan Aw (Statistics), Florica Constantine (Statistics), Corrine Elliott (Statistics), and Natalia Sarabia Vasquez (Statistics).

Suggestions on how to get the most out of the bootcamp

I encourage you to:

This is a bootcamp. So there may be some pain involved! If you find yourself not following everything, that’s ok. You may miss some details, but try to follow the basics and the big picture.

A few additional thoughts on my pedagogical philosophy here:

RStudio and R Markdown

We’ll present most of the material from within RStudio, using R Markdown documents with embedded R code. R Markdown is an extension to the Markdown markup language that makes it easy to write HTML in a simple plain text format. This allows us both to run the R code directly and to compile on the fly to an HTML file that can be used for presentation. All files will be available on GitHub.

Note: The files named moduleX_blah.html have individual slides, while the files named moduleX_blah_onepage.html have the same content but all on one page.

Warning: in some cases the processing of the R code in the R Markdown is screwy and the slides have error messages that do not occur if you just run the code directly in R or RStudio.

Using GitHub to get the documents

To download the files from GitHub, you can do the following.

Within RStudio

Within RStudio go to File->New Project->Version Control->Git and enter https://github.com/berkeley-scf/r-bootcamp-fall-2022 as the repository URL.

Then to update from the repository to get any changes we’ve made, you can select (from within RStudio): Tools->Version Control->Pull Branches

or from the Environment/History/Git window, click on the Git tab and then on the blue down arrow.

Be warned that you probably do not want to make your own notes or changes in the files we are providing. If you do, and you then do a “Git Pull” to update the materials, you’ll have to deal with the conflict between your local version and our version. Instead, make personal copies of such files, either in another directory or under new names.

From a Mac/Linux terminal window

Run the following commands:
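As a minimal sketch, assuming Git is already installed and that you run this from the directory where you want the materials to live, cloning the repository named above would look something like this (the variable name REPO_URL is just for illustration):

```shell
# Repository URL as given on this page.
REPO_URL="https://github.com/berkeley-scf/r-bootcamp-fall-2022"

# Download a full copy of the repository into a new subdirectory
# of the current directory (named after the repository).
git clone "$REPO_URL"
```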

Then to update from the repository to get any changes we’ve made:
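A sketch of the update step, assuming the repository was cloned into a directory named r-bootcamp-fall-2022 (Git's default directory name for this repository URL) and that you start from its parent directory:

```shell
# Directory created by the clone; adjust if you chose a different name.
REPO_DIR="r-bootcamp-fall-2022"

# Move into the repository and fetch and merge any changes made upstream.
cd "$REPO_DIR" && git pull
```

If you have edited any of the provided files, the pull may stop and report a conflict.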

As a zip file

If you don’t want to bother using Git, or you have problems with it, simply download a zip file with all the material from https://github.com/berkeley-scf/r-bootcamp-fall-2022/archive/main.zip.
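Clicking the link in a browser is the simplest route. If you prefer the terminal, the following is one possible sketch, assuming the curl and unzip utilities are available on your machine (the local file name r-bootcamp.zip is an arbitrary choice):

```shell
# Zip archive URL as given on this page.
ZIP_URL="https://github.com/berkeley-scf/r-bootcamp-fall-2022/archive/main.zip"

# -L follows redirects; -o sets the local file name.
curl -L -o r-bootcamp.zip "$ZIP_URL"

# Extract the archive into the current directory.
unzip r-bootcamp.zip
```

Note that a zip download is a one-time snapshot: to get later changes you would need to download the file again.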

What is R?

Modes of using R

Starting R and RStudio

The pieces of an R session include:

RStudio provides an integrated development environment in which all of these pieces are in a single application and tightly integrated, with a built-in editor for your code/scripts.

Why R?

Why Not R?

What are my other options?

My hidden agenda

In addition to learning some R, this workshop will expose you to a way of thinking about doing your computational work.

The building blocks of scientific computing include:

Thanks!

We want your feedback (even if you leave early)!

During the afternoon break tomorrow, we’ll ask everyone to fill out a feedback form, but if you leave early, please see the Ed Discussion board for the link.