Binderising R projects with holepunch

What is mybinder?

In a nutshell, mybinder provides a way for taking the static contents of a repository on, for example, GitHub and launching them in an interactive environment in the browser!

Figure credit: Juliette Taka, Logilab and the OpenDreamKit project

To do so, repository owners add configuration files according the type of environment they wish to create which software called repo2Docker can use to create the specified computational environments. Here’s the full list of configuration files that docker2repo recognises.

To avoid confusion, let’s disambiguate some terms before moving on:

Disambiguation

In this section, there are some related terms, which will be outlined here for clarity:

  • Project Binder: An open community that makes it possible to create sharable, interactive, reproducible environments. The technological output of this project is a BinderHub.

  • BinderHub: A cloud-based infrastructure for generating Binders. The most widely-used is mybinder.org, which is maintained by the Project Binder team. It is built upon a range of open source tools, including JupyterHub, for providing cloud compute resources to users via a browser; and repo2docker, for building docker images from projects. Since it is an open project, it is possible to create other BinderHubs which can support more specialised configurations. One such configuration could include authentication to enable private repositories to be shared amongst close collaborators.

  • A Binder: A sharable version of a project that can be viewed and interacted within a reproducible computational environment running in the cloud via a web browser. By automating the installation of the computing environment (as discussed in the Reproducible Environments chapter), Project Binder transforms the overhead of sharing such an environment into the act of sharing a URL.

  • mybinder.org: A public and free BinderHub. Because it is public, you should not use it if your project requires any personal or sensitive information (such as passwords).

  • Binderize: The process of making a Binder from a project.

Disambiguation taken for the Turing Way chapter on Binder.

What is holepunch

holepunch is an R package by Karthik Ram that helps automate the process of binderising an R project. It provides functions for creating the configuration files required to binderise and R project.

Workshop materials

Today we’ll be working with a subset of materials from the published compendium of code, data, and author’s manuscript:

Carl Boettiger. (2018, April 17). cboettig/noise-phenomena: Supplement to: “From noise to knowledge: how randomness generates novel phenomena and reveals information” (Version revision-2). Zenodo. http://doi.org/10.5281/zenodo.1219780

accompanying the publication:

Carl Boettiger . From noise to knowledge: how randomness generates novel phenomena and reveals information. Published in Ecology Letters, 22 May 2018 https://doi.org/10.1111/ele.13085

The materials have been compiled into a research compendium containing all data and code associated with the referenced publication, as well as a paper written in Rmarkdown which can be knitted to pdf, ready for submission to Ecology Letters.

The materials are available on GitHub as a template repository.

Copy compendium template (GitHub)

To access the materials, first head to:

https://github.com/r-rse/holepunch-compendium-tmpl

Next, click on Use this template > Create a new repository

Complete new repo details (GitHub)

Next, complete the details for your copy of the repository:

  • Repository name: holepunch-compendium

  • Description: Binderised Research Compendium in R created using holepunch

Finally, click on Create repository from template to create your repo.

Clone repo

Once you’ve created your repo on GitHub, you’ll need to clone it to work with it locally.

Copy URL (GitHub)

First click on Code and click on the clipboard icon on the HTTPS panel to copy the URL.

Next let’s move to RStudio.

Create New RStudio Project (RStudio)

We’ll start by creating a new project using:

File > New Project

This opens up the New Project Wizard. Select Version Control as the source for our new project.

Next select Git:

Finally:

  • Paste the URL of the repo you copied form GitHub in Repository URL.

  • Select a destination for the cloned repo. I’m choosing to clone it to my Desktop.

Click on Create Project which will clone the repository from GitHub and launch the new project in RStudio.

Let’s use the dir_tree() function in package fs to examine the contents of the repository. The main files of interest are highlighted:

fs::dir_tree(recurse = TRUE)
.
├── CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── LICENSE.md
├── R
│   └── process-data.R <- R function used in paper
├── README.Rmd
├── README.md
├── analysis
│   ├── data
│   │   ├── DO-NOT-EDIT-ANY-FILES-IN-HERE-BY-HAND
│   │   └── raw_data
│   │       └── gillespie.csv <- raw data
│   ├── paper
│   │   ├── elsarticle.cls <- template for Latex styling of the article
│   │   ├── mybibfile.bib
│   │   ├── numcompress.sty
│   │   ├── paper.Rmd <- The paper written in Rmarkdown
│   │   ├── paper.fff
│   │   ├── paper.pdf 
│   │   ├── paper.spl
│   │   ├── paper.tex
│   │   ├── paper_files
│   │   │   └── figure-latex
│   │   │       └── figure1-1.pdf
│   │   └── refs.bib <- The bibliography used for references in the paper 
│   └── templates
│       ├── journal-of-archaeological-science.csl
│       ├── template.Rmd
│       └── template.docx
└── holepunch-compendium-tmpl.Rproj

Binderise our project

Before we can binderise our project, let’s first install holepunch

Install holepunch

remotes::install_github("karthik/holepunch")
Using github PAT from envvar GITHUB_PAT
Skipping install of 'holepunch' from a github remote, the SHA1 (031fd961) has not changed since last install.
  Use `force = TRUE` to force installation

I’m going to show you two ways to binderise your projects today, the standard way and the R compendium way.

Let’s start with the standard way to configure your R project, as recommended by the binder project.

To show you the two approaches, I’m going to prepare each approach in a separate git branch. At the end, you can choose which approach you prefer and we’ll merge that into our main branch. You will however have a lasting record of the two approaches in the two separate branches.

Binderise project with install.R and runtime.txt

To show you the standard approach we’ll be working in a a branch called install-r. So let’s go ahead and create it.

Create install-r branch (RStudio)

In the RStudio git panel, click on the new branch button which launches a new branch pop up.

Give the new branch the name install-r and click on Create.

You should now be in the install-r branch.

Create install.R file (RStudio)

The next stage is to create an install.R file. This is an R script that is run when building our computational environment and should contain code to install all necessary packages.

You could this manually but holepunch provides the nifty function write_install() that not only creates the file but also scans the project for dependencies and populates the file with the code to install them!

✔ Writing './.binder/install.R'

.binder
└── install.R

The function creates and writes the file to a hidden .binder directory.

If you open the file, you will see a character vector of all the packages required to reproduce our paper was created and wrapped in the install.packages() function.

Create runtime.txt file (RStudio)

The runtime.txt is effectively the file that freezes the versions of R and any packages installed in time.

By adding r-<YYYY>-<MM>-<DD> we are effectively telling binderhub that we want an R environment created with the version of R and any packages specified in install.R that were current on the date specified installed. We can also request a specific version of R with r-<version>-<YYYY>-<MM>-<DD>.

  • For dates prior to 2022-01-01, R and packages are installed from snapshots on MRAN. However, MRAN will be shutting down however in July.

  • If we request R 4.1 or later, or specify a snapshot date newer than 2022-01-01, packagemanager.rstudio.com is used to provide much faster installations via binary packages.

Holepunch provides another nice function, write_runtime(), for creating the file and also configuring it with today’s date.

.binder
├── install.R
└── runtime.txt

Create postBuild file (RStudio)

There’s one last thing we need to take care of. In my tests I found that, for some reason, there was a problem with Tex installation in the binder environment that was preventing successful rendering of our paper to pdf.

Re-installing TinyTex which is used by the rticles package through package bookdown seemed to fix this issue.

It would be nice if somebody launching our binder did not have to do this manually (and would save us having to document the step also!)

Luckily there is another configuration file available to us we can use, postBuild!

There isn’t a holepunch function for this so let’s create it manually in the .binder/ directory.

file.create(".binder/postBuild")
.binder
├── install.R
├── postBuild
└── runtime.txt

postBuild (with no file extension) is a script that can contain arbitrary commands to be run after the whole repository has been built. To make it a shell script we use #!/bin/bash at the top. Then we tell it to run the R command tinytex::install_tinytex(force = TRUE) in the next line.

#!/bin/bash
R -e "tinytex::install_tinytex(force = TRUE)"

Your file should look like this:

Generate badge!

Now it’s time to generate our badge to include in our README!

We use function generate_badge() and set the branch argument to install-r.

generate_badge(branch = "install-r")
✔ Setting active project to '/Users/Anna/Desktop/holepunch-compendium'
• Copy and paste the following lines into 'README.Rmd':
  <!-- badges: start -->
  [![Launch Rstudio Binder](http://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/annakrystalli/holepunch-compendium/install-r?urlpath=rstudio)
  <!-- badges: end -->
  [Copied to clipboard]

Once generated, the badge is now on our clickboard, ready to be pasted in our README. So let’s:

  • Open README.Rmd.

  • Paste the badge at the top of the file (below the YAML header).

  • Knit the document to create README.md.

Commit Changes (RStudio)

Let’s now commit everything and push to Github!

Click badge, generate binder (GitHub)

Let’s head back to our repo and specifically the install-r branch:

Once on the correct branch, scroll down to the README and click the binder badge

That’s it!

This now takes you to mybinder.org which will start preparing an interactive instance of your repository!

What’s going on in the background?

  • Binder is using the configuration files provided and software called repo2Docker to create a Dockerfile, a recipe for your computational environment.

  • Binder is using the Dockerfile to create a binder image, with all the software described in your configuration file installed. This takes some time but, once completed, binder will save the image so the next time you re-launch, this stage will be a lot faster.

  • Binder will then use the image to launch an interactive container on JupyterHub in which we can interact with the code and data in our repo!

Illustration about BinderHub architecture. The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807

Once our container is ready, binder launches our repo in an instance of Rstudio Server. We can can now reproduce our paper in Binder!

So let’s head to analysis/paper/paper.Rmd and open that file. At the top of the left hand panel, click on Knit. (and click yes to update the rmarkdown package if asked)

If everything went well, you should now see a rendered pdf of our paper! Reproducibility accomplished!

Binderise with Description & Dockerfiles

Another way we can binderise our projects using holepunch is by:

  • Managing dependencies through a DESCRIPTION file.

  • Creating our environment using our own Dockerfile.

This method is tied to setting up your project as a research compendium.

If you are unfamiliar with the idea of research compendia, I highly recommend reading this paper by Marwick et al:

Marwick B, Boettiger C, Mullen L. 2018. Packaging data analytical work reproducibly using R (and friends) PeerJ Preprints 6:e3192v2 https://doi.org/10.7287/peerj.preprints.3192v2

Also check out package rrtools that can help you package your research projects as research compendia as well the following tutorial which is how the materials we are working with were initially put together as a research compendium, using rrtools.

Create docker branch (RStudio)

Let’s go back to RStudio.

We’ll again work on a separate branch. We want to branch off main again so first let’s go back to main by checking it out in the RStudio git panel:

You should now be in the main branch.

Next let’s follow the same

Create compendium DESCRIPTION file (RStudio)

The DESCRIPTION file is a key file in R packages containing metadata. One of these metadata is also a list of package imports. We can therefore use this facility to list our dependencies in a DESCRIPTION file.

holepunch has another function for creating an appropriate DESCRIPTION file for our compendium. It will again scan our repository for dependencies and list them in the Depends: section.

The description file effectively allows us to turn our compendium into a package. So we need to give it a package name and description.

write_compendium_description(package = "holepunchCompendium", 
                             description = "Binderised Research Compendium in R created using holepunch")
Finding R package dependencies ... Done!
✔ Writing './DESCRIPTION'
ℹ Please update the description fields, particularly the title, description and author

A DESCRIPTION file is created in the root of our directory. Let’s open it up.

There’s some additional fields to fill in. Go ahead and complete them for posterity.

Here’s what my finished version looks like:

Type: Compendium
Package: holepunchCompendium
Title: Binderised Research Compendium in R created using holepunch
Version: 0.0.1
Authors@R: 
    person("Anna", "Krystalli", , "annakrystalli@googlemail.com", role = c("aut", "cre"))
Description: Binderised Research Compendium in R created using holepunch
License: MIT
Depends: 
    dplyr,
    ggplot2,
    ggthemes,
    here,
    knitr,
    readr,
    rmarkdown,
    rticles
Encoding: UTF-8
LazyData: true

Feel free to copy and edit. Just make sure it still remains a valid YAML file format (i.e. errors in indentation can make the file unreadable!

repo2docker can now use this file to install our dependencies. To pin the versions of our dependencies we could just include another runtime.txt and be done.

I want to show you a different approach offered by holepunch so you are aware of it.

Create compendium Dockerfile (RStudio)

In the first approach to binderise our project, repo2docker created a Dockerfile in the background and used it to build us a computational environent.

We can however by pass that step by providing our own Dockerfile and holepunch has function write_dockerfile() for doing this.

We need to provide it with the name of the Dockerfile maintainer and again we specify the branch we are working in:

write_dockerfile(maintainer = "Anna Krystalli", branch = "docker") 
✔ Setting active project to '/Users/Anna/Desktop/holepunch-compendium'
[1] TRUE
→ Setting R version to 4.2.2
→ Locking packages down at 2023-02-09
✔ Dockerfile generated at ./.binder/Dockerfile
Warning message:
In write_dockerfile(maintainer = "Anna Krystalli", branch = "docker") :
  The version of R matching the last modified file in this project is 4.2.2 and is 
the one being used in your Dockerfile. However, you are running 4.2.1 locally.
Assuming your code runs without errors, it might be ok to leave the Dockerfile at 
4.2.2. But if you wish to stick to your local version, you can re-run this 
function with a fixed date using r_date.

We’re getting a warning that the version of R pinned to the Dockerfile by today’s date is later than the one I have installed. I could correct this if I thought it would be a problem by specifying an earlier date using the r_date argument. But I know it’s not an issue so the version current for today’s date will work fine.

Let’s have a look at the contents of the file:

FROM rocker/binder:4.2.2
LABEL maintainer='Anna Krystalli'
COPY --chown=${NB_USER} . ${HOME}
USER ${NB_USER}



RUN wget https://github.com/annakrystalli/holepunch-compendium/raw/docker/DESCRIPTION && R -e "options(repos = list(CRAN = 'http://mran.revolutionanalytics.com/snapshot/2023-02-12/')); devtools::install_deps()"

RUN rm DESCRIPTION.1; exit 0

What this is effectively doing is:

  • pulling the latest binder image (version 4.2.2) maintained by the rocker project from dockerhub.

  • Copying the contents of our repository into the container and giving appropriate rights

  • Setting the user to NB_USER

  • Following that is fetching the DESCPRIPTION file from our repo

  • It is then running an R command to install the dependencies listed in our DESCRIPTION files with the command devtools::install_deps().

  • Note that the versions are pinned to today’s date through setting the repos option to http://mran.revolutionanalytics.com/snapshot/2023-02-09/ through command options(repos = list(CRAN = 'http://mran.revolutionanalytics.com/snapshot/2023-02-09/'))

Edit Dockerfile

There are a couple amendments to the Dockerfile we need to make:

FROM rocker/binder:4.2.0
LABEL maintainer='Anna Krystalli'
COPY --chown=${NB_USER} . ${HOME}
USER ${NB_USER}



RUN wget https://github.com/annakrystalli/holepunch-compendium/raw/docker/DESCRIPTION && R -e "options(repos = list(CRAN = 'http://mran.revolutionanalytics.com/snapshot/2023-02-09/')); devtools::install_deps(); tinytex::install_tinytex(force = TRUE)"

RUN rm DESCRIPTION.1; exit 0

Generate badge (RStudio)

Just like before let’s generate our badge, this time for branch docker.

generate_badge(branch = "docker")
• Copy and paste the following lines into 'README.Rmd':
  <!-- badges: start -->
  [![Launch Rstudio Binder](http://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/annakrystalli/holepunch-compendium/docker?urlpath=rstudio)
  <!-- badges: end -->
  [Copied to clipboard]

Just like before:

  • Open README.Rmd.

  • Paste the badge at the top of the file (below the YAML header).

  • Knit the document to create README.md.

Commit all changes

Again commit all changes and make sure to PUSH TO GITHUB.

Build Binder

Another convenience function offered by holepunch is build_binder()

This function builds binder in the background and once an image is ready, will open the Binder URL

Let’s go ahead and use it:

ℹ Your Binder is being built in the background. Once built, your browser will automatically launch. You can also click the binder badge on your README at any time.
Hey Anna, welcome back, time to Rrrrrrrrock!
Hey Anna, welcome back, time to Rrrrrrrrock!
Hey Anna, welcome back, time to Rrrrrrrrock!

Reproduce paper in mybinder

As we did before with the install-r approach, let’s reproduce our paper:

  • open analysis/paper/paper.Rmd

  • Knit!

You should be again looking at a successfuly rendered pdf version of the paper!

Merge your favourite approach into main

We’ve played around in two separate branches. Let’s now merge our preferred approach into the main branch.

I recommend merging the install-r branch.

That’s because of the current limitations of the Dockerfile approach:

  • the problem with the latest rocker/binder image it is based on.

  • the fact that holepunch still uses MRAN snapshots in it’s Dockerfile to pin versions. As mentioned this will be obsolete in July 2023. I’ve opened an issue in the package so hopefully this will be fixed soon.

  • Overall I just wanted you to be aware of this approach in case you are well versed in Docker and would prefer to configure your projects with Dockerfiles.

Merge install-r into main (GitHub)

  • Go to the install-r branch and click Contribute

  • Open a pull request

  • Merge pull request

Pull main branch locally (RStudio)

  • Checkout the main branch.

  • Pull

Correct badge

Change the URL in the badge in the README.Rmd to point to the main branch:

[![Launch Rstudio Binder](http://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/annakrystalli/holepunch-compendium/main?urlpath=rstudio)

Re-render README.Rmd, commit changes and push!

You’re repo is now all set up!