Automatically archiving reproducible studies with Docker

Description

useR!2017: Automatically archiving reproducible stu...

Keywords: Docker, Reproducible Research, Open Science
Webpage: https://github.com/o2r-project/containerit/
Reproducibility of computations is crucial in an era where data is born digital and analysed algorithmically. Most studies, however, publish only the results, often with figures as important interpreted outputs. But where do these figures come from? Scholarly articles must not only describe the work but also be accompanied by data and software. R offers excellent tools for creating reproducible works, e.g. Sweave and RMarkdown. Several approaches to capturing the workspace environment in R have been developed, working around CRAN's deliberate choice not to provide explicit versioning of packages and their dependencies. They preserve a collection of packages locally (packrat, pkgsnap, switchr/GRANBase) or remotely (MRAN timemachine/checkpoint), or install specific versions from CRAN or from source (requireGitHub, devtools). Installers for old versions of R are archived on CRAN. A user can therefore re-create a specific environment manually, but this is a cumbersome task.
We introduce a new way to preserve a runtime environment, including both packages and R itself, by adding an abstraction layer in the form of a container, which can execute a script or run an interactive session. The package containeRit automatically creates such containers based on Docker. Docker is a solution for packaging an application and its dependencies, and it has also proven useful in the context of reproducible research (Boettiger 2015). From sessionInfo(), R scripts, or RMarkdown documents, the package creates a container manifest, the Dockerfile, which is usually written by hand. The Dockerfiles use the Rocker community images as base images. Docker can build an executable image from a Dockerfile, and the image can be executed anywhere a Docker runtime is present. containeRit uses harbor for building images and running containers, and sysreqs for installing system dependencies of R packages. Before the planned CRAN release we want to share our work, discuss open challenges such as handling linked libraries (see the discussion on geospatial libraries in Rocker), and welcome community feedback.
containeRit is developed within the DFG-funded project Opening Reproducible Research to support the creation of Executable Research Compendia (ERC) (Nüst et al. 2017).
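To make the idea of a generated container manifest concrete, here is a minimal sketch in Python. It is not containeRit's implementation or API. The `SessionSnapshot` class, the `render_dockerfile` function, and the example package and library names are all hypothetical, chosen only for illustration. The sketch assumes that a Rocker `rocker/r-ver` image tag matching the R version exists and that the `install2.r` helper is available in that image.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionSnapshot:
    """Hypothetical stand-in for the kind of information sessionInfo() reports."""
    r_version: str
    packages: List[str] = field(default_factory=list)
    system_libs: List[str] = field(default_factory=list)


def render_dockerfile(snapshot: SessionSnapshot) -> str:
    """Render Dockerfile text from a snapshot (illustrative only)."""
    # Assumption: a Rocker base image tagged with the R version exists.
    lines = [f"FROM rocker/r-ver:{snapshot.r_version}"]

    # System dependencies are installed first so that R packages can link to them.
    if snapshot.system_libs:
        libs = " ".join(sorted(snapshot.system_libs))
        lines.append(
            "RUN apt-get update && apt-get install -y --no-install-recommends "
            f"{libs} && rm -rf /var/lib/apt/lists/*"
        )

    # Assumption: install2.r is on the PATH of the base image.
    if snapshot.packages:
        pkgs = " ".join(sorted(snapshot.packages))
        lines.append(f"RUN install2.r --error {pkgs}")

    # Start an interactive R session by default.
    lines.append('CMD ["R"]')
    return "\n".join(lines) + "\n"


# Example values are made up for illustration.
snapshot = SessionSnapshot("3.4.0", ["sp", "knitr"], ["libgdal-dev"])
dockerfile_text = render_dockerfile(snapshot)
print(dockerfile_text)
```

Saved as a file named Dockerfile, the resulting text could be turned into an image with `docker build` and started with `docker run`. Note that this sketch does not pin package versions, which a faithful archive of a study would need to address.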
References

Boettiger, Carl. 2015. "An Introduction to Docker for Reproducible Research, with Examples from the R Environment." ACM SIGOPS Operating Systems Review 49 (January): 71–79. doi:10.1145/2723872.2723882.

Nüst, Daniel, Markus Konkol, Edzer Pebesma, Christian Kray, Marc Schutzeichel, Holger Przibytzin, and Jörg Lorenz. 2017. "Opening the Publication Process with Executable Research Compendia." D-Lib Magazine 23 (January). doi:10.1045/january2017-nuest.

Day: 2
