Tutorial: Creating Repeatable, Reusable Experimentation Pipelines With Popper (Hands-on Tutorial)
Current approaches to scientific research involve many time-consuming activities that do not advance our scientific understanding. For example, researchers and their students spend countless hours reformatting data and writing code in attempts to reproduce previously published research. What if the scientific community could find a better way to create and publish its workflows, data, and models, and so minimize the time spent “reinventing the wheel”? Popper is a protocol and CLI tool for implementing scientific exploration pipelines following a DevOps approach, which allows researchers to generate work that is easy to reproduce.
Modern open source software development communities have created tools that make it easier to manage large codebases and deal with high levels of complexity. This applies not only to managing code changes but also to the entire ecosystem needed to deliver software changes in an agile, rapidly changing environment. These practices and tools are collectively referred to as DevOps. The Popper Experimentation Protocol repurposes DevOps practices in the context of scientific explorations so that researchers can leverage existing tools and technologies to maintain and publish scientific analyses that are easy to reproduce.
In the first part of this short course, we will briefly introduce DevOps and give an overview of its best practices. We will then show how these practices can be repurposed for carrying out scientific explorations and illustrate this with examples. The second part of the course will be devoted to hands-on exercises that walk the audience through the usage of the Popper CLI tool.
Outline of tutorial:
- Introduction to the practical aspects of reproducibility.
- Overview of best practices in DevOps / open source software projects.
- Illustration of how Popper repurposes DevOps practices for scientific explorations.
- Examples of projects that follow the Popper protocol.
- Hands-on experiences with the Popper CLI tool.
Knowledge gap this course addresses
Traditional experimentation practices are deeply rooted in the muscle memory of researchers: typing commands in “live” systems and getting results as they go. Popper puts an emphasis on versioning, automation and portability of experimentation pipelines. In practice, this means writing scripts (instead of typing directly in a terminal or clicking through GUIs) and making use of automation (DevOps) tools to execute these scripts. By following the Popper protocol, researchers can build the habit of generating scientific explorations that are easy to replicate and amenable to collaboration and extension.
Prerequisites:
- Unix environment (Linux, macOS or Windows Subsystem for Linux on Windows 10+).
- Bash, Git/GitHub, Python, virtualenv and Docker. Familiarity with these tools will make it easier to follow the Popper lesson. In the case of Docker, familiarity with how container images are created is desired (but not strictly required).
- Popper CLI tool. Instructions on how to install can be found here. Quickstart guide can be found here.
- We can use an experimentation pipeline in your domain as the basis of our last hands-on exercise. This can be any analysis pipeline, parameter sweep for a simulation, or any other computational or data science study you would like to “popperize”.
Link to lesson: popperized.github.io/swc-lesson