Project Structure for Reproducible Work

Learn to structure your project files to avoid chaos and get reliable results.

This course teaches you how to structure your computational projects in a way that both enables replicating results and helps with keeping complexity under control.

Besides getting a good general template for organizing files, you will also learn what factors affect the evolution of project files, and how to account for these. This will enable you to adapt the material to a variety of projects and operating procedures.

For the different parts of the project, the course gives practical advice and standard practices.
And the examples are about making pizza!

Course Contents

  7 sections
  44 min
  English
Payment and video access are handled through Vimeo. Separate Terms of Service and Privacy Policy apply.
  • Reproducibility Principles (5m). The factors that affect reproducibility and how these relate to project structure.
  • General Project Structure (5m 30s). A simple, generic project structure you can start adapting to your work.
  • Sub-structure: Data (5m). Details about managing data.
  • Sub-structure: Code (8m 30s). Details about managing code, including packaging.
  • Sub-structure: Output (11m). Details about managing the output (results) of a project.
  • Sub-structure: Doc (3m). Details about managing project documentation.
  • Tracking Changes and Syncing (6m). Good practices for version control related to project structure, and a simple workflow to reliably sync project copies.
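
To give a flavor of what a generic structure might look like, here is a minimal Python sketch that creates a project skeleton with one folder per sub-structure listed above. The folder names (data, code, output, doc) and the helper create_skeleton are assumptions taken from the section titles for illustration only; they are not necessarily the template taught in the course.

```python
from pathlib import Path

# Hypothetical top-level folders, named after the four sub-structure
# sections in the course outline (an assumption, not the course's template).
SUBDIRS = ("data", "code", "output", "doc")


def create_skeleton(root):
    """Create the project root with one folder per sub-structure.

    Returns the list of folder paths, in the order of SUBDIRS.
    Existing folders are left untouched.
    """
    root = Path(root)
    created = []
    for name in SUBDIRS:
        path = root / name
        # parents=True also creates the root; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling create_skeleton("pizza_project") would create the four folders under pizza_project in the current directory. Running it again on an existing project changes nothing, so the same script can be reused when setting up a project copy elsewhere.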

Target Audience and Course Requirements

The course was created with early-career researchers in mind, especially those without a formal computational background, who regularly work with code and computational projects in fields such as bioinformatics, data processing and analysis pipelines, machine learning applications, and simulation.

The material is, however, designed to be cross-disciplinary. It is also relevant for beginner-to-intermediate programmers in general who are getting started with more complex projects in areas like data engineering and analysis or machine learning, or who are doing intricate benchmarking, implementing new algorithms, or anything else that involves running multiple experiments with their software.

In terms of requirements, you should:

  • be comfortable with a programming language (no expertise needed, but when code is discussed, it helps to be at least proficient enough to use functions)
  • know the basics of running commands from the terminal (optional)

The course uses Python for illustration but most of the material is valid for other programming languages.


About the Instructor

Filip has been programming for 25 years and has extensive experience in both professional and scientific research software development. He has played a range of roles, from high-level design, modeling, and analysis to building large systems from scratch, as well as handling project coordination and customer support.
Areas he has worked in include machine learning, big data processing, computational biology, bioinformatics, mathematical modeling, algorithm design, simulation and analysis, software verification, scheduling and optimization, and database design.
He is a strong proponent of Agile methodologies for both industry and science, and he takes a pragmatic, quality-oriented approach to software craftsmanship.




Page last updated August 14, 2024