This lesson continues the second workshop on reproducible science, focusing on additional open source tools for researchers and data scientists, such as the R programming language and its associated tools RStudio and R Markdown. Learners are also introduced to Python, IPython notebooks, and Google Colab, and work through hands-on tutorials on creating a Binder environment and building containers with Docker and Singularity.
This lecture covers the benefits and challenges of reusing open datasets, and the role metadata plays in that process.
This lesson provides a quick tour of several data repositories and shows how to download and manipulate data from them.