DataLad is a versatile multi-tool for data management and data publication. In this session, you will learn the basic concepts and commands for version control and reproducible data analysis. You’ll get to see, create, and install DataLad datasets of many shapes and sizes, master local version control workflows and provenance-captured analysis execution, and pick up ideas for your next data analysis project.
This lesson continues the series with a second workshop on reproducible science. It focuses on additional open-source tools for researchers and data scientists, such as the R programming language for data science and associated tools like RStudio and R Markdown. Participants are also introduced to Python, IPython notebooks, and Google Colab, and work through hands-on tutorials on creating a Binder environment and building containers with Docker and Singularity.
This is a hands-on tutorial on PLINK, the open-source whole-genome association analysis toolset. It aims to teach users how to perform basic quality control on genetic datasets and how to identify and understand GWAS summary statistics.