class: top, right, inverse

## ACCE Research Data and Project Management

***

.bottom[
# Introduction & Welcome
#### 10-11 April 2019, University of Sheffield
#### Dr Anna Krystalli @annakrystalli
]

---

# 👋 Hello

### me: Dr Anna Krystalli

- **Research Software Engineer**, _University of Sheffield_
    + twitter **@annakrystalli**
    + github **@annakrystalli**
    + email **a.krystalli[at]sheffield.ac.uk**

- **Editor [rOpenSci](http://onboarding.ropensci.org/)**

- **Co-organiser:** [Sheffield R Users group](https://www.meetup.com/SheffieldR-Sheffield-R-Users-Group/)

---

# Welcome

## Course Outline

<http://annakrystalli.me/rrresearch/>

### Today

- Introduction
- Basic Data Hygiene
- Research Data Management
- Metadata
- Literate Programming with Rmarkdown
- Version Control with Git

---

# Welcome

## Course Outline

<http://annakrystalli.me/rrresearch/>

### Tomorrow

- Collaborating through GitHub
- Managing code as a package
- Bringing it all together: a Research compendium

---

class: top, right, inverse

# Why are we here?

***

---

### The paper is the advertisement

> “an article about computational result is advertising, not scholarship. The actual scholarship is the **full software environment, code and data, that produced the result.**”

*Jon Claerbout paraphrased in [Buckheit and Donoho (1995)](https://statweb.stanford.edu/~wavelab/Wavelab_850/wavelab.pdf)*

--

### [The Scientific Paper Is Obsolete](https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676/)

Here's what's next

*<small>APR 5, 2018, The Atlantic</small>*

<img src="../assets/SciencePaperFlames-New.gif" height="100px" width="350px">

---

### Lessons from the Reproducibility/Replicability crisis

- Many issues are statistical and a result of broken academic incentive systems.
- Much can be tackled by transparency and better computational literacy.
<img src="assets/woes.png" width="450px">

---

### [Reproducible Research in Computational Science](http://science.sciencemag.org/content/334/6060/1226)

ROGER D. PENG, SCIENCE 02 DEC 2011 : 1226-1227

> Reproducibility has the potential to serve as a minimum standard for judging scientific claims when full independent replication of a study is not possible.

<img src="../assets/repro-spectrum.jpg" width=550px>

---

## Reinventing discovery by open sourcing science

_Nielsen, Michael. Reinventing Discovery: The New Era of Networked Science. Princeton University Press, 2012. JSTOR, www.jstor.org/stable/j.ctt7s4vx._

.pull-left[
- Sharing resources
- Collective intelligence
- Mass collaboration
]

.pull-right[
<img src="../assets/reinventing-innovation.png" height="300px">
]

---

## The internet was built for open science

### Key to next generation networked science

.center[![](../assets/www.jpg)]

---

## **The grand vision**

### Hans Rosling on open data (and data science) back in 2006

.center[
<iframe width="470" height="250" src="https://goo.gl/ry6AiG" frameborder="0" allowfullscreen></iframe>
]

> So how far have we come?

---

## gapminder.org: today

#### liberating stories from data

<iframe src="https://www.gapminder.org/tools/?embedded=true#$chart-type=bubbles" style="width: 100%; height: 450px; margin: 0 0 0 0; border: 1px solid grey;" allowfullscreen></iframe>

---

## gapminder at our fingertips

```r
library(ggplot2)

# Bubble chart of life expectancy against GDP per capita (log scale).
# `frame = year` is not a ggplot2 aesthetic; plotly uses it to animate by year.
p <- ggplot(gapminder::gapminder,
            aes(gdpPercap, lifeExp, size = pop, color = continent, frame = year)) +
  geom_point() +
  scale_x_log10() +
  theme_bw()
```

```r
plotly::ggplotly(p)
```
---

class: top, right, inverse

# How do we get there?

***

---

## **21st Century Research meta-responsibilities**

We need better digital curation of the workhorses of modern science: **code** & **data**

> **aim to create secure materials that are [FAIR](https://www.nature.com/articles/sdata201618)**

> *findable, accessible, interoperable, reusable*

---

## **21st Century Research meta-responsibilities**

***

.pull-left[
- Think about traceability and provenance
- Follow community conventions
- Prepare it to share it
]

.pull-right[
#### We all need to do our bit

<img src="https://metrouk2.files.wordpress.com/2012/08/article-1344528089185-0d5e3c8900000578-276474_636x362.jpg" height=250px>
]

---

## **Drivers of better digital management**

- Funders: value for money, impact, reputation
- Publishers: many now require code and data.
    + Specialist journals have emerged for:
        + **software**: [Journal of Open Source Software](http://joss.theoj.org/), [MEE](https://besjournals.onlinelibrary.wiley.com/journal/2041210x)
        + **data**: [Scientific Data](https://www.nature.com/sdata/)
- PIs, Supervisors and immediate research group
- Your wider scientific community
- The public

---

## **Yourselves!**

**Be your own best friend:**

.center[![](https://media.giphy.com/media/9Q249Qsl5cfLi/giphy.gif)]

---

### **Ultimately it's about getting a handle on our research materials**

> "Agree on a community convention...then follow it"

.centre[
![](../assets/img/beer_messy_tidy.png)
]

---

## The concept of a Research Compendium

> “...We introduce the concept of a compendium as both a container for the different elements that make up the document and its computations (i.e. text, code, data, ...), and as a means for distributing, managing and updating the collection.”
[_Gentleman and Temple Lang, 2004_](https://biostats.bepress.com/bioconductor/paper2/)

---

background-image: url(../assets/reproducible-data-analysis-02.png)
background-size: contain
class: bottom

[_Karthik Ram: rstudio::conf 2019 talk_](https://github.com/karthik/rstudio2019)

---

background-image: url(../assets/reproducible-data-analysis-04.png)
background-size: contain
class: bottom

[_Karthik Ram: rstudio::conf 2019 talk_](https://github.com/karthik/rstudio2019)

---

background-image: url(../assets/reproducible-data-analysis-06.png)
background-size: contain
class: bottom

[_Karthik Ram: rstudio::conf 2019 talk_](https://github.com/karthik/rstudio2019)

---

## Back to "Why are we here?"

- To help you make the most of the real workhorses of your work, **YOUR CODE & DATA**!

--

- We'll do this by introducing you to **useful data and software tools and best practices**.

--

- To help you be empowered by modern tools & technologies rather than be overwhelmed by them

--

- To help you lead the culture change rather than be burdened by increased requirements

--

- Ultimately, to **change how science works, for the better, for everyone**!

---

# Before we dive in

- We'll be exploring best practice in data and workflow management. I've tried to focus on concepts and tools that I wish I'd known when I started.

--

- We'll explore individual tools and concepts and show how they work nicely together.

--

- We'll also be working together through exercises.

--

- We'll be using coloured post-its to track progress and give time for folks to catch up at key stages, marked by traffic lights throughout the materials 🚦

--

- Feedback between sessions: After each session, let me know on your post-its:
    - 💚: something you liked
    - 🔴: something that could be improved

--

- Please feel free to interrupt if I use jargon you don't understand or you need some clarification. Questions are helpful for everyone! ✨