ACCE Research Data and Project Management


Introduction & Welcome

10-11 April 2019, University of Sheffield

Dr Anna Krystalli @annakrystalli

1 / 28

👋 Hello

me: Dr Anna Krystalli

  • Research Software Engineer, University of Sheffield

    • twitter @annakrystalli
    • github @annakrystalli
    • email a.krystalli[at]sheffield.ac.uk
  • Editor, rOpenSci

  • Co-organiser, Sheffield R Users Group

2 / 28

Welcome

Course materials: http://annakrystalli.me/rrresearch/

3 / 28

Why are we here?


4 / 28

The paper is the advertisement

“An article about computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result.”

Jon Claerbout, paraphrased in Buckheit and Donoho (1995)

The Scientific Paper Is Obsolete: Here's what's next

The Atlantic, 5 April 2018

5 / 28

Lessons from the Reproducibility/Replicability crisis

  • Many of the issues are statistical, or the result of broken academic incentive systems.

  • Much can be tackled through greater transparency and better computational literacy.

6 / 28

Reproducible Research in Computational Science

Roger D. Peng, Science, 2 December 2011: 1226-1227

Reproducibility has the potential to serve as a minimum standard for judging scientific claims when full independent replication of a study is not possible.

7 / 28

Reinventing discovery by open sourcing science

Nielsen, Michael. Reinventing Discovery: The New Era of Networked Science. Princeton University Press, 2012. JSTOR, www.jstor.org/stable/j.ctt7s4vx.

  • Sharing resources
  • Collective intelligence
  • Mass collaboration

8 / 28

The internet was built for open science

Key to next-generation networked science

9 / 28

The grand vision

Hans Rosling on open data (and data science) back in 2006

So how far have we come?

10 / 28

gapminder.org: today

liberating stories from data

11 / 28

gapminder at our fingertips

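library(ggplot2)
# Life expectancy vs GDP per capita; `frame = year` is not a ggplot2
# aesthetic, but plotly uses it to animate the points over years
p <- ggplot(gapminder::gapminder,
            aes(gdpPercap, lifeExp, size = pop,
                color = continent, frame = year)) +
  geom_point() +
  scale_x_log10() +
  theme_bw()
# Convert the static ggplot into an interactive, animated plotly chart
plotly::ggplotly(p)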
(Interactive plotly chart: gdpPercap (log scale) vs lifeExp, points sized by pop and coloured by continent, with a Play button animating over year.)
12 / 28

How do we get there?


13 / 28

21st Century Research meta-responsibilities

We need better digital curation of the workhorses of modern science: code & data

Aim to create secure materials that are FAIR: Findable, Accessible, Interoperable, Reusable

14 / 28

21st Century Research meta-responsibilities


  • Think about traceability and provenance
  • Follow community conventions
  • Prepare it for sharing

We all need to do our bit

15 / 28

Drivers of better digital management

  • Funders: value for money, impact, reputation

  • Publishers: many now require code and data.

  • PIs, supervisors and your immediate research group

  • Your wider scientific community

  • The public

16 / 28

Yourselves!

Be your own best friend:

17 / 28

Ultimately, it's about getting a handle on our research materials

“Agree on a community convention... then follow it”

18 / 28

The concept of a Research Compendium

“...We introduce the concept of a compendium as both a container for the different elements that make up the document and its computations (i.e. text, code, data, ...), and as a means for distributing, managing and updating the collection.”

Gentleman and Temple Lang, 2004

19 / 28

R + RStudio

Next-generation data science powerhouse

Backed by a diverse and active community of learners, users and developers

23 / 28

Back to "Why are we here?"

  • To help you make the most of the real workhorses of your work, YOUR CODE & DATA!

  • We'll do this by introducing you to useful data and software tools and best practices.

  • To help you feel empowered by modern tools & technologies rather than overwhelmed by them

  • To help you lead the culture change rather than be burdened by increased requirements

  • Ultimately, to change how science works, for the better, for everyone!

24 / 28

Course Outline

Today

  • Introduction
  • Basic Data Hygiene
  • Research Data Management - Metadata
  • Literate Programming with R Markdown
  • Version Control with Git

25 / 28

Course Outline

Tomorrow

  • Collaborating through GitHub
  • Managing code as a package
  • Bringing it all together: a research compendium

26 / 28

Before we dive in

  • We'll be exploring best practice in data and workflow management. I've tried to focus on concepts and tools that I wish I'd known about when I started.
  • We'll explore individual tools and concepts and show how they work nicely together.

  • We'll also be working together through exercises.

  • We'll be using coloured post-its as traffic lights throughout the materials to track progress and give folks time to catch up at key stages 🚦

  • Feedback between sessions: after each session, let me know on your post-its:

    • 📗: something you liked
    • 🔴: something that could be improved
  • Please feel free to stop me if I use jargon you don't understand or if you need clarification. Questions are helpful for everyone! ✨

27 / 28

Get Ready

Open RStudio

Install packages

Run this code in the console

install.packages(c("devtools", "tinytex", "rmarkdown", "usethis",
                   "here", "tidyverse"))

Once tinytex is installed, use it to install a minimal LaTeX distribution:

tinytex::install_tinytex()

Some of these are large packages, so it's best to get their installation going now. If you have any installation problems, come and see me at the next coffee break.
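If you'd like to double-check your setup once the installs finish, here is a minimal sketch of one way to do it, assuming the same package list as the install.packages() call above. It reports any course packages that aren't installed yet and whether tinytex can find a TinyTeX installation. The names `pkgs`, `missing_pkgs` and `has_tinytex` are just illustrative.

```r
# Packages requested for the course (same list as the install.packages() call)
pkgs <- c("devtools", "tinytex", "rmarkdown", "usethis", "here", "tidyverse")

# Requested packages that are not found in any installed library
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

if (length(missing_pkgs) == 0) {
  message("All course packages are installed")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}

# TRUE if tinytex is installed and reports TinyTeX as the LaTeX installation;
# guarded so it gives FALSE instead of an error when tinytex is absent
has_tinytex <- requireNamespace("tinytex", quietly = TRUE) &&
  isTRUE(tryCatch(tinytex::is_tinytex(), error = function(e) FALSE))
has_tinytex
```

If anything shows up as missing, re-running the install step for just those packages is usually enough.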

Get back home

28 / 28