
Research pipelines

George G Vega Yon

USC IMAGE project

2019/10/24


Today's talk

Reproducible Research

Tools to ease your work

We will spend most time looking at other people's work

Reproducible research (RR) (or how to avoid "it works on my computer!")

A major new issue across the sciences (overall)

A major productivity problem for researchers

  • It is not only a good idea for science, but also for saving you time!

Reproducible research

Polished paper

  • Ready to be submitted

  • Written using your fav doc editor

Intermediate report

  • Not ready to be published

  • Not necessarily using your fav doc editor

What do these two have in common?

... both should have pretty figures AND be reproducible

The minimum

Data: People must have a way to get your data

  • Include it with the paper.

  • Put it in an online repository like Zenodo (see, for example, the Gene Ontology's profile).

  • Include instructions about how to get the data (e.g. in an experimental setting, how did you get the samples).

Analysis: Source code/steps of your analysis

  • Include it with the paper.

  • Put it in an online repository like GitHub or GitLab.

Pro tip: Avoid the "contact the corresponding author for..." lines.
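
One way to make a shared dataset verifiable is to publish a checksum next to it and check that checksum right after downloading. The sketch below uses base R only. The helper `fetch_data()`, the demo file, and the checksum variable are hypothetical names introduced for illustration; they are not part of the talk.

```r
# Hypothetical helper (not from the talk): download a file and verify its MD5.
fetch_data <- function(url, dest, expected_md5) {
  # mode = "wb" keeps binary files intact on Windows
  utils::download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
  actual_md5 <- unname(tools::md5sum(dest))
  if (!identical(actual_md5, expected_md5)) {
    stop("Checksum mismatch for ", dest, ": got ", actual_md5)
  }
  invisible(dest)
}

# Offline demo: a local file stands in for the archived dataset.
src <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, weight = c(80.5, 82.1, 79.9)), src, row.names = FALSE)
published_md5 <- unname(tools::md5sum(src))

# Building the file:// URL this way assumes a Unix-like path.
src_url <- paste0("file://", normalizePath(src))
dest <- fetch_data(src_url, tempfile(fileext = ".csv"), published_md5)
```

In a real project the URL would point at the archived copy of the data and the checksum would be stored in the repository next to the analysis code, so a changed or corrupted file stops the pipeline early.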


Tiers of reproducibility

Basic (minimum)

  • Data: Use a public dataset (or at least a shareable one), or publish your dataset.

  • Analysis: Source code/steps of your analysis.

Plus

  • Tools: Use open source software (like R, Python, etc.).

  • Decency: Write your code neatly (like Emil does).

  • Tidy: Organize your work in a structured way (folders + README files; here is one example).
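
A structured layout is easy to create with a few lines of base R. The sketch below is one possible convention, not the layout used in the talk: the helper `init_project()` and the folder names are assumptions made for illustration.

```r
# Hypothetical helper (not from the talk): create a project skeleton under `root`.
init_project <- function(root,
                         dirs = c("data-raw", "data", "analysis", "figures", "reports")) {
  # Create each folder; existing folders are left untouched
  for (d in file.path(root, dirs)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }

  # Write a README describing the layout, unless one already exists
  readme <- file.path(root, "README.md")
  if (!file.exists(readme)) {
    writeLines(c(
      paste("#", basename(root)),
      "",
      "- data-raw/: original data, never edited by hand",
      "- data/: cleaned data produced by scripts",
      "- analysis/: scripts, numbered in the order they run",
      "- figures/: plots generated by the scripts",
      "- reports/: rendered documents"
    ), readme)
  }
  invisible(root)
}

# Demo in a temporary directory so nothing is written to the working folder
proj <- init_project(file.path(tempdir(), "my-study"))
list.files(proj)
```

The README is the piece that pays off later: it tells a reader (or your future self) which folder holds inputs and which holds generated output.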

Premium


Research Pipeline

Source: Diagram by 문건웅, shown here


Now, a lot of it has to do with automation...


Automating your research: automatic reports


Automating your research: automatic reports (example)

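# Load the anorexia weight-change data shipped with MASS
data(anorexia, package = "MASS")

# Model 1: post-treatment weight by treatment group
anorex.1 <- glm(
  Postwt ~ Treat,
  data = anorexia
)

# Model 2: additionally adjust for pre-treatment weight
anorex.2 <- glm(
  Postwt ~ Prewt + Treat,
  data = anorexia
)

# Render both models side by side as an HTML table
library(texreg)
htmlreg(
  list(anorex.1, anorex.2),
  doctype = FALSE
)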

Statistical models

                  Model 1      Model 2
(Intercept)       85.70***     49.77***
                  (1.35)       (13.39)
TreatCont         -4.59*       -4.10*
                  (1.97)       (1.89)
TreatFT            4.80*        4.56*
                  (2.23)       (2.13)
Prewt                           0.43**
                               (0.16)
AIC               495.28       489.97
BIC               504.39       501.36
Log Likelihood   -243.64      -239.99
Deviance         3665.06      3311.26
Num. obs.         72           72

***p < 0.001; **p < 0.01; *p < 0.05
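
The same principle (numbers flow from the fitted model into the report, never copied by hand) can be sketched without a table package. The example below uses base R and the MASS anorexia data from the example above. The helper `coef_table()` and the output file name are hypothetical and only illustrate the idea.

```r
data(anorexia, package = "MASS")

# Hypothetical helper (not from the talk): coefficient table as a data frame.
coef_table <- function(model) {
  est <- coef(summary(model))  # matrix with estimates, std. errors, p-values
  data.frame(
    term      = rownames(est),
    estimate  = est[, "Estimate"],
    std_error = est[, "Std. Error"],
    p_value   = est[, ncol(est)],  # the p-value is the last column
    row.names = NULL
  )
}

fit <- glm(Postwt ~ Prewt + Treat, data = anorexia)
tab <- coef_table(fit)

# Write the table to disk so the report can read it instead of retyping it
out_file <- file.path(tempdir(), "anorexia-coefs.csv")
write.csv(tab, out_file, row.names = FALSE)
```

Re-running the script after the data or the model changes regenerates the file, so the report and the analysis cannot drift apart.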

Check out the CRAN Task View for Reproducible Research


Version control to the rescue...


Git: The stupid version control system

(Live demo now)


Thank you!

Some other resources

rOpenSci's "Reproducibility in Science"

Karl Broman's "Tools for Reproducible Research Spring, 2016"

The workflowr package

Colin Fay's "An introduction to Docker for R Users"

R packages that work with Docker/Singularity: babelwhale and dockerfiler
