
Reproducible machine learning workflows

Patrick Schratz

September 27th, 2019


usethis::use_course("mlr-org/mlr3-learndrake")

Find someone to sit next to and share laptops



(Slide style and content borrowed from Garrick Aden-Buie)


What is drake?

  • Will this work when I come back to it later?

  • What happens if I re-run everything?

  • Am I certain that the results are still valid?


(borrowed from ropensci/drake)


drake Essentials


_drake.R

The main settings file

  • sources all R functions

  • sources all "drake-plans" and binds them together

  • sets the execution parameters in drake_config()

  • called by r_make() at the start of every run
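
A minimal sketch of what such a `_drake.R` could look like, assuming the functions live in an `R/` directory and each plan script defines one plan object. The paths `plans/data_plan.R` and `plans/model_plan.R` and the object names `data_plan` and `model_plan` are hypothetical placeholders, not taken from the course material.

```r
# _drake.R (sketch; directory layout and plan names are assumptions)
library(drake)

# Source all R functions, assumed to live in R/.
for (f in list.files("R", pattern = "\\.R$", full.names = TRUE)) {
  source(f)
}

# Source the plan scripts. Each one is assumed to define a single plan object.
source("plans/data_plan.R")   # hypothetical, defines `data_plan`
source("plans/model_plan.R")  # hypothetical, defines `model_plan`

# Bind the individual plans into one plan.
plan <- bind_plans(data_plan, model_plan)

# Set the execution parameters. This call should be the last expression,
# because r_make() works with the config object the script returns.
drake_config(plan, verbose = 1)
```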


drake "targets"

  • targets are simple R objects which know their dependencies

  • targets represent all intermediate or final results of your project

  • targets live in the drake cache (.drake directory), from where they can be loaded into the global environment using loadd()

Advantages

  • No need to use saveRDS()/save() anymore to store intermediate outputs

  • Quick-load targets under cursor in RStudio via addin

  • Visualize dependencies via r_vis_drake_graph()
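
As a rough illustration of targets and the cache, the sketch below builds a small plan and loads results back. The target names (`raw`, `fit`, `coefs`) and the use of `mtcars` are illustrative assumptions. It calls `make()` directly in a temporary directory so that it is self-contained; in a real project you would rely on `r_make()` and `_drake.R` instead.

```r
library(drake)

# Each target is a named R expression; drake infers that `fit` depends on
# `raw` and `coefs` depends on `fit` from the symbols used in the commands.
plan <- drake_plan(
  raw   = mtcars,
  fit   = lm(mpg ~ wt, data = raw),
  coefs = coef(fit)
)

# Work in a temporary directory so the .drake cache stays out of the project.
old_wd <- setwd(tempdir())

make(plan)

loadd(coefs)         # loads the target `coefs` into the calling environment
model <- readd(fit)  # returns the target `fit` as a value instead

setwd(old_wd)
```

No `saveRDS()` call is needed here: the intermediate model and the coefficients are stored in the cache by `make()`.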


r_vis_drake_graph()

Shows how all targets are connected


How does drake help you in your daily work?

  • Knows the execution order of your complete analysis

    r_vis_drake_graph()
  • Enables running your complete project in one call

    r_make()
  • Tracks metadata of each target (runtime, etc.)

  • If you make a substantial change upstream, all downstream targets that depend on it will be updated (plots, tables, models, etc.)
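
The invalidation behaviour can be sketched as follows. The function `scale_by()` and the target names are made up for illustration, and the calls follow the drake 7.x style of passing a config object to `outdated()`; the exact call form may differ in other drake versions.

```r
library(drake)

old_wd <- setwd(tempdir())

# Hypothetical helper that two targets depend on, directly or indirectly.
scale_by <- function(x) x * 2

plan <- drake_plan(
  input  = 1:3,
  scaled = scale_by(input),
  total  = sum(scaled)
)

make(plan)

# Right after a successful run, no target should be reported as outdated.
up_to_date <- outdated(drake_config(plan))

# Changing the function invalidates every target that depends on it.
scale_by <- function(x) x * 10
stale <- outdated(drake_config(plan))

# A second run rebuilds only the stale targets; `input` is left untouched.
make(plan)

setwd(old_wd)
```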


But wait, there's more!

There is a lot more that drake can do, including:

  • Runtime prediction via r_predict_runtime()

  • Support for large plans by using wildcards and the functions map(), cross() and combine()

  • Parallel computation of targets via the packages

      • future

      • clustermq (HPC)

  • Support for many HPC schedulers (SLURM, SGE, PBS, LSF) via package clustermq
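
A small sketch of how `map()` and `combine()` can expand a plan, assuming a polynomial degree as the grouping variable. The target names and the model are illustrative only. The single `fit` definition should expand into one target per degree, and `all_fits` should collect them.

```r
library(drake)

plan <- drake_plan(
  # One target per value of `degree`.
  fit = target(
    lm(mpg ~ poly(wt, degree), data = mtcars),
    transform = map(degree = c(1, 2))
  ),
  # One target that gathers all `fit` targets into a list.
  all_fits = target(
    list(fit),
    transform = combine(fit)
  )
)

# Inspect the expanded plan: each row is one concrete target.
print(plan)
```

The transformations are resolved when the plan is created, so the expanded plan can be inspected before anything is built.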


Personal FAQ (1/2)

  • What is the difference between drake::make() and drake::r_make()?

r_make() runs in a fresh R session via package callr and sources the config before every run. No need to source the config yourself and then run make(config). See also drake-manual/safer-interactivity.

  • How do I parallelize code within targets?

By passing prework = quote(future::plan(future.callr::callr, workers = <n>)) or a similar parallel plan to drake_config() in _drake.R. Do not use future::plan() within scripts/functions.

  • I have no idea what the status of my running computation is. Is there a way to see it?

Yes, use drake::progress(), drake::finished() or drake::failed(). To print more rows of the output, use drake::progress() %>% print(n = 200).
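
The answers above could come together in a `_drake.R` fragment like the following sketch. The plan is a placeholder, and `workers = 2` is an assumed value standing in for `<n>`.

```r
# _drake.R (fragment; the plan and the worker count are assumptions)
library(drake)

plan <- drake_plan(
  fit = lm(mpg ~ wt, data = mtcars)  # placeholder target
)

drake_config(
  plan,
  # Evaluated before the targets are built, so code inside targets can use
  # the future plan without calling future::plan() itself.
  prework = quote(future::plan(future.callr::callr, workers = 2))
)

# From the console, in the project root:
#   drake::r_make()    # fresh R session via callr, sources _drake.R first
#   drake::progress()  # status of the targets
```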


Personal FAQ (2/2)

  • Everything in the drake manual is forcing me to use functions but I am used to scripts mainly. What is better with functions?

Project structure and organization are eminently important. A function-based workflow enhances reproducibility and leads to a cleaner project structure. See this discussion for more information.

  • Is there a convenient way to load targets quickly?

Yes! drake installs an addin called "load targets under cursor" that loads the R object under your cursor when it's bound to a keyboard shortcut. Mine is bound to "CTRL + L".

  • What about other addins and keyboard shortcuts?

Take a look at the "Addins" menu in RStudio for a list of available addins. I have the following addin/shortcut bindings: r_vis_drake_graph() -> "CTRL + Shift + V"; r_outdated() -> "CTRL + Shift + O"; "load targets under cursor" -> "CTRL + L".
