Last updated: 2023-01-12
Knit directory: bioinformatics_tips/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20200503) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 82a105c. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/scripting.Rmd) and HTML (docs/scripting.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 82a105c | Dave Tang | 2023-01-12 | Scripting |
html | 78910db | Dave Tang | 2022-04-13 | Build site. |
Rmd | f80e28b | Dave Tang | 2022-04-13 | Script as much as possible |
It may already be obvious that writing a script, i.e. a file that runs a series of commands, is far preferable to typing those commands manually. For example, instead of manually typing the commands for the various tools that process your data, you save all of those commands in a single file, called a script, and simply execute that file to run all the steps. Now you can easily re-run your analysis.
Having all your commands saved in a script also makes it easier to re-run your analysis with new data or new settings. You could edit your script manually to specify the location of the new data, but what you should do is write a script that accepts command line arguments. This means that instead of hardcoding a value such as /data/gene_exp.csv in your script, you pass it to the script as an argument (parameter). If your script is called summarise.sh, you could write it so that the data is passed via the command line:
./summarise.sh /data/gene_exp.csv
You could also include settings/parameters that can be passed to your script, so you can easily re-run your analysis with different settings.
./summarise.sh /data/gene_exp.csv --alpha 0.5 --beta 3
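As a rough illustration of how such a script might read its arguments, here is a minimal POSIX shell sketch. The function names parse_args and summarise, the default values for alpha and beta, and the echoed summary line are all assumptions made for this example; the real summarise.sh would run the actual analysis instead of printing its settings.

```shell
#!/bin/sh
# Hypothetical sketch of the argument handling inside summarise.sh.
# One positional argument (the input file) plus optional --alpha and --beta.

parse_args() {
    alpha=0.05   # assumed default, chosen only for illustration
    beta=1       # assumed default, chosen only for illustration
    input=""

    while [ "$#" -gt 0 ]; do
        case "$1" in
            --alpha)
                # Refuse to continue if the option has no value after it.
                [ "$#" -ge 2 ] || { echo "--alpha needs a value" >&2; return 1; }
                alpha="$2"; shift 2 ;;
            --beta)
                [ "$#" -ge 2 ] || { echo "--beta needs a value" >&2; return 1; }
                beta="$2"; shift 2 ;;
            -*)
                echo "Unknown option: $1" >&2; return 1 ;;
            *)
                # Anything that is not an option is treated as the input file.
                input="$1"; shift ;;
        esac
    done

    if [ -z "$input" ]; then
        echo "Usage: summarise.sh <input.csv> [--alpha N] [--beta N]" >&2
        return 1
    fi
}

summarise() {
    parse_args "$@" || return 1
    # Placeholder for the real analysis: just report what would be used.
    echo "input=$input alpha=$alpha beta=$beta"
}
```

In a real script the last line would typically be summarise "$@", so that the arguments given on the command line are forwarded to the function.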
After you have nicely scripted up this analysis, you start working on scripting up another analysis. However, you realise that some steps in your previous analysis are also needed in this one. You could use your previous script as a template and modify it for this analysis. But what you should do is put each step in its own separate script that accepts command line arguments. This way you don’t have to modify two analysis scripts when you need to change an individual step. It may seem annoying to have to write ten separate scripts for an analysis pipeline that has ten steps, but this makes the pipeline much easier to maintain in the future, especially when you start building more and more analysis pipelines.
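One way to picture a pipeline built from separate step scripts is a small driver that runs them in order. The sketch below assumes a convention that is not from the original text: every step script is a shell script that reads the file given as its first argument and writes to the file given as its second. The function name run_steps and the step_N.out file names are likewise hypothetical.

```shell
#!/bin/sh
# Hypothetical driver: chains independent step scripts, feeding the output
# of one step to the next. Each step is assumed to take <input> <output>.

run_steps() {
    # Usage: run_steps <input file> <work directory> <step script>...
    current="$1"
    workdir="$2"
    shift 2

    n=0
    for step in "$@"; do
        n=$((n + 1))
        next="$workdir/step_$n.out"
        # Run the step with sh, so the step scripts must be shell scripts.
        sh "$step" "$current" "$next" || {
            echo "Step failed: $step" >&2
            return 1
        }
        current="$next"
    done

    # Print the path of the final output so the caller can use it.
    echo "$current"
}
```

Because each step lives in its own file, a second pipeline can reuse any of them simply by listing the same script in its own call to run_steps.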
If you have gone this far to set up your work, you can go a bit further and tie everything together using a workflow management system. The benefit of such systems is that they make your workflows easier to manage. For example, you could execute your workflow via a queuing system or Google Cloud. You could set up limits for computational resources, restart jobs, cache results, and more.
Now that everything is nicely automated, it is time to include tests, which make sure that your analysis pipeline generates the expected results. Testing should also be automated using CI/CD (continuous integration and continuous delivery), which means that each time you make a change to your pipeline, another pipeline is automatically run to check that your pipeline still works as expected.
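A very simple form of such a test compares a freshly generated output file with a stored, known-good copy. The helper below is a minimal sketch of that idea; the name assert_same and the PASS/FAIL messages are made up for this example, and which files to compare depends on your own pipeline.

```shell
#!/bin/sh
# Hypothetical regression test helper: succeeds only if the output produced
# by the pipeline is byte-for-byte identical to the expected file.

assert_same() {
    # Usage: assert_same <test name> <expected file> <actual file>
    if cmp -s "$2" "$3"; then
        echo "PASS: $1"
    else
        echo "FAIL: $1" >&2
        return 1
    fi
}
```

A CI service generally treats a non-zero exit status as a failed run, so a test script that stops with an error when assert_same fails is usually enough to flag a broken pipeline.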
Since everything is so nicely set up, you have more time to do what needs to be done! And it wouldn’t have been possible if you didn’t script everything up.
sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] workflowr_1.7.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 bslib_0.4.1 compiler_4.2.0 pillar_1.7.0
[5] later_1.3.0 git2r_0.30.1 jquerylib_0.1.4 tools_4.2.0
[9] getPass_0.2-2 digest_0.6.30 jsonlite_1.8.3 evaluate_0.17
[13] tibble_3.1.7 lifecycle_1.0.3 pkgconfig_2.0.3 rlang_1.0.6
[17] cli_3.4.1 rstudioapi_0.13 yaml_2.3.6 xfun_0.34
[21] fastmap_1.1.0 httr_1.4.3 stringr_1.4.0 knitr_1.39
[25] sass_0.4.1 fs_1.5.2 vctrs_0.5.0 rprojroot_2.0.3
[29] glue_1.6.2 R6_2.5.1 processx_3.8.0 fansi_1.0.3
[33] rmarkdown_2.17 callr_3.7.3 magrittr_2.0.3 whisker_0.4
[37] ps_1.7.2 promises_1.2.0.1 htmltools_0.5.2 ellipsis_0.3.2
[41] httpuv_1.6.6 utf8_1.2.2 stringi_1.7.6 cachem_1.0.6
[45] crayon_1.5.2