Last updated: 2019-12-06
Checks: 7 passed, 0 failed
Knit directory: PSYMETAB/
This reproducible R Markdown analysis was created with workflowr (version 1.5.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20191126) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
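As a quick illustration of what the seed buys you, the base-R sketch below draws the same random subsample twice, resetting the seed to the value used here before each draw. One caveat: R 3.6.0 changed the default algorithm behind sample(), so the same seed can produce different draws under R 3.5.3 (the version in the session information) and later versions.

```r
# Reset the seed before each draw to get an identical subsample
set.seed(20191126)
first_draw <- sample(1:100, size = 5)

set.seed(20191126)
second_draw <- sample(1:100, size = 5)

# Same seed, same R version, same call -> same result
identical(first_draw, second_draw)
#> [1] TRUE
```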
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .drake/
Ignored: analysis/QC/
Ignored: data/processed/
Ignored: data/raw/
Untracked files:
Untracked: post_impute_qc.out
Unstaged changes:
Deleted: post_imputation_qc.log
Deleted: pre_imputation_qc.log
Modified: pre_impute_qc.out
Deleted: qc_part2.out
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | bee9ea8 | Sjaarda Jennifer Lynn | 2019-12-06 | add step for using wflow_status() |
Rmd | e430d04 | Sjaarda Jennifer Lynn | 2019-12-06 | modify commiting instructions |
html | d1e539c | Jenny Sjaarda | 2019-12-06 | Build site. |
Rmd | 487b5f5 | Sjaarda Jennifer Lynn | 2019-12-06 | update website, add qc description |
html | 9f1ba5e | Jenny Sjaarda | 2019-12-06 | Build site. |
Rmd | 5e454c3 | Sjaarda Jennifer Lynn | 2019-12-06 | add more details to website |
Rmd | d480e35 | Jenny | 2019-12-04 | misc annotations |
html | 125be8c | Jenny Sjaarda | 2019-12-02 | build website |
Rmd | 179fb3b | Jenny | 2019-12-02 | eval false to drake launch |
Rmd | 0dd02a7 | Jenny | 2019-12-02 | modify website |
html | 2849dcb | Jenny Sjaarda | 2019-12-02 | wflow_git_commit(all = T) |
Rmd | 49a7ba9 | Sjaarda Jennifer Lynn | 2019-12-02 | modify git ignore |
Code version: 40814e5ee671bf607fe76820633908ec63132ddd
To reproduce the results from this project, please follow these instructions.
In general, drake was used to manage long-running code and workflowr was used to manage the website.
All processing scripts were run from the root sgg directory. The project was initialized using the workflowr R package (see here).
On sgg server:
project_name <- "PSYMETAB"
library("workflowr")
wflow_start(project_name) # creates a directory called project_name and moves into it
options("workflowr.view" = FALSE) # don't open results in a browser (if using cluster)
wflow_build() # build the website from the initial R Markdown files
options(workflowr.sysgit = "") # don't use the system Git installation
wflow_publish(c("analysis/index.Rmd", "analysis/about.Rmd", "analysis/license.Rmd"),
              "Publish the initial files for myproject")
wflow_use_github("jennysjaarda") # select option 2: manually create new repository
wflow_git_push()
You have now successfully created a GitHub repository for your project that is accessible on GitHub and the servers.
Next, set up a local copy.
In a terminal on your personal computer, clone the Git repository:
cd ~/Dropbox/UNIL/projects/
git clone https://github.com/jennysjaarda/PSYMETAB.git PSYMETAB
Open the project in Atom (or your preferred text editor) and modify the following files:
.Rprofile: workflowr does not work well with the system Git installation (sysgit), so add the following to the .Rprofile file:
options(workflowr.sysgit = "")
options("workflowr.view" = FALSE)
.gitignore: add the following lines to the .gitignore file:
analysis/*
data/*
!analysis/*.Rmd
!data/*.md
.git/
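To see why the negated lines matter, here is a simplified model of how Git evaluates such patterns: they are checked in order, the last matching pattern wins, and a leading ! re-includes a path. This is a sketch, not Git's full algorithm: the helpers glob_to_regex() and is_ignored() are hypothetical, only * and . are translated, the .git/ line is left out, and the model is only meaningful for paths directly inside analysis/ or data/ (Git cannot re-include a file whose parent directory is already ignored, which this model does not capture).

```r
# Translate a gitignore-style glob into an anchored regular expression.
# Only "." and "*" are handled; "*" does not cross "/", as in Git.
glob_to_regex <- function(glob) {
  rx <- gsub(".", "[.]", glob, fixed = TRUE)
  rx <- gsub("*", "[^/]*", rx, fixed = TRUE)
  paste0("^", rx, "$")
}

# Apply the patterns in order: the last matching pattern decides,
# and a leading "!" re-includes a previously ignored path.
is_ignored <- function(path, patterns) {
  ignored <- FALSE
  for (pattern in patterns) {
    negate <- startsWith(pattern, "!")
    glob <- if (negate) substring(pattern, 2) else pattern
    if (grepl(glob_to_regex(glob), path)) {
      ignored <- !negate
    }
  }
  ignored
}

gitignore_patterns <- c("analysis/*", "data/*", "!analysis/*.Rmd", "!data/*.md")
paths <- c("analysis/index.Rmd", "analysis/index.html", "data/README.md",
           "data/raw", "code/plan.R")
vapply(paths, is_ignored, logical(1), patterns = gitignore_patterns)
```

The call returns FALSE, TRUE, FALSE, TRUE, FALSE (named by path): the .Rmd and .md source files stay tracked, while generated files in analysis/ and the data/raw directory are ignored.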
Return to the sgg server and run the following:
project_dir=/data/sgg2/jenny/projects/PSYMETAB
mkdir $project_dir/data/raw
mkdir $project_dir/data/processed
mkdir $project_dir/data/raw/reference_files
mkdir $project_dir/data/raw/phenotype_data
mkdir $project_dir/data/raw/extraction
mkdir $project_dir/data/processed/phenotype_data
mkdir $project_dir/data/processed/extraction
mkdir $project_dir/docs/assets
This will create the following directory structure in PSYMETAB/:
PSYMETAB/
├── .gitignore
├── .Rprofile
├── _workflowr.yml
├── analysis/
│   ├── about.Rmd
│   ├── index.Rmd
│   ├── license.Rmd
│   └── _site.yml
├── code/
│   └── README.md
├── data/
│   ├── README.md
│   ├── raw/
│   │   ├── phenotype_data/
│   │   ├── reference_files/
│   │   └── extraction/
│   └── processed/
│       ├── phenotype_data/
│       ├── reference_files/
│       └── extraction/
├── docs/
│   └── assets/
├── myproject.Rproj
├── output/
│   └── README.md
└── README.md
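The data/ and docs/ sub-directories can equally be created from within R. Below is a minimal sketch using base R's dir.create(), where recursive = TRUE behaves like mkdir -p; it creates the same directories as the mkdir commands above. The helper name create_project_dirs() is hypothetical, and in the usage example a temporary directory stands in for the project root.

```r
# Create the data/ and docs/ sub-directories used by the project.
# recursive = TRUE creates missing parents (like mkdir -p), and
# showWarnings = FALSE makes re-running the function harmless.
create_project_dirs <- function(project_dir) {
  subdirs <- c("data/raw/reference_files",
               "data/raw/phenotype_data",
               "data/raw/extraction",
               "data/processed/phenotype_data",
               "data/processed/extraction",
               "docs/assets")
  paths <- file.path(project_dir, subdirs)
  for (p in paths) {
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Usage: a temporary directory stands in for PSYMETAB/
project_dir <- file.path(tempdir(), "PSYMETAB")
created <- create_project_dirs(project_dir)
all(dir.exists(created))
#> [1] TRUE
```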
Raw PLINK data (.ped and .map files) were built in GenomeStudio and then copied from the CHUV folder (L:\PCN\UBPC\ANALYSES_RECHERCHE\Jenny\PSYMETAB_GWAS) to the data/ drive.
Note that Parts A and B run in parallel.
To execute the drake plan, see make.R.
For the drake plan itself, see code/plan.R.
Configure a SLURM template for clustermq:
library("drake")
options(clustermq.scheduler = "slurm", clustermq.template = "slurm_clustermq.tmpl")
drake_hpc_template_file("slurm_clustermq.tmpl")
# Writes the file slurm_clustermq.tmpl, which is then edited manually
The file generated from the clustermq template was then edited manually. The final version of slurm_clustermq.tmpl is printed below:
cat(readLines('slurm_clustermq.tmpl'), sep = '\n')
#!/bin/sh
# From https://github.com/mschubert/clustermq/wiki/SLURM
#SBATCH --job-name={{ job_name }} # job name
#SBATCH --partition={{ partition }} # partition
#SBATCH --output={{ log_file | /dev/null }} # you can add .%a for array index
#SBATCH --error={{ log_file | /dev/null }} # log file
####SBATCH --mem-per-cpu={{ memory | 4096 }} # memory
#SBATCH --array=1-{{ n_jobs }} # job array
#SBATCH --cpus-per-task={{ cpus }}
# module load R # Uncomment if R is an environment module.
####ulimit -v $(( 1024 * {{ memory | 4096 }} ))
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
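The {{ key | default }} placeholders in this template are filled in by clustermq when it submits jobs: a value supplied for key is used if present, otherwise the default after the | is used. The base-R helper below is a hypothetical illustration of that substitution rule, not clustermq's own implementation; it stops with an error when a placeholder has neither a value nor a default, and the values passed to it are made up.

```r
# Fill "{{ key }}" and "{{ key | default }}" placeholders in template lines.
# A supplied value wins; otherwise the default after "|" is used.
fill_placeholders <- function(lines, values = list()) {
  pattern <- "\\{\\{\\s*(\\w+)\\s*(?:\\|\\s*([^}]*?)\\s*)?\\}\\}"
  vapply(lines, function(line) {
    tokens <- regmatches(line, gregexpr(pattern, line, perl = TRUE))[[1]]
    for (token in tokens) {
      key <- sub(pattern, "\\1", token, perl = TRUE)
      default <- sub(pattern, "\\2", token, perl = TRUE)
      value <- if (!is.null(values[[key]])) as.character(values[[key]]) else default
      if (!nzchar(value)) {
        stop("No value or default supplied for placeholder: ", key)
      }
      line <- sub(token, value, line, fixed = TRUE)
    }
    line
  }, character(1), USE.NAMES = FALSE)
}

template <- c("#SBATCH --job-name={{ job_name }}",
              "#SBATCH --output={{ log_file | /dev/null }}",
              "#SBATCH --array=1-{{ n_jobs }}")
fill_placeholders(template, list(job_name = "cmq_psymetab", n_jobs = 4))
```

Here log_file is not supplied, so its line falls back to /dev/null, while job_name and n_jobs take the supplied values.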
Follow the general workflow outlined by workflowr, with some minor revisions to accommodate working between a personal computer and the remote server:
1. Create a new R Markdown file, or open an existing one, in analysis/ (optionally using wflow_open()). Files are usually created manually on the personal computer and pushed to the server to be built later. If creating a file manually, add the following header to the top of the R Markdown file, replacing Title with an appropriate name:
---
title: "Title"
site: workflowr::wflow_site
output:
  workflowr::wflow_html:
    toc: true
---
2. Write documentation and perform analyses in the R Markdown file.
3. Commit and push (git commit, then git push) the revised R Markdown file to the GitHub repository.
4. On the server, pull the changes using wflow_git_pull() (or, optionally, git pull from a terminal within the cloned repository).
5. Within the R console, run wflow_build(). This creates HTML files in the docs/ folder. These files cannot be viewed directly on the server, but they can be transferred and viewed via FileZilla, or viewed directly by mounting the remote directory on your personal computer using SSHFS (recommended).
6. Return to step 2 until satisfied with the result (optionally, edit the Rmd file directly on the server using vi if only small modifications are necessary).
7. Run wflow_status() to check the status of the repository.
8. Run wflow_publish() to commit the source files (R Markdown files or other files in code/, data/, and output/), build the HTML files, and commit the HTML files. If the directory contains uncommitted files that are not .Rmd files, wflow_publish(all = TRUE) does not work. Instead, run the following, replacing the message with an informative one:
library("workflowr")
# Query the repository; repo_status$status has one row per R Markdown file
# (file paths are the row names) and logical columns describing its state
repo_status <- wflow_status()
# Collect the R Markdown files that are modified, unpublished, or untracked (scratch)
rmd_commit <- c(rownames(repo_status$status)[repo_status$status$modified],
                rownames(repo_status$status)[repo_status$status$unpublished],
                rownames(repo_status$status)[repo_status$status$scratch])
# Commit the source files, rebuild the HTML, and commit the HTML
wflow_publish(rmd_commit,
              message = "Updating website")
9. Run wflow_git_push() (or git push in the terminal) to push the changes to GitHub.
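The file selection in step 8 can be tried without a repository. The sketch below uses a small mock data frame in place of wflow_status()$status, with made-up file names; it only includes the three logical columns used in step 8 (modified, unpublished, scratch), and select_files_to_publish() is a hypothetical helper. Combining the flags with | keeps each file once, in the order of the status table.

```r
# Mock stand-in for wflow_status()$status: one row per R Markdown file
# (row names are file paths) and logical columns describing its state.
status <- data.frame(
  modified    = c(TRUE,  FALSE, FALSE, FALSE),
  unpublished = c(FALSE, TRUE,  FALSE, FALSE),
  scratch     = c(FALSE, FALSE, TRUE,  FALSE),
  row.names   = c("analysis/qc.Rmd", "analysis/gwas.Rmd",
                  "analysis/notes.Rmd", "analysis/index.Rmd")
)

# Select every file that is modified, unpublished, or untracked (scratch).
select_files_to_publish <- function(status) {
  keep <- status$modified | status$unpublished | status$scratch
  rownames(status)[keep]
}

# Returns "analysis/qc.Rmd", "analysis/gwas.Rmd" and "analysis/notes.Rmd";
# the already-published, unchanged index.Rmd is left out.
select_files_to_publish(status)
```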
sessionInfo()
# R version 3.5.3 (2019-03-11)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: CentOS Linux 7 (Core)
#
# Matrix products: default
# BLAS: /data/sgg2/jenny/bin/R-3.5.3/lib64/R/lib/libRblas.so
# LAPACK: /data/sgg2/jenny/bin/R-3.5.3/lib64/R/lib/libRlapack.so
#
# locale:
# [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
# [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
# [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
# [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
# [9] LC_ADDRESS=C LC_TELEPHONE=C
# [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# loaded via a namespace (and not attached):
# [1] workflowr_1.5.0 Rcpp_1.0.3 rprojroot_1.3-2 digest_0.6.23
# [5] later_1.0.0 R6_2.4.1 backports_1.1.5 git2r_0.26.1
# [9] magrittr_1.5 evaluate_0.14 stringi_1.4.3 rlang_0.4.1
# [13] fs_1.3.1 promises_1.1.0 whisker_0.4 rmarkdown_1.18
# [17] tools_3.5.3 stringr_1.4.0 glue_1.3.1 httpuv_1.5.2
# [21] xfun_0.11 yaml_2.2.0 compiler_3.5.3 htmltools_0.4.0
# [25] knitr_1.26