Last updated: 2021-08-31

Checks: 7 passed, 0 failed

Knit directory: proxyMR/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20210602) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version b849be4. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rproj.user/
    Ignored:    fig_test.RData
    Ignored:    figures_grid_plus_legend.RData
    Ignored:    household_MR_exhaustive_summary.rds
    Ignored:    proxyMR_comparison.RData
    Ignored:    proxyMR_figure_data.RData
    Ignored:    renv/library/
    Ignored:    renv/local/
    Ignored:    renv/staging/
    Ignored:    summary.RData
    Ignored:    traits_corr2_update.rds

Untracked files:
    Untracked:  R/proxyMR_figures.R
    Untracked:  R/proxyMR_hiercharchy.R

Unstaged changes:
    Modified:   R/functions.R
    Modified:   analysis/update_meeting_14_06_2021.Rmd
    Modified:   analysis/update_meeting_14_07_2021.Rmd
    Modified:   analysis/update_meeting_25_08_2021.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/setup.Rmd) and HTML (docs/setup.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author        Date        Message
Rmd   b849be4  jennysjaarda  2021-08-31  clean up some website docs
html  b849be4  jennysjaarda  2021-08-31  clean up some website docs
html  0e10427  jennysjaarda  2021-07-22  Build site.
Rmd   750d442  jennysjaarda  2021-06-16  add website docs
html  750d442  jennysjaarda  2021-06-16  add website docs


Code version: b849be41cbf2aae369d3cbf1d0d7faa4fdbd42d3

To reproduce the results from this project, please follow these instructions.

In general, targets was used to manage long-running code and workflowr was used to manage the website.

1 Initiate project on remote server.

All processing scripts were run from the root SGG directory. The project was initialized using the workflowr R package, see here.

On personal computer:

project_name <- "proxyMR"
library("workflowr")

wflow_start(project_name) # creates a directory named after project_name (proxyMR/)

# Run only ONE of the following two lines:
options("workflowr.view" = FALSE) # if using a cluster (no browser preview available)
# options("workflowr.view" = TRUE) # default, if using RStudio on a personal computer

wflow_build() # build the website (populates docs/)
options(workflowr.sysgit = "") # do not use the system Git executable

wflow_use_github("jennysjaarda") # select option 2: manually create new repository

wflow_git_push()

You have now successfully created a GitHub repository for your project that is accessible on GitHub and the servers.

Next, set up a local copy.

2 Create local copy on personal computer.

In a terminal on your personal computer, clone the Git repository:

cd ~/Dropbox/UNIL/projects/
git clone https://GitHub.com/jennysjaarda/proxyMR.git proxyMR

Open the project in RStudio (or your preferred text editor) and modify the following files:

  • Because the terminal cannot display a preview and workflowr does not work well with the system Git (sysgit), add the following to the .Rprofile file:
    • options(workflowr.sysgit = "")
    • options("workflowr.view" = FALSE)
  • To ensure that large files are not tracked by Git and pushed to GitHub, add the following lines to the .gitignore file:
analysis/*
data/*
!analysis/*.Rmd
_targets
*.out
*.rds
*.RData
  • Save and push these changes to GitHub.
  • Pull these changes on the server.
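The two file edits above can also be scripted so they are safe to repeat. A minimal sketch, assuming it is run from the project root; append_missing_lines() is a hypothetical helper, not part of workflowr, and it only adds lines that are not already in the file:

```r
# Hypothetical helper: add lines to a text file only if they are not already present,
# so re-running the setup does not duplicate entries.
append_missing_lines <- function(path, lines) {
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  new_lines <- setdiff(lines, existing)
  if (length(new_lines) > 0) {
    # Rewrite the whole file so it always ends with a newline.
    writeLines(c(existing, new_lines), path)
  }
  invisible(new_lines)
}

# Settings for .Rprofile
append_missing_lines(".Rprofile", c(
  'options(workflowr.sysgit = "")',
  'options("workflowr.view" = FALSE)'
))

# Patterns for .gitignore
append_missing_lines(".gitignore", c(
  "analysis/*", "data/*", "!analysis/*.Rmd", "_targets", "*.out", "*.rds", "*.RData"
))
```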

3 Create internal project folders.

Return to the SGG server and run the following:

project_dir=/data/sgg2/jenny/projects/proxyMR
mkdir $project_dir/data/raw
mkdir $project_dir/data/processed
mkdir $project_dir/docs/assets
mkdir $project_dir/docs/generated_reports
mkdir $project_dir/analysis/data_setup
mkdir $project_dir/analysis/data_setup/IV_lists
mkdir $project_dir/analysis/data_setup/IV_info
mkdir $project_dir/analysis/data_setup/sex_heterogeneity
mkdir $project_dir/analysis/traitMR
mkdir $project_dir/analysis/traitMR/IVs
mkdir $project_dir/analysis/traitMR/IVs/Neale
mkdir $project_dir/analysis/traitMR/household_MR
mkdir $project_dir/analysis/traitMR/household_GWAS
mkdir $project_dir/analysis/traitMR/standard_GWAS
mkdir $project_dir/analysis/traitMR/standard_MR
mkdir $project_dir/analysis/traitMR/trait_info
mkdir $project_dir/analysis/traitMR/pheno_files
mkdir $project_dir/analysis/traitMR/pheno_files/phesant
mkdir $project_dir/analysis/traitMR/pheno_files/raw_change
mkdir $project_dir/output/figures
mkdir $project_dir/output/tables
mkdir $project_dir/output/figures/traitMR
mkdir $project_dir/output/tables/traitMR

This will create the following directory structure in proxyMR/:

proxyMR/
├── .gitignore
├── .gitattributes
├── .Rprofile
├── _workflowr.yml
├── analysis/
│   ├── data_setup/
│   │   ├── sex_heterogeneity/
│   │   ├── IV_info/
│   │   └── IV_lists/
│   ├── traitMR/
│   │   ├── IVs/
│   │   │   └── Neale/
│   │   ├── household_MR/
│   │   ├── standard_MR/
│   │   ├── household_GWAS/
│   │   ├── standard_GWAS/
│   │   ├── pheno_files/
│   │   │   ├── phesant/
│   │   │   └── raw_change/
│   │   └── trait_info/
│   ├── about.Rmd
│   ├── index.Rmd
│   ├── license.Rmd
│   └── _site.yml
├── code/
│   └── README.md
├── data/
│   ├── README.md
│   ├── raw/
│   └── processed/
├── docs/
│   ├── generated_reports/
│   └── assets/
├── proxyMR.Rproj
├── output/
│   ├── figures/
│   │   └── traitMR/
│   ├── tables/
│   │   └── traitMR/
│   └── README.md
└── README.md

Note that after renv and targets are initialized (below), a few other directories and files will also be created, namely:

proxyMR/
├── renv/
├── _targets/
├── renv.lock
└── .renvignore
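Since the rest of the project is driven from R, the mkdir commands in section 3 could equivalently be run from an R session. A minimal sketch, assuming the working directory is the project root (proxyMR/); the directory names are copied from the mkdir commands:

```r
# Sub-directories created by the mkdir commands in section 3 (relative to proxyMR/).
project_dirs <- c(
  "data/raw", "data/processed",
  "docs/assets", "docs/generated_reports",
  "analysis/data_setup/IV_lists", "analysis/data_setup/IV_info",
  "analysis/data_setup/sex_heterogeneity",
  "analysis/traitMR/IVs/Neale",
  "analysis/traitMR/household_MR", "analysis/traitMR/household_GWAS",
  "analysis/traitMR/standard_GWAS", "analysis/traitMR/standard_MR",
  "analysis/traitMR/trait_info",
  "analysis/traitMR/pheno_files/phesant", "analysis/traitMR/pheno_files/raw_change",
  "output/figures/traitMR", "output/tables/traitMR"
)

# recursive = TRUE creates missing parent directories (like mkdir -p);
# showWarnings = FALSE makes re-running harmless when a directory already exists.
invisible(lapply(project_dirs, dir.create, recursive = TRUE, showWarnings = FALSE))
stopifnot(all(dir.exists(project_dirs)))
```

Unlike plain mkdir, this version does not depend on the order in which parent directories are created.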

4 Initialize a renv directory

renv is a dependency management system for R. It is useful for making your project:

Isolated: Installing a new or updated package for one project won’t break your other projects, and vice versa. That’s because renv gives each project its own private package library.

Portable: Easily transport your projects from one computer to another, even across different platforms. renv makes it easy to install the packages your project depends on.

Reproducible: renv records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.

The renv package is a new effort to bring project-local R dependency management to your projects. The goal is for renv to be a robust, stable replacement for the Packrat package, with fewer surprises and better default behaviors.

Initialize a renv project by simply running:

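renv::init()
targets::tar_renv() # write a file of library() calls for the packages used by the targets pipeline, so renv detects them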

This creates a renv directory in your project folder.

Now, every time you launch R from this directory or run install.packages(), renv will automatically keep track of your packages and versions. You are no longer in an ordinary R project; you’re in a renv project. The main difference is that a renv project has its own private package library. Any packages you install from inside a renv project are only available to that project; and packages you install outside of the project are not available to the project.

Periodically call renv::snapshot() to save the state of the project library to the lockfile (called renv.lock), continue working on your project, installing and updating R packages as needed. Call renv::restore() to revert to the previous state as encoded in the lockfile if your attempts to update packages introduced some new problems.
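The snapshot/restore cycle described above can be wrapped into a small rollback pattern for package updates. A minimal sketch, assuming renv is installed and the project has been initialized; try_update() is a hypothetical helper (not part of renv), and check is a user-supplied function that returns TRUE when the analysis still works:

```r
# Hypothetical helper: update packages, and roll back to the lockfile if a check fails.
try_update <- function(pkgs, check) {
  renv::snapshot(prompt = FALSE)  # record the current, working state in renv.lock
  renv::install(pkgs)             # install or update the requested packages
  ok <- tryCatch(isTRUE(check()), error = function(e) FALSE)
  if (!ok) {
    renv::restore(prompt = FALSE) # revert to the versions recorded in renv.lock
  }
  invisible(ok)
}
```

For example, try_update("dplyr", function() { source("R/functions.R"); TRUE }) would keep the update only if the project's function file still loads.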

5 Copy relevant files to project directory.

Only do this once. The Neale_SGG_directory.csv file summarizes all SGG UKBB phenotypes in our database and all Neale phenotypes that have been downloaded. This file changes over time as new phenotypes are requested from the UKBB and/or new Neale files are downloaded (i.e., more rows are added to the file). Because the targets package tracks input files and their changes, the current state of this file was copied to the project directory, and this copy was used as an input file to the pipeline. The file is merely a reference file, and changes to it do not correspond to changes in the underlying data used in the analyses.

Neale_summary_dir <- "/data/sgg2/jenny/data/Neale_UKBB_GWAS"
Neale_SGG_dir_org <- paste0(Neale_summary_dir,"/Neale_SGG_directory.csv") 
Neale_SGG_dir_cp <- paste0("data/Neale_SGG_directory_", format(Sys.Date(), "%d_%m_%Y"), ".csv")
file.copy(Neale_SGG_dir_org, Neale_SGG_dir_cp)
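Because targets tracks input files by their contents, the dated copy can be declared as a file target. A minimal sketch of a fragment for _targets.R, assuming the targets package; the target names are hypothetical, and DD_MM_YYYY is a placeholder for the date in the actual file name:

```r
library(targets)

# Target objects that could be included in the list at the end of _targets.R
# (targets accepts nested lists of targets). Names and the path are placeholders.
neale_targets <- list(
  # format = "file": targets hashes the file's contents, so downstream targets
  # rerun only when this copied snapshot changes, not when the original
  # Neale_SGG_directory.csv on the server gains new rows.
  tar_target(
    Neale_SGG_dir_file,
    "data/Neale_SGG_directory_DD_MM_YYYY.csv",
    format = "file"
  ),
  # Downstream target that reads the tracked copy.
  tar_target(Neale_SGG_dir, read.csv(Neale_SGG_dir_file))
)
```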

6 Run and summarize analyses.

6.1 Setup targets and execute plan.

Note that Parts A and B run in parallel.

The targets package was used to manage the pipeline and maintain a reproducible workflow. It integrates well with HPC clusters, making it an ideal choice for this project.

library(targets)

For the execution of the targets plan, see the _targets.R file.

Configure a SLURM template:

options(clustermq.scheduler = "slurm", clustermq.template = "slurm_clustermq.tmpl")
# Write a starting template to slurm_clustermq.tmpl, then edit it manually.
# drake_hpc_template_file() comes from the drake package, not targets.
drake::drake_hpc_template_file("slurm_clustermq.tmpl")

The generated template was then edited manually; the final contents of slurm_clustermq.tmpl are:

cat(readLines('slurm_clustermq.tmpl'), sep = '\n')
#!/bin/sh
# From https://github.com/mschubert/clustermq/wiki/SLURM
#SBATCH --job-name={{ job_name }}                         # job name
#SBATCH --partition={{ partition }}                       # partition
#SBATCH --output={{ log_file | /dev/null }}               # you can add .%a for array index
#SBATCH --error={{ log_file | /dev/null }}                # log file
#SBATCH --account={{ account }}                           # account
#SBATCH --array=1-{{ n_jobs }}                            # job array
#SBATCH --ntasks={{ ntasks }}
# module load R                                           # Uncomment if R is an environment module.

CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
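With the template in place, the pipeline can be launched so that each clustermq worker runs as one element of the SLURM job array ({{ n_jobs }} in the template). A minimal sketch; run_pipeline_slurm() is a hypothetical wrapper, and tar_make_clustermq() is the targets function available in versions of this period (later releases of targets move towards crew-based workers instead):

```r
# Hypothetical wrapper around the commands used to run the pipeline on SLURM.
run_pipeline_slurm <- function(workers = 2L) {
  options(
    clustermq.scheduler = "slurm",               # submit workers as SLURM jobs
    clustermq.template  = "slurm_clustermq.tmpl" # the manually edited template
  )
  # workers becomes clustermq's n_jobs, i.e. the size of the SLURM job array.
  targets::tar_make_clustermq(workers = workers)
}
```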

6.2 Build and maintain website.

Follow the general workflow outlined by workflowr, with some minor revisions to accommodate working between a personal computer and the remote server:

  1. Open a new or existing R Markdown file in analysis/ (optionally using wflow_open()). (The file is usually created manually on the personal computer and pushed to the server to be built later.) If creating the file manually, add the following header to the top of the R Markdown file, replacing "Title" with an appropriate title:
---
title: "Title"
site: workflowr::wflow_site
output:
  workflowr::wflow_html:
    toc: true
---
  2. Write documentation and perform analyses in the R Markdown file.

  3. Commit and push the revised R Markdown file to the GitHub repository.

  4. On the server, pull the changes using wflow_git_pull() (or git pull from a terminal within the cloned repository).

  5. Within the R console, run wflow_build(). This creates HTML files in the docs/ folder. These files cannot be viewed directly on the server, but they can be transferred and viewed via FileZilla, or viewed directly by mounting the remote directory on your personal computer using SSHFS (recommended). Alternatively, Rmd files can be built within the targets pipeline using tar_render() from the tarchetypes package; see the examples in the targets manual here. The second option may be preferable because tar_render() finds all the tar_load()/tar_read() dependencies in the report and inserts them into the target's command, which enforces the proper dependency relationships.

  6. Return to step 2 until satisfied with the result (optionally, edit the Rmd file directly on the server using vi if only small modifications are necessary).

  7. Run wflow_status() to check the status of the repository.

  8. Run wflow_publish() to commit the source files (R Markdown files or other files in code/, data/, and output/), build the HTML files, and commit the HTML files. If there are uncommitted files in the directory that are not ".Rmd" files, wflow_publish(all = TRUE) does not work. Alternatively, run the following with an informative message:

repo_status <- wflow_status()
status <- repo_status$status
# Rmd files that are modified, unpublished, or not yet committed (scratch)
rmd_commit <- rownames(status)[status$modified | status$unpublished | status$scratch]

wflow_publish(rmd_commit,
              message = "Updating website")
  9. Push the changes to GitHub with wflow_git_push() (or git push in the terminal).
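The list above mentions building Rmd files inside the targets pipeline with tar_render(). A minimal sketch of that option, assuming the tarchetypes package is installed; report_target() is a hypothetical helper, and tar_render_raw() is the variant of tar_render() that takes the target name as a string:

```r
# Hypothetical helper: turn one workflowr page into a target.
# tarchetypes::tar_render_raw() scans the report for tar_load()/tar_read() calls
# and makes those targets upstream dependencies, so the page is rebuilt
# whenever its inputs change.
report_target <- function(name, path) {
  stopifnot(file.exists(path)) # the Rmd must exist when the pipeline is defined
  tarchetypes::tar_render_raw(name, path)
}

# Example use inside the target list of _targets.R:
# report_target("setup_page", "analysis/setup.Rmd")
```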

sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Big Sur 11.5.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7        whisker_0.4       knitr_1.33        magrittr_2.0.1   
 [5] workflowr_1.6.2   R6_2.5.0          rlang_0.4.11      fansi_0.5.0      
 [9] stringr_1.4.0     tools_4.1.0       xfun_0.23         utf8_1.2.1       
[13] git2r_0.28.0      htmltools_0.5.1.1 ellipsis_0.3.2    rprojroot_2.0.2  
[17] yaml_2.2.1        digest_0.6.27     tibble_3.1.2      lifecycle_1.0.0  
[21] crayon_1.4.1      later_1.2.0       vctrs_0.3.8       promises_1.2.0.1 
[25] fs_1.5.0          glue_1.4.2        evaluate_0.14     rmarkdown_2.10   
[29] stringi_1.6.2     compiler_4.1.0    pillar_1.6.1      httpuv_1.6.1     
[33] pkgconfig_2.0.3