Last updated: 2020-08-21

Checks: 5 passed, 1 failed

Knit directory: ChromatinSplicingQTLs/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20191126) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
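As a minimal illustration of why the seed matters, the sketch below draws the same random sample twice after resetting the seed. The variable names and the sample size are made up for this example.

```r
# Reset the seed before each draw so both draws use the same random stream.
set.seed(20191126)
first_draw <- sample(1:100, 5)

set.seed(20191126)
second_draw <- sample(1:100, 5)

# The two draws match because the seed was identical.
print(identical(first_draw, second_draw))
```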

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Tracking code development and connecting the code version to the results is critical for reproducibility. To start using Git, open the Terminal and type git init in your project directory.


This project is not being versioned with Git. To obtain the full reproducibility benefits of using workflowr, please see ?wflow_start.


Project description

A project investigating the correlation of genetic effects on chromatin, splicing, transcription, and complex phenotypes, using naturally occurring human genetic variation.

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)
     speed           dist       
 Min.   : 4.0   Min.   :  2.00  
 1st Qu.:12.0   1st Qu.: 26.00  
 Median :15.0   Median : 36.00  
 Mean   :15.4   Mean   : 42.98  
 3rd Qu.:19.0   3rd Qu.: 56.00  
 Max.   :25.0   Max.   :120.00  

Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

Rmarkdowns as a workflowr

Make sure you have installed the workflowr library for R. I have been organizing this project according to this cookiecutter project template: https://github.com/bfairkun/cookiecutter-wflowR-smk. To summarize, the project follows the structure recommended by the workflowr project (https://jdblischak.github.io/workflowr/articles/wflow-01-getting-started.html), where all of the files in code are part of a Snakemake workflow.

The Snakemake workflow takes care of all the data processing that requires lots of computational resources, and it writes some relevant files into the output directory if they are small enough to add to GitHub. Large files that the Snakemake workflow creates are saved in code and are not added to the git repository, as specified in code/.gitignore (since GitHub places a limit on the size of files that can be uploaded). In theory, they can be easily recreated by rerunning the Snakemake workflow.

From there, I use R to further analyze and explore the results, in the form of one Rmarkdown script for each conceptually distinct analysis I am interested in. Each Rmarkdown script should contain enough information for others to follow my thoughts as I analyze data. All of the datafiles required to run the Rmarkdown scripts should be in output, data, or code, and all the Rmarkdown scripts should reference these files using relative filepaths for easier reproducibility.

When I am satisfied with an Rmarkdown analysis and I want to make it public, I use the workflowr::wflow_build() command in the console. This will knit the Rmd file(s) in analysis into html files in docs, which GitHub can host for display as a public website. All Rmd files that are to be included in this workflow must be saved in analysis. I have a naming convention for Rmd files:

  1. Rmd files that rely only on datafiles in data/ or output/ should be named as such: analysis/DataNotInCode_<date>_<short title>.Rmd.
  2. Rmd files that rely on datafiles that are in code/ should be named as such: analysis/DataInCode_<date>_<short title>.Rmd.
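The naming convention above can be sketched as a small helper. This is only an illustration: `make_rmd_path` is a hypothetical function, not part of workflowr or of this project, and the `YYYYMMDD` date format and the stripping of non-alphanumeric characters from the title are assumptions, since the convention only specifies `<date>` and `<short title>`.

```r
# Hypothetical helper: build an Rmd path that follows the naming convention.
make_rmd_path <- function(short_title, data_in_code = FALSE, date = Sys.Date()) {
  # Files that need the large data kept in code/ get the DataInCode prefix.
  prefix <- if (data_in_code) "DataInCode" else "DataNotInCode"
  # Assumed date format; the convention itself only says <date>.
  date_tag <- format(as.Date(date), "%Y%m%d")
  # Assumed clean-up: keep only letters and digits in the short title.
  title_tag <- gsub("[^A-Za-z0-9]+", "", short_title)
  file.path("analysis", paste0(prefix, "_", date_tag, "_", title_tag, ".Rmd"))
}

# Example with a made-up title and date.
example_path <- make_rmd_path("ExploreQTLs", data_in_code = FALSE, date = "2020-08-21")
print(example_path)
```

Keeping the prefix as the first part of the file name is what makes it possible to select one group of files with a simple pattern.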

With this convention, Rmd files that can easily be run after cloning the git repo to my local computer are separated from the Rmd files that require large data files, which I keep on midway as part of the Snakemake workflow. Therefore, from my laptop, I can build Rmd files locally with workflowr::wflow_build("analysis/DataNotInCode_*"). I can build the other Rmd files on midway. After Rmd files are built into docs/, I can add and commit them to the repository.
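The split between locally buildable files and files that need the large data can also be made explicit in R. The sketch below is an assumption about how one might do this, not the project's actual tooling: `select_rmds` is a hypothetical helper and the file names are invented for illustration.

```r
# Hypothetical helper: choose which Rmd files can be built on this machine.
select_rmds <- function(paths, large_data_available = FALSE) {
  file_names <- basename(paths)
  is_local <- grepl("^DataNotInCode_.*\\.Rmd$", file_names)
  is_in_code <- grepl("^DataInCode_.*\\.Rmd$", file_names)
  if (large_data_available) {
    # On the cluster, both groups of files can be built.
    paths[is_local | is_in_code]
  } else {
    # On a laptop, only files that avoid the large data in code/ can be built.
    paths[is_local]
  }
}

# Made-up file names, for illustration only.
rmds <- c(
  "analysis/index.Rmd",
  "analysis/DataNotInCode_20200821_Example.Rmd",
  "analysis/DataInCode_20200821_Example.Rmd"
)
to_build <- select_rmds(rmds, large_data_available = FALSE)
print(to_build)

# In a real project the list would come from the analysis directory and the
# selection would be handed to workflowr, for example:
#   rmds <- list.files("analysis", pattern = "\\.Rmd$", full.names = TRUE)
#   workflowr::wflow_build(select_rmds(rmds))
```

Note that files without either prefix (such as index.Rmd) are left out of this selection and would need to be built separately.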

Here is an example Rmd.

Contributing.

Let’s use this branching workflow to collaborate with git. Others can clone my repo, start their own branch, make commits, and send pull requests to merge into master (the public site that GitHub hosts will be on the master branch). Only I will have permission to write to the master branch, so collaborators must send pull requests. Alternatively, we can work with this forking workflow if you prefer, so that you have permission to write to the master branch on your own repository.


sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] workflowr_1.6.2 Rcpp_1.0.3      rprojroot_1.3-2 digest_0.6.20  
 [5] later_0.8.0     R6_2.4.0        backports_1.1.4 git2r_0.26.1   
 [9] magrittr_1.5    evaluate_0.14   stringi_1.4.3   fs_1.3.1       
[13] promises_1.0.1  rmarkdown_1.13  tools_3.6.1     stringr_1.4.0  
[17] glue_1.3.1      httpuv_1.5.1    xfun_0.8        yaml_2.2.0     
[21] compiler_3.6.1  htmltools_0.3.6 knitr_1.23