Last updated: 2020-03-06

Checks: 7 0

Knit directory: rss-net/

This reproducible R Markdown analysis was created with workflowr (version 1.6.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20190823) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    examples/ibd2015_nkcell/ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_out_66.mat
    Ignored:    examples/ibd2015_nkcell/randinit_results/
    Ignored:    examples/ibd2015_nkcell/results/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
Rmd 4de49b7 xiangzhu 2020-03-06 wflow_publish(“rmd/wtccc_bcell.Rmd”)
html 5467783 xiangzhu 2020-03-06 Build site.
Rmd 9729b44 xiangzhu 2020-03-06 wflow_publish(“rmd/wtccc_bcell.Rmd”)
html c2ee73a xiangzhu 2020-03-06 Build site.
Rmd b109ab5 xiangzhu 2020-03-06 wflow_publish(“rmd/wtccc_bcell.Rmd”)

Overview

Here we describe an end-to-end RSS-NET analysis of two synthetic datasets that are simulated from real genotypes of 348965 genome-wide SNPs from 1458 individuals in the UK Blood Service Control Group (Wellcome Trust Case Control Consortium, 2007) and a gene regulatory network inferred for B cells (ENCODE Project Consortium, 2012). This example gives a quick view of how RSS-NET works.

To reproduce results of this example, please use scripts in the directory wtccc_bcell/, and follow the step-by-step guide below. Before running any script in wtccc_bcell/, please install RSS-NET.

With the software installed and the input data downloaded, one should be able to run this example by simply typing the following line in shell:

If a different directory is used to store the software, input data and/or output data, please modify file paths in the given scripts accordingly.

Step-by-step illustration

Download data files

All data files required to run this example are freely available at Zenodo DOI. Please contact me (xiangzhu[at]stanford.edu) if you have any trouble accessing these files. After a complete download, you should see the following files.

To help readers confirm if they use the same files as we do, we report 128-bit MD5 hashes of all files in wtccc_bcell_data.md5.

To help readers confirm if they can reproduce results of this example, we also provide the results (wtccc_bcell_results.zip) in the same Zenodo deposit.

For simplicity and generality, we introduce the following short-hand notations.

1. m*_data.mat: simulated GWAS summary statistics and reference LD estimates

Here we simulate two synthetic datasets from WTCCC genotypes and B cell network: one (m0_data.mat) based on RSS-NET baseline model \(M_0\), and the other (m1_data.mat) based on RSS-NET enrichment model \(M_1\). We match m*_data.mat by the number of trait-associated SNPs and the proportion of phenotypic variation explained by all SNPs. By this desgin, m*_data.mat have the same amount of genetic signals, and they only differ in how genetic signals are distributed: for the enrichment dataset m1_data.mat genetic associations are enriched in the B cell network; for the baseline dataset m0_data.mat genetic associations are randomly distributed across the genome.

Let’s look at the contents of m*_data.mat.

GWAS summary statistics and LD estimates are stored as cell arrays. RSS-NET only uses the following variables:

  • betahat{j,1}, single-SNP effect size estimates of all SNPs on chromosome j;
  • se{j,1}, standard errors of betahat{j, 1};
  • chr{j,1} and pos{j, 1}, physical positions of these SNPs (GRCh37 build);
  • SiRiS{j,1}, a sparse matrix, defined as repmat((1./se),1,p) .* R .* repmat((1./se)',p,1), where R is the estimated LD matrix of these p SNPs.

2. ${gwas}_snp2gene.mat: physical distance between SNPs and genes

This file contains the physical distance between each GWAS SNP and each protein-coding gene, within 1 Mb. This file corresponds to \({\bf G}_j\) in the RSS-NET model.

In this example, there are 18334 genes and 348965 SNPs.

The SNP-to-gene distance information is captured by a three-column matrix [colid rowid val]. For example, the distance between gene 7 and SNP 161 is 2855196 bps.

3. ${net}_gene2gene.mat: gene regulatory network

This file contains information of gene-to-gene connections in a given regulatory network.

For implementation convenience, this file contains the trivial case where each gene is mapped to itself with val=1.

For a given network, transcription factors (TFs) are stored in rowid and target genes (TGs) are stored in colid. In this example there are 3018 TGs and 436 TFs. Among these TFs and TGs, there are 91728 edges. The edge weights range from 0.61 to 1. These TF-to-TG connections and edge weights correspond to \(\{{\bf T}_g,v_{gt}\}\) in the RSS-NET model.

5. ${gwas}_${net}_snp2gene_cis.mat: SNP-to-gene cis regulation

This file contains the SNP-to-gene cis regulation scores. (For simplicity we do not use context-matching cis eQTL studies in our simulations.) This file corresponds to \((c_{jg}-1)\) in the RSS-NET model.

In this example, there are 291181 SNP-gene pairs with cis regulation scores available, consiting of 111097 SNPs and 5345 genes. The cis regulation scores (val) range from 0.02 to 0.90.


─ Session info ───────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.3 (2020-02-29)
 os       Ubuntu 18.04.4 LTS          
 system   x86_64, linux-gnu           
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/Los_Angeles         
 date     2020-03-06                  

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date       lib source        
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.0)
 backports     1.1.5   2019-10-02 [1] CRAN (R 3.6.1)
 callr         3.4.1   2020-01-24 [1] CRAN (R 3.6.2)
 cli           2.0.1   2020-01-08 [1] CRAN (R 3.6.2)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.0)
 desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.0)
 devtools      2.2.1   2019-09-24 [1] CRAN (R 3.6.1)
 digest        0.6.23  2019-11-23 [1] CRAN (R 3.6.1)
 ellipsis      0.3.0   2019-09-20 [1] CRAN (R 3.6.1)
 evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.0)
 fansi         0.4.1   2020-01-08 [1] CRAN (R 3.6.2)
 fs            1.3.1   2019-05-06 [1] CRAN (R 3.6.0)
 git2r         0.26.1  2019-06-29 [1] CRAN (R 3.6.0)
 glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.0)
 htmltools     0.4.0   2019-10-04 [1] CRAN (R 3.6.1)
 httpuv        1.5.2   2019-09-11 [1] CRAN (R 3.6.1)
 knitr         1.28    2020-02-06 [1] CRAN (R 3.6.2)
 later         1.0.0   2019-10-04 [1] CRAN (R 3.6.1)
 magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.0)
 memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.0)
 pkgbuild      1.0.6   2019-10-09 [1] CRAN (R 3.6.1)
 pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.6.0)
 prettyunits   1.1.1   2020-01-24 [1] CRAN (R 3.6.2)
 processx      3.4.1   2019-07-18 [1] CRAN (R 3.6.1)
 promises      1.1.0   2019-10-04 [1] CRAN (R 3.6.1)
 ps            1.3.0   2018-12-21 [1] CRAN (R 3.6.0)
 R6            2.4.1   2019-11-12 [1] CRAN (R 3.6.1)
 Rcpp          1.0.3   2019-11-08 [1] CRAN (R 3.6.1)
 remotes       2.1.0   2019-06-24 [1] CRAN (R 3.6.0)
 rlang         0.4.4   2020-01-28 [1] CRAN (R 3.6.2)
 rmarkdown     2.1     2020-01-20 [1] CRAN (R 3.6.2)
 rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.0)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.0)
 stringi       1.4.5   2020-01-11 [1] CRAN (R 3.6.2)
 stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.0)
 testthat      2.3.1   2019-12-01 [1] CRAN (R 3.6.2)
 usethis       1.5.1   2019-07-04 [1] CRAN (R 3.6.1)
 whisker       0.4     2019-08-28 [1] CRAN (R 3.6.1)
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.0)
 workflowr   * 1.6.0   2019-12-19 [1] CRAN (R 3.6.2)
 xfun          0.12    2020-01-13 [1] CRAN (R 3.6.2)
 yaml          2.2.1   2020-02-01 [1] CRAN (R 3.6.2)

[1] /home/maimaizhu/R/x86_64-pc-linux-gnu-library/3.6
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library