Last updated: 2020-03-04


Knit directory: rss-net/

This reproducible R Markdown analysis was created with workflowr (version 1.6.0).



The command set.seed(20190823) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.



These are the previous versions of the R Markdown and HTML files.

File  Version  Author     Date        Message
Rmd   08f4c53  xiangzhu   2020-03-04  wflow_publish("rmd/ibd2015_nkcell.Rmd")
html  72d8e78  xiangzhu   2020-03-04  Build site.
Rmd   6b97ae5  xiangzhu   2020-03-04  wflow_publish("rmd/ibd2015_nkcell.Rmd")
html  0e9e3a2  xiangzhu   2020-03-03  Build site.
Rmd   eb9c073  xiangzhu   2020-03-03  wflow_publish("rmd/ibd2015_nkcell.Rmd")
Rmd   6002e97  Xiang Zhu  2020-03-03  update ibd2015 nkcell example
html  6002e97  Xiang Zhu  2020-03-03  update ibd2015 nkcell example
html  5974d28  xiangzhu   2020-03-03  Build site.
Rmd   4a13f45  xiangzhu   2020-03-03  wflow_publish("rmd/ibd2015_nkcell.Rmd")
html  bf742e0  xiangzhu   2020-03-02  Build site.
Rmd   a16d453  xiangzhu   2020-03-02  wflow_publish("rmd/ibd2015_nkcell.Rmd")
html  785a423  xiangzhu   2020-03-01  Build site.
Rmd   bda7382  xiangzhu   2020-03-01  wflow_publish("rmd/ibd2015_nkcell.Rmd")
html  830b581  xiangzhu   2020-03-01  Build site.
Rmd   a5b592e  xiangzhu   2020-03-01  wflow_publish("rmd/ibd2015_nkcell.Rmd")
html  c3ca100  xiangzhu   2020-02-29  Build site.
Rmd   2e5247b  xiangzhu   2020-02-29  wflow_publish("rmd/ibd2015_nkcell.Rmd")

Overview

Here we describe an end-to-end RSS-NET analysis of inflammatory bowel disease (IBD) GWAS summary statistics (Liu et al, 2015) and a gene regulatory network inferred for natural killer (NK) cells. This example illustrates the actual data analyses performed in Zhu et al (2020).

To reproduce results of this example, please use scripts in the directory script_dir, and follow the step-by-step guide below. Before running any script in script_dir, please install RSS-NET.

Since a real genome-wide analysis is conducted here, this example is more complicated than the previous simulation example. It is advisable to go through the previous simulation example before diving into this real data example.

Note that the working directory here is assumed to be wdtba. Please modify scripts accordingly if a different directory is used.

Step-by-step illustration

Download input data files

1. ${gwas}_sumstat.mat: processed GWAS summary statistics and LD matrix estimates

This file is large (43 GB) because it contains an LD matrix of 1.1 million common SNPs. Please contact me (xiangzhu[at]stanford.edu) if you have trouble accessing this file.

Let’s look at the contents of ibd2015_sumstat.mat.

GWAS summary statistics and LD estimates are stored as cell arrays. RSS-NET only uses the following variables:

  • betahat{j,1}, single-SNP effect size estimates of all SNPs on chromosome j;
  • se{j,1}, standard errors of betahat{j,1};
  • chr{j,1} and pos{j,1}, chromosome numbers and physical positions of these SNPs (GRCh37 build);
  • SiRiS{j,1}, a sparse matrix, defined as repmat((1./se),1,p) .* R .* repmat((1./se)',p,1), where R is the estimated LD matrix of these p SNPs.
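The definition of SiRiS above can be read element by element: entry (i, k) equals R(i, k) divided by se(i) times se(k). Here is a minimal Python sketch of that computation on a dense toy matrix. The function name siris_from_ld and the toy values are illustrative assumptions; they are not taken from ibd2015_sumstat.mat, where the matrix is stored in sparse form.

```python
def siris_from_ld(se, R):
    """Return the matrix with entries R[i][k] / (se[i] * se[k]).

    This mirrors, element by element, the MATLAB expression
    repmat((1./se),1,p) .* R .* repmat((1./se)',p,1).
    """
    p = len(se)
    if len(R) != p or any(len(row) != p for row in R):
        raise ValueError("R must be a p-by-p matrix with p = len(se)")
    inv_se = [1.0 / s for s in se]
    # Scale row i by 1/se[i] and column k by 1/se[k].
    return [[inv_se[i] * R[i][k] * inv_se[k] for k in range(p)] for i in range(p)]


# Toy inputs (hypothetical): two SNPs with LD correlation 0.2.
se = [0.5, 0.25]
R = [[1.0, 0.2], [0.2, 1.0]]
SiRiS = siris_from_ld(se, R)
print(SiRiS)
```

Because R is symmetric, the result is symmetric as well, and its diagonal holds 1/se^2 for each SNP.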

2. ${gwas}_snp2gene.mat: physical distance between SNPs and genes

This file contains the physical distance between each GWAS SNP and each protein-coding gene, within 1 Mb. This file corresponds to \({\bf G}_j\) in the RSS-NET model.

In this example, there are 18334 genes and 1081481 SNPs.

The SNP-to-gene distance information is captured by a three-column matrix [colid rowid val]. For example, the distance between gene 1 and SNP 6 is 978947 bps.

3. ${net}_gene2gene.mat: gene regulatory network

This file contains information on gene-to-gene connections in a given regulatory network.

For implementation convenience, this file also contains the trivial case where each gene is mapped to itself with val=1.

For a given network, transcription factors (TFs) are stored in rowid and target genes (TGs) are stored in colid. In this example there are 3105 TGs and 376 TFs. Among these TFs and TGs, there are 92399 edges. The edge weights range from 0.61 to 1. These TF-to-TG connections and edge weights correspond to \(\{{\bf T}_g,v_{gt}\}\) in the RSS-NET model.
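To make the layout concrete, here is a minimal Python sketch that groups [colid rowid val] rows into a map from each TG to its TFs and edge weights. It assumes that every row with colid equal to rowid is the trivial self-mapping described above and can be skipped. The function name build_network and the toy triplets are hypothetical, not taken from the actual network file.

```python
from collections import defaultdict


def build_network(triplets):
    """Group (colid, rowid, val) rows into {TG: {TF: weight}}.

    colid indexes the target gene (TG) and rowid the transcription
    factor (TF). Rows with colid == rowid are treated as the trivial
    gene-to-itself entries (an assumption of this sketch) and skipped.
    """
    targets = defaultdict(dict)
    for colid, rowid, val in triplets:
        if colid == rowid:
            continue  # trivial self-mapping, not a TF-to-TG edge
        targets[colid][rowid] = val
    return dict(targets)


# Toy triplets (hypothetical): genes 1-3 map to themselves with val = 1,
# TF 1 regulates TGs 2 and 3, and TF 2 regulates TG 3.
triplets = [
    (1, 1, 1.0), (2, 2, 1.0), (3, 3, 1.0),
    (2, 1, 0.61), (3, 1, 0.9), (3, 2, 1.0),
]
network = build_network(triplets)
n_edges = sum(len(tfs) for tfs in network.values())
print(network, n_edges)
```

Counting the entries of the resulting map gives the number of TF-to-TG edges, which is how a figure such as the 92399 edges above could be checked against the file.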

Run RSS-NET analysis

3. Submit job arrays

For a given GWAS and a given regulatory network, all RSS-NET analysis tasks are almost identical and differ only in their hyper-parameter values. To exploit this, we run one RSS-NET analysis as a job array with multiple tasks that run in parallel.

To this end, we write a simple sbatch script ibd2015_nkcell.sbatch, and submit it to a cluster with Slurm available.

After the submission, multiple tasks should run on different nodes simultaneously.

For each task of this job array, we request 1 node with 8 CPUs and 32 GB of total memory, and set the maximum wall-clock time to 12.5 hours. The actual memory utilized per task is 26.58 GB (efficiency: 85.06% of 31.25 GB). The actual running time per task ranges from xx to xx, with median being xx.

Each task of the job array outputs its results in a Version 7 MAT-file. Each MAT-file contains variational estimates for a given set of hyper-parameter values. For example, the MAT-file ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_out_66.mat stores RSS-NET results based on the 66th row of the hyper-parameter data frame.
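The link between a task and its hyper-parameter values can be sketched as follows. This Python sketch assumes that the 1-based Slurm task index (read from the SLURM_ARRAY_TASK_ID environment variable) selects the row with the same number. The grid values and the helper names hyper_grid and row_for_task are hypothetical; the actual grid is defined by the scripts in script_dir.

```python
import itertools
import os


def hyper_grid(theta0_vals, theta_vals, sigb_vals, sige_vals):
    """Enumerate all combinations of hyper-parameter values as rows."""
    return [
        {"theta0": a, "theta": b, "sigb": c, "sige": d}
        for a, b, c, d in itertools.product(
            theta0_vals, theta_vals, sigb_vals, sige_vals
        )
    ]


def row_for_task(grid, task_id):
    """Return the grid row for a 1-based job-array task index."""
    if not 1 <= task_id <= len(grid):
        raise IndexError(f"task {task_id} is outside 1..{len(grid)}")
    return grid[task_id - 1]


# Hypothetical grid with 3 x 3 x 3 x 3 = 81 rows.
grid = hyper_grid(
    theta0_vals=[-3.0, -2.5, -2.0],
    theta_vals=[0.0, 0.5, 1.0],
    sigb_vals=[0.5, 1.0, 1.5],
    sige_vals=[1.0, 2.0, 3.0],
)

# Slurm sets SLURM_ARRAY_TASK_ID inside each array task; default to 66 here.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "66"))
print(task_id, row_for_task(grid, task_id))
```

With this convention, the index in an output file name such as ..._out_66.mat identifies the grid row that produced it.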

Here [alpha mu s] correspond to the optimal variational parameters \(\{\alpha_j^\star,\mu_j^\star,(\tau_j^\star)^2\}\) for the given hyper-parameters, logw corresponds to the variational lower bound \(F^\star\), and [theta0 theta sigb sige] correspond to \(\{\theta_0,\theta,\sigma_0,\sigma\}\). Please see Supplementary Notes of Zhu et al (2020) for definitions.

Summarize RSS-NET results

For this example, simply run the script summarize_ibd2015_nkcell.m in a Matlab console. After running summarize_ibd2015_nkcell.m, network-level enrichment results are stored in the MAT-file ${gwas}_${net}_${cis}_results_model.mat, and locus-level association results are stored in the MAT-file ${gwas}_${net}_${cis}_results_gene.mat.

1. ${gwas}_${net}_${cis}_results_model.mat: network-level enrichments

To assess whether a regulatory network is enriched for genetic associations with a trait, we evaluate a Bayes factor (BF) comparing the baseline model (\(M_0:\theta=0~\text{and}~\sigma^2=0\)) in RSS-NET with an enrichment model.

Here log10_bf* are log 10 BFs comparing the following 4 enrichment models against \(M_0\).

  • log10_bf: \(M_1:\theta>0~\text{or}~\sigma^2>0\);
  • log10_bf_ns: \(M_{11}:\theta>0~\text{and}~\sigma^2=0\);
  • log10_bf_nt: \(M_{12}:\theta=0~\text{and}~\sigma^2>0\);
  • log10_bf_ts: \(M_{13}:\theta>0~\text{and}~\sigma^2>0\).

Because \(M_1\) is more flexible than the other models, we mainly use log10_bf as recommended by Zhu et al (2020).
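For intuition, one common way to approximate such a BF from per-task lower bounds is to average exp(logw) over the hyper-parameter settings of each model and take the ratio. The Python sketch below makes that assumption, with uniform weights over grid points. It is not a transcription of summarize_ibd2015_nkcell.m, and the function names and toy logw values are hypothetical.

```python
import math


def log_mean_exp(values):
    """Stable log of the average of exp(v) over a list of log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def log10_bayes_factor(logw_enrich, logw_baseline):
    """Approximate log10 BF of an enrichment model against the baseline.

    Each argument lists the variational lower bounds (logw) of the grid
    points that belong to that model; uniform grid weights are assumed.
    """
    log_bf = log_mean_exp(logw_enrich) - log_mean_exp(logw_baseline)
    return log_bf / math.log(10)


# Toy lower bounds (hypothetical): the enrichment grid points fit better.
logw_baseline = [-105.0]
logw_enrich = [-100.0, -100.0]
print(log10_bayes_factor(logw_enrich, logw_baseline))
```

Subtracting the maximum before exponentiating keeps the computation finite even when the lower bounds are very large in magnitude.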

2. ${gwas}_${net}_${cis}_results_gene.mat: locus-level associations

To summarize association between a locus and a trait, we compute \(P_1^{\sf net}\), the posterior probability that at least one SNP \(j\) in the locus is associated with the trait (\(\beta_j\neq 0\)): \[ P_1^{\sf net}=1-\Pr(\beta_j=0,~\forall j\in\text{locus}~|~\text{data},M_1). \] As in Zhu et al (2020), a locus is defined as the transcribed region of a gene plus 100 kb upstream and downstream. The locus definition is provided in ibd2015_gene_grch37.mat.

Here P1_gene corresponds to \(P_1^{\sf net}\) and [gene_chr gene_start gene_stop] denote physical positions of genes based on GRCh37.
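The locus definition and the formula for \(P_1^{\sf net}\) can be sketched for a single hyper-parameter setting as follows. This Python sketch assumes that the variational posterior factorizes across SNPs, so that the probability that all \(\beta_j\) in the locus are zero is the product of \(1-\alpha_j\). The averaging over hyper-parameter settings is omitted, and the function names and toy values are hypothetical.

```python
def snps_in_locus(chrom, pos, gene_chr, gene_start, gene_stop, flank=100_000):
    """Indices of SNPs in the transcribed region plus `flank` bp on each side."""
    return [
        j
        for j, (c, p) in enumerate(zip(chrom, pos))
        if c == gene_chr and gene_start - flank <= p <= gene_stop + flank
    ]


def p1_locus(alpha, idx):
    """1 - prod_j (1 - alpha_j) over the SNPs in the locus.

    alpha[j] is the variational posterior inclusion probability of SNP j;
    independence across SNPs is an assumption of this sketch.
    """
    prob_all_zero = 1.0
    for j in idx:
        prob_all_zero *= 1.0 - alpha[j]
    return 1.0 - prob_all_zero


# Toy SNPs (hypothetical): chromosome, GRCh37 position, inclusion probability.
chrom = [1, 1, 1, 2]
pos = [120_000, 250_000, 400_000, 250_000]
alpha = [0.5, 0.2, 0.9, 0.9]

# Hypothetical gene on chromosome 1 spanning 200,000-260,000 bp.
idx = snps_in_locus(chrom, pos, gene_chr=1, gene_start=200_000, gene_stop=260_000)
p1 = p1_locus(alpha, idx)
print(idx, p1)
```

In this toy case only the first two SNPs fall inside the 100 kb window, so the locus-level probability is 1 - 0.5 × 0.8.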

More examples

The RSS-NET analyses of 18 complex traits and 38 gene regulatory networks reported in Zhu et al (2020) are essentially 684 modified reruns of the example above (with different input GWAS and/or network data, and different hyper-parameter grids). Our full analysis results are publicly available at https://xiangzhu.github.io/rss-peca.

Appendix


─ Session info ───────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.3 (2020-02-29)
 os       Ubuntu 18.04.4 LTS          
 system   x86_64, linux-gnu           
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/Los_Angeles         
 date     2020-03-04                  

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date       lib source        
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.0)
 backports     1.1.5   2019-10-02 [1] CRAN (R 3.6.1)
 callr         3.4.1   2020-01-24 [1] CRAN (R 3.6.2)
 cli           2.0.1   2020-01-08 [1] CRAN (R 3.6.2)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.0)
 desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.0)
 devtools      2.2.1   2019-09-24 [1] CRAN (R 3.6.1)
 digest        0.6.23  2019-11-23 [1] CRAN (R 3.6.1)
 ellipsis      0.3.0   2019-09-20 [1] CRAN (R 3.6.1)
 evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.0)
 fansi         0.4.1   2020-01-08 [1] CRAN (R 3.6.2)
 fs            1.3.1   2019-05-06 [1] CRAN (R 3.6.0)
 git2r         0.26.1  2019-06-29 [1] CRAN (R 3.6.0)
 glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.0)
 htmltools     0.4.0   2019-10-04 [1] CRAN (R 3.6.1)
 httpuv        1.5.2   2019-09-11 [1] CRAN (R 3.6.1)
 knitr         1.28    2020-02-06 [1] CRAN (R 3.6.2)
 later         1.0.0   2019-10-04 [1] CRAN (R 3.6.1)
 magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.0)
 memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.0)
 pkgbuild      1.0.6   2019-10-09 [1] CRAN (R 3.6.1)
 pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.6.0)
 prettyunits   1.1.1   2020-01-24 [1] CRAN (R 3.6.2)
 processx      3.4.1   2019-07-18 [1] CRAN (R 3.6.1)
 promises      1.1.0   2019-10-04 [1] CRAN (R 3.6.1)
 ps            1.3.0   2018-12-21 [1] CRAN (R 3.6.0)
 R6            2.4.1   2019-11-12 [1] CRAN (R 3.6.1)
 Rcpp          1.0.3   2019-11-08 [1] CRAN (R 3.6.1)
 remotes       2.1.0   2019-06-24 [1] CRAN (R 3.6.0)
 rlang         0.4.4   2020-01-28 [1] CRAN (R 3.6.2)
 rmarkdown     2.1     2020-01-20 [1] CRAN (R 3.6.2)
 rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.0)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.0)
 stringi       1.4.5   2020-01-11 [1] CRAN (R 3.6.2)
 stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.0)
 testthat      2.3.1   2019-12-01 [1] CRAN (R 3.6.2)
 usethis       1.5.1   2019-07-04 [1] CRAN (R 3.6.1)
 whisker       0.4     2019-08-28 [1] CRAN (R 3.6.1)
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.0)
 workflowr   * 1.6.0   2019-12-19 [1] CRAN (R 3.6.2)
 xfun          0.12    2020-01-13 [1] CRAN (R 3.6.2)
 yaml          2.2.1   2020-02-01 [1] CRAN (R 3.6.2)

[1] /home/maimaizhu/R/x86_64-pc-linux-gnu-library/3.6
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library