Last updated: 2020-07-24

Checks: 6 passed, 1 warning

Knit directory: causal-TWAS/

This reproducible R Markdown analysis was created with workflowr (version 1.6.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20191103)

The command set.seed(20191103) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
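
As a small illustration of the point above, the sketch below resets the same seed before two random draws and checks that they agree. The draw itself (`sample(10)`) is only a stand-in and is not part of this analysis.

```r
# Resetting the seed before each draw reproduces the same random permutation.
set.seed(20191103)
first_draw <- sample(10)

set.seed(20191103)
second_draw <- sample(10)

# TRUE when both draws are the same permutation
identical(first_draw, second_draw)
```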

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 89b90ad

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    code/workflow/.ipynb_checkpoints/
    Ignored:    data/

Untracked files:
    Untracked:  analysis/simulation-multi-ukbchr17to22-gtex.adipose.Rmd

Unstaged changes:
    Modified:   analysis/index.Rmd
    Deleted:    analysis/simulation-multi-ukbchr17:22-gtex.adipose.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.

The simulation was run 7 times (one run per entry in tags) on UKB chromosomes 17 to 22 combined.

library(mr.ash.alpha)
library(data.table)
suppressMessages({library(plotly)})
library(tidyr)
library(plyr)

simdatadir <- "~/causalTWAS/simulations/simulation_ashtest_20200721/"
outputdir <- "~/causalTWAS/simulations/simulation_ashtest_20200721/"
susiedir <- "~/causalTWAS/simulations/simulation_susietest_20200721/"
tags <- paste0('20200721-1-', c(2,4:9))
tag2s <- c('zeroes-es', 'zerose-es', 'lassoes-es','lassoes-se')

get_files <- function(tag, tag2){
  par <- paste0(outputdir, tag, "-mr.ash2s.", tag2, ".param.txt")
  rpip <- paste0(outputdir, tag, "-mr.ash2s.", tag2, ".rPIP.txt")
  
  gmrash <- paste0(outputdir, tag, "-mr.ash2s.", tag2, ".expr.txt")
  smrash <- paste0(outputdir, tag, "-mr.ash2s.", tag2, ".snp.txt")   
  
  ggwas <- paste0(outputdir, tag, ".exprgwas.txt.gz")
  sgwas <- paste0(outputdir, tag, ".snpgwas.txt.gz")
  
  gsusie <- paste0(susiedir, tag, ".", tag2, ".L3.susieres.expr.txt")
  ssusie <- paste0(susiedir, tag, ".", tag2, ".L3.susieres.snp.txt")
  
  return(tibble::lst(par, rpip, gmrash, ggwas, smrash, sgwas, gsusie, ssusie))
}
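
To make the naming scheme concrete, the sketch below rebuilds one of the paths that get_files() assembles, using the first run tag and the first setting. It only pastes strings together and does not check that the file exists.

```r
# Illustrative only: rebuild the parameter-file path for one run and setting.
outputdir <- "~/causalTWAS/simulations/simulation_ashtest_20200721/"
tags <- paste0('20200721-1-', c(2, 4:9))

tag  <- tags[1]       # "20200721-1-2"
tag2 <- "zeroes-es"   # first element of tag2s

par_file <- paste0(outputdir, tag, "-mr.ash2s.", tag2, ".param.txt")

length(tags)  # 7 run tags
par_file
```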

Mr.ash2 parameter estimation

Results for 7 simulation runs, using different initialization and update strategies.

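show_param <- function(tag2){
  # Collect the parameter file of every simulation run for this setting
  f <- lapply(tags, get_files, tag2 = tag2)
  parf <- lapply(f, '[[', "par")
  # Transpose each file and swap its two rows, then stack all runs
  param <- do.call(rbind, lapply(parf, function(x) t(read.table(x))[2:1,]))
  knitr::kable(param)
}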
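
The reshaping step inside show_param() can be sketched on an in-memory table. The layout of the `*.param.txt` files is not shown in this report, so the data frame below (one row per parameter, columns `estimated` then `truth`) is an assumption chosen to be consistent with the tables that follow; the numbers are taken from the first table.

```r
# Assumed layout of one parameter file after read.table():
# one row per parameter, columns "estimated" then "truth" (hypothetical).
param_df <- data.frame(
  estimated = c(0.0038203, 0.0077097, 0.0004988, 0.0493777),
  truth     = c(0.0502117, 0.0091963, 0.0024981, 0.0437056),
  row.names = c("gene.pi1", "gene.pve", "snp.pi1", "snp.pve")
)

# t() turns parameters into columns; [2:1, ] puts "truth" above "estimated".
param_mat <- t(param_df)[2:1, ]
param_mat
```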

NULL; expr-snp; expr-snp

show_param(tag2s[1])

	gene.pi1	gene.pve	snp.pi1	snp.pve
truth	0.0502117	0.0091963	0.0024981	0.0437056
estimated	0.0038203	0.0077097	0.0004988	0.0493777
truth	0.0502117	0.0114605	0.0024981	0.0548585
estimated	0.0127840	0.0251959	0.0004663	0.0461038
truth	0.0502117	0.0110859	0.0024981	0.0478479
estimated	0.0128614	0.0250191	0.0002822	0.0281643
truth	0.0502117	0.0097170	0.0024981	0.0580372
estimated	0.0097879	0.0194024	0.0004053	0.0403293
truth	0.0502117	0.0111216	0.0024981	0.0491958
estimated	0.0118048	0.0235006	0.0005047	0.0501203
truth	0.0502117	0.0110024	0.0024981	0.0477211
estimated	0.0116464	0.0226369	0.0003257	0.0322025
truth	0.0502117	0.0114627	0.0024981	0.0513712
estimated	0.0068140	0.0136486	0.0003761	0.0377470

NULL; snp-expr; expr-snp

show_param(tag2s[2])

	gene.pi1	gene.pve	snp.pi1	snp.pve
truth	0.0502117	0.0091963	0.0024981	0.0437056
estimated	0.0038203	0.0077097	0.0004988	0.0493777
truth	0.0502117	0.0114605	0.0024981	0.0548585
estimated	0.0121765	0.0240500	0.0005029	0.0495821
truth	0.0502117	0.0110859	0.0024981	0.0478479
estimated	0.0128614	0.0250191	0.0002822	0.0281643
truth	0.0502117	0.0097170	0.0024981	0.0580372
estimated	0.0097879	0.0194024	0.0004053	0.0403293
truth	0.0502117	0.0111216	0.0024981	0.0491958
estimated	0.0118048	0.0235006	0.0005047	0.0501203
truth	0.0502117	0.0110024	0.0024981	0.0477211
estimated	0.0116464	0.0226369	0.0003257	0.0322025
truth	0.0502117	0.0114627	0.0024981	0.0513712
estimated	0.0068140	0.0136486	0.0003761	0.0377470

lasso; expr-snp; expr-snp

show_param(tag2s[3])

	gene.pi1	gene.pve	snp.pi1	snp.pve
truth	0.0502117	0.0091963	0.0024981	0.0437056
estimated	0.0025035	0.0050747	0.0005317	0.0525218
truth	0.0502117	0.0114605	0.0024981	0.0548585
estimated	0.0096839	0.0192560	0.0005430	0.0533743
truth	0.0502117	0.0110859	0.0024981	0.0478479
estimated	0.0125445	0.0244517	0.0002905	0.0290015
truth	0.0502117	0.0097170	0.0024981	0.0580372
estimated	0.0053686	0.0107749	0.0005126	0.0505513
truth	0.0502117	0.0111216	0.0024981	0.0491958
estimated	0.0110461	0.0220645	0.0005172	0.0513885
truth	0.0502117	0.0110024	0.0024981	0.0477211
estimated	0.0068421	0.0134577	0.0004223	0.0413592
truth	0.0502117	0.0114627	0.0024981	0.0513712
estimated	0.0061398	0.0123350	0.0003996	0.0400672

lasso; expr-snp; snp-expr

show_param(tag2s[4])

	gene.pi1	gene.pve	snp.pi1	snp.pve
truth	0.0502117	0.0091963	0.0024981	0.0437056
estimated	0.0025035	0.0050747	0.0005317	0.0525218
truth	0.0502117	0.0114605	0.0024981	0.0548585
estimated	0.0097114	0.0193093	0.0005429	0.0533621
truth	0.0502117	0.0110859	0.0024981	0.0478479
estimated	0.0125458	0.0244540	0.0002905	0.0290008
truth	0.0502117	0.0097170	0.0024981	0.0580372
estimated	0.0053687	0.0107750	0.0005126	0.0505513
truth	0.0502117	0.0111216	0.0024981	0.0491958
estimated	0.0110466	0.0220653	0.0005172	0.0513885
truth	0.0502117	0.0110024	0.0024981	0.0477211
estimated	0.0068546	0.0134818	0.0004221	0.0413370
truth	0.0502117	0.0114627	0.0024981	0.0513712
estimated	0.0061805	0.0124158	0.0003996	0.0400624

Regional mr.ash2s PIP overview

Take simulation 1 (NULL; expr-snp; expr-snp) as an example. We use a region size of 500 kb and a PIP cutoff of 0.5 for SUSIE.

f <- get_files(tag= tags[1], tag2 = tag2s[1])
a <- read.table(f[["rpip"]], header = T)
plot(a$p0, a$rPIP, pch =19, col ='salmon', xlab = "position", ylab= "Sum of PIP")
grid()
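
The rPIP file plotted above was produced upstream, and the code that created it is not part of this report. As a rough sketch of what a per-region "Sum of PIP" could look like, the example below bins made-up variant positions into 500 kb regions and sums their PIPs. The data frame `toy`, its columns, and the binning rule are all assumptions for illustration.

```r
# Toy per-variable PIPs (made-up positions and values).
toy <- data.frame(
  pos = c(100000, 450000, 600000, 900000, 1200000),
  PIP = c(0.10, 0.60, 0.05, 0.20, 0.90)
)

region_size <- 500000                       # 500 kb, as stated in the text
toy$region  <- floor(toy$pos / region_size) # region index of each variable

# Sum PIPs within each region; p0 is taken to be the region start position.
rpip <- aggregate(PIP ~ region, data = toy, FUN = sum)
rpip$p0 <- rpip$region * region_size
rpip
```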

PIP scatter plot

mr.ash2s PIP vs. susie PIP.

scatter_plot_PIP<- function(tag2){
  f <- lapply(tags, get_files, tag2 = tag2)
  mrashf <- lapply(f, '[[', "gmrash")
  names(mrashf) <- tags
  
  susief <- lapply(f, '[[', "gsusie")
  names(susief) <- tags

  .tagname <- function(x, flist){
    a <- read.table(flist[[x]], header =T)
    a[, "name"] <- paste0(x, ":", a[, "name"])
    a
  }
  mrashres <- do.call(rbind, lapply(tags, .tagname, flist = mrashf))
  susieres <- do.call(rbind, lapply(tags, .tagname, flist = susief))
 
  res <- merge(mrashres, susieres, by = "name", all = T)
  
  res <- res[complete.cases(res),]
  res <- rename(res, c("PIP" = "mr.ash_PIP", "pip" = "SUSIE_PIP", "pip.null" = "SUSIE_PIP_null") )
  res$ifcausal <- mapvalues(res$ifcausal, 
          from=c(0,1), 
          to=c("Non causal", "Causal"))
  
  fig1 <- plot_ly(data = res, x = ~ mr.ash_PIP, y = ~ SUSIE_PIP, color = ~ ifcausal, 
                 colors = c( "salmon", "darkgreen"))
  
  fig2 <- plot_ly(data = res, x = ~ mr.ash_PIP, y = ~ SUSIE_PIP_null, color = ~ ifcausal, 
                 colors = c( "salmon", "darkgreen"))
  
  fig <- subplot(fig1, fig2, titleX = TRUE, titleY = T, margin = 0.1)
  fig
}
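
The join logic in scatter_plot_PIP() can be sketched on toy data. Gene names are prefixed with the run tag so that the same gene from different runs stays distinct, and complete.cases() then drops genes that appear in only one of the two result tables. The run labels, gene names, and PIP values below are made up.

```r
# Toy results: the same gene name can appear in several simulation runs.
mrash <- data.frame(name = c("run1:geneA", "run1:geneB", "run2:geneA"),
                    PIP  = c(0.9, 0.1, 0.4))
susie <- data.frame(name = c("run1:geneA", "run2:geneA", "run2:geneC"),
                    pip  = c(0.8, 0.3, 0.2))

# Outer join on the run-prefixed name, then keep rows present in both tables.
res <- merge(mrash, susie, by = "name", all = TRUE)
res <- res[complete.cases(res), ]
res
```

Only `run1:geneA` and `run2:geneA` survive here, because the other genes have a PIP from one method only.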

NULL; expr-snp; expr-snp

scatter_plot_PIP(tag2s[1])

Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
Please use `arrange()` instead.
See vignette('programming') for more help
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.

NULL; snp-expr; expr-snp

scatter_plot_PIP(tag2s[2])

lasso; expr-snp; expr-snp

scatter_plot_PIP(tag2s[3])

lasso; expr-snp; snp-expr

scatter_plot_PIP(tag2s[4])

ROC curve

ROC_plot<- function(tag2){
  f <- lapply(tags, get_files, tag2 = tag2)
  mrashf <- lapply(f, '[[', "gmrash")
  names(mrashf) <- tags
  
  susief <- lapply(f, '[[', "gsusie")
  names(susief) <- tags
  
  gwasf <- lapply(f, '[[', "ggwas")
  names(gwasf) <- tags

  .tagname <- function(x, flist, colnames = NULL){
    a <- read.table(flist[[x]], header =T)
    if (!is.null(colnames)){
      colnames(a) <- colnames
    }
    a[, "name"] <- paste0(x, ":", a[, "name"])
    a
  }
  mrashres <- do.call(rbind, lapply(tags, .tagname, flist = mrashf))
  susieres <- do.call(rbind, lapply(tags, .tagname, flist = susief))
  gwasres <- do.call(rbind, lapply(tags, .tagname, flist = gwasf, 
                                   colnames =  c("chr", "p0",   "p1", "name", "Estimate", "Std.Error", "t-value", "PVALUE")))

  res <- merge(mrashres, susieres, by = "name", all = T)
  res <- merge(res, gwasres, by = "name", all = T)
  
  res <- res[complete.cases(res),]
  res <- rename(res, c("PIP" = "mr.ash", "pip" = "SUSIE", "PVALUE" = "TWAS") )
  res[,"TWAS"] <- -log10(res[, "TWAS"])
  
  roccolors <-  c("red", "green", "blue")
  methods <- c("mr.ash", "SUSIE", "TWAS")
  # Empty canvas for the ROC curves, plus the y = x reference line
  plot(0, xlim=c(0,1), ylim=c(0,1), col="white", xlab = "FPR", ylab = "TPR")
  abline(0, 1)
  for (i in seq_along(methods)){
    method <- methods[i]
    # Sort by increasing score; each row then acts as a threshold
    bordered <- res[order(res[,method]),] 
    actuals <- bordered$ifcausal == 1
    sens <- (sum(actuals) - cumsum(actuals))/sum(actuals)
    spec <- cumsum(!actuals)/sum(!actuals)
    lines(1 - spec, sens, type = "l", col = roccolors[i])
    # Rectangle-rule area under the ROC curve
    auc <- sum(spec*diff(c(0, 1 - sens)))
    cat("AUC for ", method, ": ", auc)
  }
  legend(0.6,0.3, legend= methods, col=roccolors, lty=1, cex=0.8)
  grid()
}
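
As a sanity check on the AUC computation in ROC_plot(), the sketch below applies the same cumulative-sum formula to a tiny made-up data set and compares it with the rank-based (Mann-Whitney) form of the AUC. The scores and labels are hypothetical, and the two values are only expected to match exactly when there are no tied scores.

```r
# Toy scores and causal labels (made up, no ties).
scores <- c(0.10, 0.40, 0.35, 0.80)
labels <- c(0, 0, 1, 1)

# Same steps as in ROC_plot(): sort ascending, then accumulate.
ord     <- order(scores)
actuals <- labels[ord] == 1
sens <- (sum(actuals) - cumsum(actuals)) / sum(actuals)
spec <- cumsum(!actuals) / sum(!actuals)
auc_loop <- sum(spec * diff(c(0, 1 - sens)))

# Rank-based AUC: fraction of (causal, non-causal) pairs ranked correctly.
n_pos <- sum(labels == 1)
n_neg <- sum(labels == 0)
auc_rank <- (sum(rank(scores)[labels == 1]) - n_pos * (n_pos + 1) / 2) /
  (n_pos * n_neg)

c(loop = auc_loop, rank = auc_rank)
```

Here three of the four causal/non-causal pairs are ordered correctly, so both computations give 0.75.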

NULL; expr-snp; expr-snp

ROC_plot(tag2s[1])

AUC for  mr.ash :  0.8102564AUC for  SUSIE :  0.823046AUC for  TWAS :  0.8002471

NULL; snp-expr; expr-snp

ROC_plot(tag2s[2])

AUC for  mr.ash :  0.80985AUC for  SUSIE :  0.8147874AUC for  TWAS :  0.811609

lasso; expr-snp; expr-snp

ROC_plot(tag2s[3])

AUC for  mr.ash :  0.77492AUC for  SUSIE :  0.8077692AUC for  TWAS :  0.8109009

lasso; expr-snp; snp-expr

ROC_plot(tag2s[4])

AUC for  mr.ash :  0.7749867AUC for  SUSIE :  0.8076692AUC for  TWAS :  0.8109009

sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] plyr_1.8.6          tidyr_0.8.3         plotly_4.9.2.9000  
[4] ggplot2_3.3.1       data.table_1.12.7   mr.ash.alpha_0.1-34

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6      highr_0.7         compiler_3.5.1   
 [4] pillar_1.4.4      later_0.7.5       git2r_0.26.1     
 [7] workflowr_1.6.0   tools_3.5.1       digest_0.6.25    
[10] viridisLite_0.3.0 jsonlite_1.6.1    evaluate_0.12    
[13] tibble_3.0.1      lifecycle_0.2.0   gtable_0.2.0     
[16] lattice_0.20-38   pkgconfig_2.0.2   rlang_0.4.6      
[19] Matrix_1.2-15     shiny_1.2.0       crosstalk_1.0.0  
[22] yaml_2.2.0        httr_1.4.1        withr_2.1.2      
[25] stringr_1.4.0     dplyr_1.0.0       knitr_1.20       
[28] htmlwidgets_1.3   generics_0.0.2    fs_1.3.1         
[31] vctrs_0.3.1       tidyselect_1.1.0  rprojroot_1.3-2  
[34] grid_3.5.1        glue_1.4.1        R6_2.3.0         
[37] rmarkdown_1.10    purrr_0.3.4       magrittr_1.5     
[40] backports_1.1.2   scales_1.0.0      promises_1.0.1   
[43] htmltools_0.3.6   ellipsis_0.3.1    xtable_1.8-3     
[46] mime_0.6          colorspace_1.3-2  httpuv_1.4.5     
[49] stringi_1.3.1     lazyeval_0.2.1    munsell_0.5.0    
[52] crayon_1.3.4

Summarize twas
