Last updated: 2019-07-12

Checks: 6 passed, 1 failed

Knit directory: cause/

This reproducible R Markdown analysis was created with workflowr (version 1.4.0.9000). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

The global environment had objects present when the code in the R Markdown file was run. These objects can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment. Use wflow_publish or wflow_build to ensure that the code is always run in an empty environment.

The following objects were defined in the global environment when these results were created:

Name Class Size
data environment 56 bytes
env environment 56 bytes

The command set.seed(20181014) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Untracked files:
    Untracked:   (tumble-track's conflicted copy 2019-07-09).Rhistory
    Untracked:  DESCRIPTION (tumble-track's conflicted copy 2019-07-09)
    Untracked:  analysis/figure/
    Untracked:  docs/cause.bib
    Untracked:  docs/figure/cause_figure_1_standalone.pdf
    Untracked:  example_data/chr22_AF0.05_0.1.RDS
    Untracked:  example_data/chr22_AF0.05_snpdata.RDS
    Untracked:  gwas_data/
    Untracked:  ll_v7_notes.Rmd
    Untracked:  sim_results/
    Untracked:  src/RcppExports.o
    Untracked:  src/log_likelihood_functions.o
    Untracked:  temp.txt

Unstaged changes:
    Modified:   analysis/simulations.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
Rmd 8fbc50c Jean Morrison 2019-07-12 wflow_publish("analysis/index.Rmd")
html f3d82d3 Jean Morrison 2019-07-12 Build site.
Rmd fdcf760 Jean Morrison 2019-07-12 wflow_publish("analysis/index.Rmd")
html 09acd92 Jean Morrison 2019-07-12 Build site.
Rmd a5de9b2 Jean Morrison 2019-07-12 wflow_publish("analysis/index.Rmd")
html 5e4c977 Jean Morrison 2019-07-10 Build site.
Rmd 237d888 Jean Morrison 2019-07-10 wflow_publish("analysis/index.Rmd")
html 4a40efa Jean Morrison 2019-06-27 Build site.
Rmd 6f137f1 Jean Morrison 2019-06-27 wflow_publish("analysis/index.Rmd")
html 3b5c7e2 Jean Morrison 2019-06-25 Build site.
Rmd a55827d Jean Morrison 2019-06-25 wflow_publish(files = c("analysis/index.Rmd", "analysis/ldl_cad.Rmd"))
html cb84ab2 Jean Morrison 2019-06-25 Build site.
Rmd a54a556 Jean Morrison 2019-06-25 wflow_publish(files = c("analysis/index.Rmd"))
html 6b826d9 Jean Morrison 2019-06-25 Build site.
html 286f4e9 Jean Morrison 2019-06-25 Build site.
Rmd 8f3b82e Jean Morrison 2019-06-25 wflow_publish(files = c("analysis/about.Rmd", "analysis/index.Rmd", "analysis/ldl_cad.Rmd", "analysis/license.Rmd",
Rmd 24510f1 Jean Morrison 2018-10-23 small language changes
html 46b159a Jean Morrison 2018-10-17 Build site.
Rmd 19edfa4 Jean Morrison 2018-10-17 wflow_git_commit("analysis/index.Rmd")
html f15238b Jean Morrison 2018-10-15 Build site.
Rmd d163752 Jean Morrison 2018-10-15 update about and index
Rmd efc43ef Jean Morrison 2018-10-15 Start workflowr project.

Welcome to the CAUSE website!

Here are some useful links:

Pre-print

R package

Slides! from WNAR 2019

CAUSE is a Mendelian randomization method that uses genome-wide summary statistics. CAUSE models both correlated and uncorrelated horizontal pleiotropy in order to avoid false positives that can occur with other methods. Read a short introduction to the method below. In the tabs you can find a tutorial on using our software, some simulation results, and an example of a larger data analysis.

Introduction to CAUSE

Summary Statistic Mendelian Randomization Basics

Mendelian randomization (MR) is a method for inferring causal effects from observational data by using genetic variants as instrumental variables. The key idea of MR is to treat genotypes as naturally occurring “randomizations” (Smith and Hemani (2014), Boef, Dekkers, and Le Cessie (2015), J. Zheng et al. (2017)). Suppose we are interested in the causal effect of trait \(M\) (for “Mediator”) on trait \(Y\). The simplest MR methods assume that we can identify a genetic variant \(G_j\) that meets two assumptions:

  1. \(G_j\) causally affects \(M\)
  2. \(G_j\) has no effects on \(Y\) that are not mediated through \(M\)

These assumptions are illustrated in the following figure:

[Figure: causal diagram illustrating the two MR assumptions]

Here we have divided assumption 2 into two parts. First, the variant may not affect any confounders or shared factors that act on both \(M\) and \(Y\) (correlated pleiotropy). Second, the variant may not affect \(Y\) through a non-shared mechanism (uncorrelated pleiotropy).

If these assumptions hold, then the associations of \(G_j\) with traits \(M\) and \(Y\) will satisfy

\[ \beta_{Y,j} = \gamma \beta_{M,j}, \] where \(\beta_{Y,j}\) is the association of \(G_j\) with \(Y\), \(\beta_{M,j}\) is the association of \(G_j\) with \(M\), and \(\gamma\) is the causal effect of \(M\) on \(Y\). This relationship is the core of simple MR methods. Many methods based on this equation, including the commonly used inverse variance weighted (IVW) regression, first obtain estimates of \(\beta_{Y,j}\) and \(\beta_{M,j}\) for several genetic variants \(G_j\) and then estimate \(\gamma\) by regressing the estimates of \(\beta_{Y,j}\) on the estimates of \(\beta_{M,j}\) (Burgess, Dudbridge, and Thompson (2016)).
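As a minimal sketch of the IVW estimator described above, the R code below computes the weighted regression through the origin in closed form and checks it against `lm()`. The helper name `ivw_estimate` and the simulated data are hypothetical and are not part of the CAUSE package. The sketch also assumes that uncertainty in \(\hat{\beta}_{M,j}\) is negligible, so each variant is weighted by \(1/\text{se}(\hat{\beta}_{Y,j})^2\).

```r
# Hypothetical helper (not part of the CAUSE package): IVW estimate of gamma.
# beta_m, beta_y: estimated associations of each variant with M and Y
# se_y:           standard errors of beta_y; uncertainty in beta_m is ignored
ivw_estimate <- function(beta_m, beta_y, se_y) {
  w <- 1 / se_y^2
  # Weighted least squares through the origin has this closed form
  gamma_hat <- sum(w * beta_m * beta_y) / sum(w * beta_m^2)
  # Fixed-effect standard error of the IVW estimate
  se_gamma <- sqrt(1 / sum(w * beta_m^2))
  list(estimate = gamma_hat, se = se_gamma)
}

# Simulated example with a true causal effect gamma = 0.3
set.seed(1)
n_var  <- 50
beta_m <- rnorm(n_var, mean = 0, sd = 0.1)
se_y   <- rep(0.01, n_var)
beta_y <- 0.3 * beta_m + rnorm(n_var, sd = se_y)

fit <- ivw_estimate(beta_m, beta_y, se_y)
print(fit$estimate)

# lm() with weights and no intercept gives the same point estimate
lm_fit <- lm(beta_y ~ 0 + beta_m, weights = 1 / se_y^2)
print(coef(lm_fit))
```

The two point estimates agree. The standard errors differ, however: `lm()` rescales by the estimated residual variance, whereas the fixed-effect IVW standard error does not.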

Violating MR Assumptions

Correlated and uncorrelated pleiotropy are both forms of what has previously been termed horizontal pleiotropy. However, they have different effects on MR estimators. Under uncorrelated pleiotropy, the horizontal pleiotropic effects of variants are uncorrelated with their effects on \(M\). This adds a noise term to the relationship above. When some variants exhibit correlated pleiotropy, on the other hand, it can induce an average correlation between effect estimates even when the causal effect is equal to zero. This makes correlated pleiotropy more difficult to account for.
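A short calculation makes this concrete. As a sketch, suppose each variant satisfies \(\beta_{Y,j} = \gamma \beta_{M,j} + Z_j \eta \beta_{M,j} + \theta_j\). Here \(Z_j\) indicates that variant \(j\) has correlated pleiotropy, \(\eta\) is the slope that correlated pleiotropy induces, and \(\theta_j\) is uncorrelated pleiotropy. Assume also that all variants receive equal weight and that estimation error is ignored. The IVW slope then decomposes as

\[ \hat{\gamma}_{\text{IVW}} = \frac{\sum_j \beta_{M,j}\beta_{Y,j}}{\sum_j \beta_{M,j}^2} = \gamma + \eta \frac{\sum_j Z_j \beta_{M,j}^2}{\sum_j \beta_{M,j}^2} + \frac{\sum_j \theta_j \beta_{M,j}}{\sum_j \beta_{M,j}^2}. \]

If the \(\theta_j\) have mean zero and are uncorrelated with the \(\beta_{M,j}\), the last term only adds noise. The middle term does not vanish, though: if \(Z_j\) is independent of \(\beta_{M,j}\) and a proportion \(q\) of variants have \(Z_j = 1\), it is roughly \(q\eta\). This means the IVW estimate is pulled away from zero even when \(\gamma = 0\).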

Several methods have been proposed to account for horizontal pleiotropy in MR. However, most of these rely on the assumption that all horizontal pleiotropy is uncorrelated. These include Egger regression (Bowden, Smith, and Burgess (2015), Barfield et al. (2018)), which adds an intercept term to the regression of \(\hat{\beta}_{Y,j}\) on \(\hat{\beta}_{M,j}\), and several methods that rely on outlier removal, including GSMR (Zhu et al. (2018)) and MR-PRESSO (Verbanck et al. (2018)). Another proposal, the weighted median estimator (Bowden et al. (2016)), makes no assumptions about the form of horizontal pleiotropy but assumes that fewer than half of the variants exhibit horizontal pleiotropy.
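The Egger and weighted median estimators named above can be sketched in base R as follows. The helpers `egger_estimate` and `weighted_median_estimate` are simplified and hypothetical, not the published implementations. They omit standard errors and bootstrap confidence intervals. The weighted median uses first-order weights \((\hat{\beta}_{M,j}/\text{se}(\hat{\beta}_{Y,j}))^2\) and a common interpolation scheme for the cumulative weights.

```r
# Hypothetical helpers (not part of the CAUSE package).

# MR-Egger: weighted regression of beta_y on beta_m WITH an intercept.
# As is conventional for MR-Egger, variants are first oriented so that
# every beta_m is positive.
egger_estimate <- function(beta_m, beta_y, se_y) {
  s  <- sign(beta_m)
  bm <- beta_m * s
  by <- beta_y * s
  fit <- lm(by ~ bm, weights = 1 / se_y^2)
  # First coefficient: average pleiotropic effect; second: causal slope
  c(intercept = unname(coef(fit)[1]), slope = unname(coef(fit)[2]))
}

# Weighted median of the per-variant ratio estimates beta_y / beta_m.
# Weights are first-order inverse variances (beta_m / se_y)^2; the median
# is found by linear interpolation of the standardized cumulative weights.
weighted_median_estimate <- function(beta_m, beta_y, se_y) {
  ratio <- beta_y / beta_m
  w     <- (beta_m / se_y)^2
  ord   <- order(ratio)
  ratio <- ratio[ord]
  w     <- w[ord] / sum(w)
  # Cumulative weight evaluated at the midpoint of each variant's weight
  cum_w <- cumsum(w) - 0.5 * w
  below <- max(which(cum_w < 0.5))
  ratio[below] + (ratio[below + 1] - ratio[below]) *
    (0.5 - cum_w[below]) / (cum_w[below + 1] - cum_w[below])
}

beta_m <- c(0.1, 0.2, 0.3, 0.4, 0.5)
se_y   <- rep(0.01, 5)

# Every variant shares the same pleiotropic offset of 0.05
egger_fit <- egger_estimate(beta_m, 0.05 + 0.2 * beta_m, se_y)
print(egger_fit)

# One weakly weighted variant is an outlier; the others have ratio 0.2
beta_y    <- 0.2 * beta_m
beta_y[1] <- 0.1
wm <- weighted_median_estimate(beta_m, beta_y, se_y)
print(wm)
```

In the first example, Egger regression recovers the offset as its intercept and 0.2 as its slope. In the second example, the outlying variant carries little weight, so the weighted median stays at the ratio 0.2 shared by the other variants.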

The figures below demonstrate how correlated pleiotropy can lead to false positives when using simple MR methods. On the left, we have simulated data with no causal effect, but 15% of the variants have correlated pleiotropic effects on \(Y\) (blue triangles). Many variants have strong associations with \(M\) and no association with \(Y\), which is evidence against a causal effect. Even so, IVW regression obtains a \(p\)-value of 0.01. We would like to be able to distinguish this scenario from the one on the right, where we have simulated data with a true causal effect. There, effect estimates are correlated for all variants.
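A small simulation in the spirit of the left-hand panel shows the same behavior. The settings are illustrative assumptions, not the ones used to make the figures: 500 variants, \(q = 0.15\), \(\eta = 0.5\), and the noise levels shown. Correlated pleiotropic variants are generated as \(\beta_{Y,j} = \eta \beta_{M,j} + \theta_j\). All other variants have \(\beta_{Y,j} = \theta_j\).

```r
# Hypothetical simulation settings, not those used for the figures above.
set.seed(2019)
n_var <- 500
gamma <- 0      # no causal effect of M on Y
eta   <- 0.5    # slope induced by correlated pleiotropy
q     <- 0.15   # proportion of correlated pleiotropic variants
se    <- 0.005  # standard error of every GWAS estimate

beta_m <- rnorm(n_var, sd = 0.05)
z      <- rbinom(n_var, size = 1, prob = q)   # correlated pleiotropy indicator
theta  <- rnorm(n_var, sd = 0.005)            # uncorrelated pleiotropy
beta_y <- gamma * beta_m + z * eta * beta_m + theta

# Observed estimates = true associations + estimation noise
bhat_m <- beta_m + rnorm(n_var, sd = se)
bhat_y <- beta_y + rnorm(n_var, sd = se)

# With equal standard errors, IVW reduces to least squares through the origin
ivw_fit  <- lm(bhat_y ~ 0 + bhat_m)
ivw_coef <- summary(ivw_fit)$coefficients
print(ivw_coef)
```

With settings like these, the IVW slope is typically positive and nominally significant even though \(\gamma = 0\). This happens because roughly 15% of the variants share a common nonzero slope.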

The purpose of CAUSE is to distinguish the patterns created by correlated pleiotropy from those created by a causal effect, while still accounting for uncorrelated horizontal pleiotropy. We can also account for sample overlap between the GWAS of traits \(M\) and \(Y\). In addition, by modeling uncertainty in effect sizes rather than pre-selecting variants with strong evidence of affecting trait \(M\), we can avoid some other problems encountered by simple MR.

CAUSE Model overview

CAUSE uses a mixture model for variants included in the analysis. We assume that a proportion \(q\) of variants exhibit correlated pleiotropy (the blue triangles in the figure above) while the rest do not. Correlated pleiotropic variants have the causal diagram on the right in the figure below. The remaining variants have the causal diagram on the left.

[Figure: causal diagrams for variants without correlated pleiotropy (left) and with correlated pleiotropy (right)]

Note that we allow all variants to have uncorrelated pleiotropic effects (\(\theta_j\)). To make the model identifiable and to encourage parsimonious solutions, we assume that \(q\) is small. This assumption is encoded in a prior on \(q\) that places most of its weight on small values (\(q \sim \text{Beta}(1, 10)\) by default). If \(Z_j\) is an indicator that variant \(G_j\) is a correlated pleiotropic variant, then the model above implies that

\[ \beta_{Y,j} = \underbrace{\gamma \beta_{M,j}}_{\text{causal effect}} + \underbrace{Z_j \eta \beta_{M,j}}_{\substack{\text{correlated}\\ \text{pleiotropy}}} + \underbrace{\theta_j}_{\substack{\text{uncorrelated}\\ \text{pleiotropy}}}. \] This relationship is the core idea of CAUSE. We estimate posterior distributions of \(\gamma\), \(\eta\), and \(q\) and compare the fit of posteriors from models with and without a causal effect. Our software provides posterior distribution estimates under two models. In the sharing model, \(\gamma\) is fixed at 0. The causal model allows \(\gamma\) to be a free parameter. The software also tests whether the posteriors estimated under the causal model fit the data significantly better than those estimated under the sharing model. If they do, we conclude that the data are consistent with a causal effect; in other words, they show the pattern of the right-hand scatter plot above. For more details on model fitting and comparison, check out the paper! For details on running the software, see the tutorial tab above.
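To get a feel for how strongly the default prior favors small \(q\), its summaries can be computed with base R's `qbeta()` and `pbeta()`. This sketch only inspects the Beta(1, 10) distribution stated above and does not call the CAUSE package.

```r
# Beta(1, 10) prior on q, the proportion of correlated pleiotropic variants
a <- 1
b <- 10

prior_mean   <- a / (a + b)            # 1/11
prior_median <- qbeta(0.5, a, b)
p_q_below_02 <- pbeta(0.2, a, b)       # prior probability that q < 0.2

print(round(c(mean = prior_mean, median = prior_median,
              p_q_below_0.2 = p_q_below_02), 3))
```

For Beta(1, 10), the CDF has the closed form \(1 - (1 - q)^{10}\). The code therefore prints a mean of about 0.091, a median of about 0.067, and a prior probability of about 0.893 that \(q < 0.2\).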

References

Barfield, Richard, Helian Feng, Alexander Gusev, Lang Wu, and Wei Zheng. 2018. “Transcriptome-wide association studies accounting for co-localization using Egger regression.” Genetic Epidemiology 42: 418–33.

Boef, Anna G.C., Olaf M. Dekkers, and Saskia Le Cessie. 2015. “Mendelian randomization studies: A review of the approaches used and the quality of reporting.” International Journal of Epidemiology 44 (2): 496–511. doi:10.1093/ije/dyv071.

Bowden, Jack, M. Fabiola Del Greco, Cosetta Minelli, George Davey Smith, Nuala A. Sheehan, and John R. Thompson. 2016. “Assessing the suitability of summary data for two-sample mendelian randomization analyses using MR-Egger regression: The role of the I² statistic.” International Journal of Epidemiology 45 (6): 1961–74. doi:10.1093/ije/dyw220.

Bowden, Jack, George Davey Smith, and Stephen Burgess. 2015. “Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression.” International Journal of Epidemiology 44 (2): 512–25. doi:10.1093/ije/dyv080.

Burgess, Stephen, Frank Dudbridge, and Simon G. Thompson. 2016. “Combining information on multiple instrumental variables in Mendelian randomization: Comparison of allele score and summarized data methods.” Statistics in Medicine 35 (11): 1880–1906. doi:10.1002/sim.6835.

Smith, George Davey, and Gibran Hemani. 2014. “Mendelian randomization: genetic anchors for causal inference in epidemiological studies.” Human Molecular Genetics 23 (R1): R89–98. doi:10.1093/hmg/ddu328.

Verbanck, Marie, Chia-yen Chen, Benjamin Neale, and Ron Do. 2018. “Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases.” Nature Genetics 50 (5): 693–98. doi:10.1038/s41588-018-0099-7.

Zheng, Jie, Denis Baird, Maria-Carolina Borges, Jack Bowden, Gibran Hemani, Philip Haycock, David M. Evans, and George Davey Smith. 2017. “Recent Developments in Mendelian Randomization Studies.” Current Epidemiology Reports 4 (4): 330–45. doi:10.1007/s40471-017-0128-6.

Zhu, Zhihong, Zhili Zheng, Futao Zhang, Yang Wu, Maciej Trzaskowski, Robert Maier, Matthew R. Robinson, et al. 2018. “Causal associations between risk factors and common diseases inferred from GWAS summary data.” Nature Communications 9 (1): 224. doi:10.1038/s41467-017-02317-2.


sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] workflowr_1.4.0.9000 Rcpp_1.0.1           digest_0.6.20       
 [4] rprojroot_1.3-2      backports_1.1.4      git2r_0.26.1        
 [7] magrittr_1.5         evaluate_0.14        highr_0.8           
[10] stringi_1.4.3        fs_1.3.1             whisker_0.3-2       
[13] rmarkdown_1.13       tools_3.6.1          stringr_1.4.0       
[16] glue_1.3.1           xfun_0.8             yaml_2.2.0          
[19] compiler_3.6.1       htmltools_0.3.6      knitr_1.23