Last updated: 2019-07-12
These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 8fbc50c | Jean Morrison | 2019-07-12 | wflow_publish(“analysis/index.Rmd”) |
html | f3d82d3 | Jean Morrison | 2019-07-12 | Build site. |
Rmd | fdcf760 | Jean Morrison | 2019-07-12 | wflow_publish(“analysis/index.Rmd”) |
html | 09acd92 | Jean Morrison | 2019-07-12 | Build site. |
Rmd | a5de9b2 | Jean Morrison | 2019-07-12 | wflow_publish(“analysis/index.Rmd”) |
html | 5e4c977 | Jean Morrison | 2019-07-10 | Build site. |
Rmd | 237d888 | Jean Morrison | 2019-07-10 | wflow_publish(“analysis/index.Rmd”) |
html | 4a40efa | Jean Morrison | 2019-06-27 | Build site. |
Rmd | 6f137f1 | Jean Morrison | 2019-06-27 | wflow_publish(“analysis/index.Rmd”) |
html | 3b5c7e2 | Jean Morrison | 2019-06-25 | Build site. |
Rmd | a55827d | Jean Morrison | 2019-06-25 | wflow_publish(files = c(“analysis/index.Rmd”, “analysis/ldl_cad.Rmd”)) |
html | cb84ab2 | Jean Morrison | 2019-06-25 | Build site. |
Rmd | a54a556 | Jean Morrison | 2019-06-25 | wflow_publish(files = c(“analysis/index.Rmd”)) |
html | 6b826d9 | Jean Morrison | 2019-06-25 | Build site. |
html | 286f4e9 | Jean Morrison | 2019-06-25 | Build site. |
Rmd | 8f3b82e | Jean Morrison | 2019-06-25 | wflow_publish(files = c(“analysis/about.Rmd”, “analysis/index.Rmd”, “analysis/ldl_cad.Rmd”, “analysis/license.Rmd”, |
Rmd | 24510f1 | Jean Morrison | 2018-10-23 | small language changes |
html | 46b159a | Jean Morrison | 2018-10-17 | Build site. |
Rmd | 19edfa4 | Jean Morrison | 2018-10-17 | wflow_git_commit(“analysis/index.Rmd”) |
html | f15238b | Jean Morrison | 2018-10-15 | Build site. |
Rmd | d163752 | Jean Morrison | 2018-10-15 | update about and index |
Rmd | efc43ef | Jean Morrison | 2018-10-15 | Start workflowr project. |
Here are some useful links
Slides! from WNAR 2019
CAUSE is a Mendelian randomization method that uses genome-wide summary statistics. CAUSE models correlated and uncorrelated horizontal pleiotropy in order to avoid false positives that can occur with other methods. Read a short introduction to the method below. In the tabs you can find a tutorial on using our software, some simulation results, and an example of a larger data analysis.
Mendelian randomization (MR) is a method for inferring causal effects from observational data by using genetic variants as instrumental variables. The key idea of MR is to treat genotypes as naturally occurring “randomizations” (Smith and Hemani (2014), Boef, Dekkers, and Le Cessie (2015), J. Zheng et al. (2017)). Suppose we are interested in the causal effect of trait \(M\) (for “Mediator”) on trait \(Y\). The simplest MR methods assume that we can identify a genetic variant \(G_j\) that meets two assumptions:
1. \(G_j\) is associated with \(M\).
2. \(G_j\) affects \(Y\) only through its effect on \(M\).
These assumptions are illustrated in the following figure:
Here we have divided assumption 2 into two parts. First, the variant may not affect any confounders or shared factors that act on both \(M\) and \(Y\) (correlated pleiotropy). Second, the variant cannot have any effects on \(Y\) through a non-shared mechanism (uncorrelated pleiotropy).
If these assumptions hold, then the associations of \(G_j\) with traits \(M\) and \(Y\) will satisfy
\[ \beta_{Y,j} = \gamma \beta_{M,j}, \]
where \(\beta_{Y,j}\) is the association of \(G_j\) with \(Y\), \(\beta_{M,j}\) is the association of \(G_j\) with \(M\), and \(\gamma\) is the causal effect of \(M\) on \(Y\). This relationship is the core of simple MR methods. Many methods based on this equation, including the commonly used inverse variance weighted (IVW) regression, first obtain estimates of \(\beta_{Y,j}\) and \(\beta_{M,j}\) for several genetic variants \(G_j\), and then estimate \(\gamma\) by regressing the estimates of \(\beta_{Y,j}\) on the estimates of \(\beta_{M,j}\) (Burgess, Dudbridge, and Thompson (2016)).
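The regression just described can be sketched in a few lines of Python. This is only an illustration, not the implementation used by any MR package: it assumes the common form of IVW as a weighted regression through the origin with weights \(1/\text{se}_{Y,j}^2\), it ignores uncertainty in the estimates of \(\beta_{M,j}\), and the function name `ivw_estimate` and the toy numbers are made up for this example.

```python
import math


def ivw_estimate(beta_m, beta_y, se_y):
    """Inverse variance weighted slope of beta_y on beta_m (no intercept).

    Assumes weights 1 / se_y**2 and treats beta_m as known.
    Returns (estimate, standard error).
    """
    weights = [1.0 / s ** 2 for s in se_y]
    sxx = sum(w * bm * bm for w, bm in zip(weights, beta_m))
    sxy = sum(w * bm * by for w, bm, by in zip(weights, beta_m, beta_y))
    return sxy / sxx, math.sqrt(1.0 / sxx)


# Toy effect estimates for four variants (made-up numbers).
beta_m = [0.10, 0.20, -0.15, 0.30]
beta_y = [0.05, 0.10, -0.075, 0.15]  # each value is 0.5 times beta_m
se_y = [0.02, 0.02, 0.03, 0.04]

gamma_hat, se_gamma = ivw_estimate(beta_m, beta_y, se_y)
print(gamma_hat, se_gamma)
```

Because every toy variant lies on the line \(\beta_{Y,j} = 0.5\,\beta_{M,j}\), the estimate recovers a slope of 0.5 whatever the weights are.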
Correlated and uncorrelated pleiotropy are both forms of what has been previously termed horizontal pleiotropy. However, they have different effects on MR estimators. In uncorrelated pleiotropy, horizontal pleiotropic effects of variants are uncorrelated with effects on \(M\). This adds a noise term to the relationship above. On the other hand, when some variants exhibit correlated pleiotropy, this can induce an average correlation between effect estimates even when the causal effect is equal to zero. This makes correlated pleiotropy more difficult to account for.
Several proposals have been made for accounting for horizontal pleiotropy in MR. However, most of these rely on the assumption that all horizontal pleiotropy is uncorrelated. These include Egger regression (Bowden, Smith, and Burgess (2015), Barfield et al. (2018)), which adds an intercept term to the regression of \(\hat{\beta}_{Y,j}\) on \(\hat{\beta}_{M,j}\), and several methods that rely on outlier removal, including GSMR (Zhu et al. (2018)) and MR-PRESSO (Verbanck et al. (2018)). Another proposal, the weighted median estimator (Bowden et al. (2016)), makes no assumptions about the form of horizontal pleiotropy but assumes that fewer than half the variants exhibit horizontal pleiotropy.
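To make two of these proposals concrete, here is a rough Python sketch of an Egger-style regression and a weighted median. Both are simplified stand-ins rather than the published estimators: the Egger function is a plain weighted least squares fit with an intercept (it does not re-orient variants or compute standard errors), and the weighted median returns the first ratio at which the cumulative weight reaches one half, with no interpolation. The function names, the weights, and the toy data are assumptions made for this example.

```python
def egger_regression(beta_m, beta_y, se_y):
    """Weighted least squares of beta_y on beta_m WITH an intercept.

    Weights are 1 / se_y**2. Returns (intercept, slope).
    """
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    mean_m = sum(wi * x for wi, x in zip(w, beta_m)) / sw
    mean_y = sum(wi * y for wi, y in zip(w, beta_y)) / sw
    sxx = sum(wi * (x - mean_m) ** 2 for wi, x in zip(w, beta_m))
    sxy = sum(wi * (x - mean_m) * (y - mean_y)
              for wi, x, y in zip(w, beta_m, beta_y))
    slope = sxy / sxx
    return mean_y - slope * mean_m, slope


def weighted_median(beta_m, beta_y, se_y):
    """Simplified weighted median of the per-variant ratios beta_y / beta_m.

    Weights are (beta_m / se_y)**2. Returns the smallest ratio at which the
    cumulative weight reaches half of the total (no interpolation).
    """
    ratios = sorted((y / x, (x / s) ** 2)
                    for x, y, s in zip(beta_m, beta_y, se_y))
    half = 0.5 * sum(wt for _, wt in ratios)
    running = 0.0
    for ratio, wt in ratios:
        running += wt
        if running >= half:
            return ratio


beta_m = [0.1, 0.2, 0.3, 0.4, 0.5]
se_y = [0.05] * 5

# Every variant is shifted by the same pleiotropic effect of 0.02.
shifted = [0.02 + 0.5 * x for x in beta_m]
intercept, slope = egger_regression(beta_m, shifted, se_y)

# Two of the five variants are outliers carrying less than half the total
# weight; the remaining three follow a slope of 0.5.
with_outliers = [0.3, 0.4] + [0.5 * x for x in beta_m[2:]]
median_ratio = weighted_median(beta_m, with_outliers, se_y)
print(intercept, slope, median_ratio)
```

In the first toy data set the constant shift is absorbed by the intercept, so the slope stays at 0.5. In the second, the two outlying ratios do not move the weighted median away from 0.5.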
The figures below demonstrate how correlated pleiotropy can lead to false positives using simple MR methods. On the left, we have simulated data with no causal effect, but 15% of the variants have correlated pleiotropic effects on \(Y\) (blue triangles). Even though there are many variants with strong association with \(M\) and no association with \(Y\) (evidence against a causal effect), IVW regression obtains a \(p\)-value of 0.01. We would like to be able to distinguish this scenario from the scenario on the right. On the right, we have simulated data with a true causal effect. Effect estimates are correlated for all variants.
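A small simulation in the same spirit shows the false positive mechanism. This sketch does not reproduce the data behind the figures: the number of variants, the effect size distributions, the standard error, and the strength of the shared effect are all hypothetical choices, every variant is used rather than a pre-selected set, and IVW is reduced to a regression through the origin because all standard errors are equal.

```python
import math
import random

random.seed(1)

n_variants = 1000
q = 0.15       # proportion of variants with correlated pleiotropy
eta = 1.0      # hypothetical effect of the shared factor on Y
gamma = 0.0    # no causal effect of M on Y
se = 0.01      # hypothetical standard error of every effect estimate

beta_m_hat, beta_y_hat = [], []
for _ in range(n_variants):
    beta_m = random.gauss(0.0, 0.05)         # true effect on M
    z = 1 if random.random() < q else 0      # correlated pleiotropy indicator
    theta = random.gauss(0.0, 0.01)          # uncorrelated pleiotropy
    beta_y = gamma * beta_m + z * eta * beta_m + theta
    beta_m_hat.append(beta_m + random.gauss(0.0, se))
    beta_y_hat.append(beta_y + random.gauss(0.0, se))

# With equal standard errors, IVW is a regression through the origin.
sxx = sum(x * x for x in beta_m_hat)
sxy = sum(x * y for x, y in zip(beta_m_hat, beta_y_hat))
ivw_estimate = sxy / sxx
ivw_se = se / math.sqrt(sxx)
z_score = ivw_estimate / ivw_se
print(f"IVW estimate {ivw_estimate:.3f}, z-score {z_score:.1f}")
```

Although `gamma` is zero, the variants with correlated pleiotropy pull the fitted slope away from zero, so the naive IVW test reports a strongly significant effect.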
The purpose of CAUSE is to distinguish the patterns created by correlated pleiotropy from those created by a causal effect while still accounting for uncorrelated horizontal pleiotropy. We can also account for sample overlap in the GWAS of traits \(M\) and \(Y\), and we can avoid some other problems encountered by simple MR by modeling uncertainty in effect sizes rather than pre-selecting variants with strong evidence of affecting trait \(M\).
CAUSE uses a mixture model for variants included in the analysis. We assume that a proportion \(q\) of variants exhibit correlated pleiotropy (the blue triangles in the figure above) while the rest do not. Correlated pleiotropic variants have the causal diagram on the right in the figure below. The remaining variants have the causal diagram on the left.
Note that we allow all variants to have uncorrelated pleiotropic effects (\(\theta_j\)). To make the model identifiable and to encourage parsimonious solutions, we assume that \(q\) is small. This assumption is encoded in a prior on \(q\) that places most of its weight on small values (\(q \sim \text{Beta}(1, 10)\) by default). If \(Z_j\) is an indicator that variant \(G_j\) is a correlated pleiotropic variant, then the model above implies that
\[ \beta_{Y,j} = \underbrace{\gamma \beta_{M,j}}_{\text{causal effect}} + \underbrace{Z_j \eta \beta_{M,j}}_{\substack{\text{correlated}\\ \text{pleiotropy}}} + \underbrace{\theta_j}_{\substack{\text{uncorrelated}\\ \text{pleiotropy}}}. \]
This relationship is the core idea of CAUSE. We estimate posterior distributions of \(\gamma\), \(\eta\) and \(q\) and compare the fit of posteriors from models with and without a causal effect. Our software provides posterior distribution estimates under two models: the sharing model, in which \(\gamma\) is fixed at 0, and the causal model, in which \(\gamma\) is a free parameter. It also provides a test of whether the posteriors estimated under the causal model fit the data significantly better than the posteriors estimated under the sharing model. If this is the case, we conclude that the data are consistent with a causal effect or, in other words, show the pattern of the right-hand scatter plot above. For more details on model fitting and comparison, check out the paper! For details on running the software, look at the tutorial tab above.
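Two short calculations may help make the model concrete. The first uses only standard properties of the Beta distribution. A Beta\((1, 10)\) density is \(p(q) = 10(1-q)^9\), so
\[ E[q] = \frac{1}{11} \approx 0.09, \qquad P(q \le x) = 1 - (1-x)^{10}, \qquad P(q \le 0.2) = 1 - 0.8^{10} \approx 0.89. \]
In other words, the default prior puts roughly 89% of its mass on fewer than one in five variants showing correlated pleiotropy.

The second is a sketch under simplifying assumptions that are not part of the text above: \(Z_j\) and \(\theta_j\) are independent of \(\beta_{M,j}\), with \(P(Z_j = 1) = q\) and \(E[\theta_j] = 0\). Under these assumptions, the slope targeted by a regression through the origin of \(\beta_{Y,j}\) on \(\beta_{M,j}\) is
\[ \frac{E[\beta_{Y,j}\beta_{M,j}]}{E[\beta_{M,j}^2]} = \frac{\gamma E[\beta_{M,j}^2] + \eta E[Z_j] E[\beta_{M,j}^2] + E[\theta_j] E[\beta_{M,j}]}{E[\beta_{M,j}^2]} = \gamma + q\eta. \]
A sharing model with \(\gamma = 0\) and \(q\eta \neq 0\) therefore produces a nonzero average slope, just as a causal effect does. What differs is the pattern across variants: with \(\gamma \neq 0\) and no correlated pleiotropy every variant follows the slope \(\gamma\), whereas with \(\gamma = 0\) only a proportion \(q\) of variants follow the slope \(\eta\).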
Barfield, Richard, Helian Feng, Alexander Gusev, Lang Wu, and Wei Zheng. 2018. “Transcriptome-wide association studies accounting for co-localization using Egger regression.” Genetic Epidemiology 42: 418–33.
Boef, Anna G.C., Olaf M. Dekkers, and Saskia Le Cessie. 2015. “Mendelian randomization studies: A review of the approaches used and the quality of reporting.” International Journal of Epidemiology 44 (2): 496–511. doi:10.1093/ije/dyv071.
Bowden, Jack, M. Fabiola Del Greco, Cosetta Minelli, George Davey Smith, Nuala A. Sheehan, and John R. Thompson. 2016. “Assessing the suitability of summary data for two-sample mendelian randomization analyses using MR-Egger regression: The role of the I2 statistic.” International Journal of Epidemiology 45 (6): 1961–74. doi:10.1093/ije/dyw220.
Bowden, Jack, George Davey Smith, and Stephen Burgess. 2015. “Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression.” International Journal of Epidemiology, no. June: 512–25. doi:10.1093/ije/dyv080.
Burgess, Stephen, Frank Dudbridge, and Simon G. Thompson. 2016. “Combining information on multiple instrumental variables in Mendelian randomization: Comparison of allele score and summarized data methods.” Statistics in Medicine 35 (11): 1880–1906. doi:10.1002/sim.6835.
Smith, George Davey, and Gibran Hemani. 2014. “Mendelian randomization: genetic anchors for causal inference in epidemiological studies.” Human Molecular Genetics 23 (1): 89–98. doi:10.1093/hmg/ddu328.
Verbanck, Marie, Chia-yen Chen, Benjamin Neale, and Ron Do. 2018. “Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases.” Nature Genetics 50 (May). Springer US. doi:10.1038/s41588-018-0099-7.
Zheng, Jie, Denis Baird, Maria-Carolina Borges, Jack Bowden, Gibran Hemani, Philip Haycock, David M. Evans, and George Davey Smith. 2017. “Recent Developments in Mendelian Randomization Studies.” Current Epidemiology Reports 4 (4). Current Epidemiology Reports: 330–45. doi:10.1007/s40471-017-0128-6.
Zhu, Zhihong, Zhili Zheng, Futao Zhang, Yang Wu, Maciej Trzaskowski, Robert Maier, Matthew R. Robinson, et al. 2018. “Causal associations between risk factors and common diseases inferred from GWAS summary data.” Nature Communications 9 (1). Springer US: 224. doi:10.1038/s41467-017-02317-2.
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] workflowr_1.4.0.9000 Rcpp_1.0.1 digest_0.6.20
[4] rprojroot_1.3-2 backports_1.1.4 git2r_0.26.1
[7] magrittr_1.5 evaluate_0.14 highr_0.8
[10] stringi_1.4.3 fs_1.3.1 whisker_0.3-2
[13] rmarkdown_1.13 tools_3.6.1 stringr_1.4.0
[16] glue_1.3.1 xfun_0.8 yaml_2.2.0
[19] compiler_3.6.1 htmltools_0.3.6 knitr_1.23