Install R packages:
install.packages("remotes")
remotes::install_github("MRCIEU/TwoSampleMR")
Note: you likely will need R >= 4.2 to install this package. If you don’t have R version 4.2 or later, you can try with POSIT.
We will follow the TwoSampleMR tutorial to perform a basic MR analysis.
In this example, we will perform MR of BMI against coronary heart disease. To do so we need to identify the SNPs that influence the BMI (get instruments), and then extract those SNPs from a GWAS on coronary heart disease (effects of instruments on outcome), then perform MR.
library(TwoSampleMR)
## TwoSampleMR version 0.5.9
## [>] New: Option to use non-European LD reference panels for clumping etc
## [>] Some studies temporarily quarantined to verify effect allele
## [>] See news(package='TwoSampleMR') and https://gwas.mrcieu.ac.uk for further details
First read in data for exposure:
bmi2_file <- system.file("extdata/bmi.csv", package = "TwoSampleMR")
bmi_exp_dat <- read_exposure_data(
filename = bmi2_file,
sep = ",",
snp_col = "rsid",
beta_col = "effect",
se_col = "SE",
effect_allele_col = "a1",
other_allele_col = "a2",
eaf_col = "a1_freq",
pval_col = "p-value",
units_col = "Units",
gene_col = "Gene",
samplesize_col = "n"
)
## No phenotype name specified, defaulting to 'exposure'.
#> No phenotype name specified, defaulting to 'exposure'.
head(bmi_exp_dat)
## SNP beta.exposure se.exposure effect_allele.exposure
## 1 rs10767664 0.19 0.03061224 A
## 2 rs13078807 0.10 0.02040816 G
## 3 rs1514175 0.07 0.02040816 A
## 4 rs1558902 0.39 0.02040816 A
## 5 rs10968576 0.11 0.02040816 G
## 6 rs2241423 0.13 0.02040816 G
## other_allele.exposure eaf.exposure pval.exposure units.exposure gene.exposure
## 1 T 0.78 5e-26 kg/m2 BDNF
## 2 A 0.20 4e-11 kg/m2 CADM2
## 3 G 0.43 8e-14 kg/m2 TNNI3K
## 4 T 0.42 5e-120 kg/m2 FTO
## 5 A 0.31 3e-13 kg/m2 LRRN6C
## 6 A 0.78 1e-18 kg/m2 LBXCOR1
## samplesize.exposure exposure mr_keep.exposure pval_origin.exposure
## 1 225238 exposure TRUE reported
## 2 221431 exposure TRUE reported
## 3 207641 exposure TRUE reported
## 4 222476 exposure TRUE reported
## 5 247166 exposure TRUE reported
## 6 227886 exposure TRUE reported
## units.exposure_dat id.exposure data_source.exposure
## 1 kg/m2 hDWFkq textfile
## 2 kg/m2 hDWFkq textfile
## 3 kg/m2 hDWFkq textfile
## 4 kg/m2 hDWFkq textfile
## 5 kg/m2 hDWFkq textfile
## 6 kg/m2 hDWFkq textfile
To serve as valid instruments, they need to be independent. Thus we need to do clumping (those SNPs that have LD R-square above the specified threshold only the SNP with the lowest P-value will be retained).
bmi_exp_dat <- clump_data(bmi_exp_dat)
## API: public: http://gwas-api.mrcieu.ac.uk/
## Please look at vignettes for options on running this locally if you need to run many instances of this command.
## Clumping hDWFkq, 30 variants, using EUR population reference
## Removing 3 of 30 variants due to LD with other variants or absence from LD reference panel
This package can also obtain data from thousands of existing GWAS directly. A list of available GWAS
ao <- available_outcomes()
head(ao)
## # A tibble: 6 × 25
## id trait ncase group_name year author consortium sex pmid population
## <chr> <chr> <int> <chr> <int> <chr> <chr> <chr> <int> <chr>
## 1 ieu-b-… Schi… 1234 public 2022 Trube… PGC Male… 3.54e7 Hispanic …
## 2 ieu-b-… Schi… 52017 public 2022 Trube… PGC Male… 3.54e7 European
## 3 ieu-b-… Schi… 12305 public 2022 Trube… PGC Male… 3.54e7 East Asian
## 4 ieu-b-… Schi… 64322 public 2022 Trube… PGC Male… 3.54e7 Mixed
## 5 ieu-b-… Schi… 76755 public 2022 Trube… PGC Male… 3.54e7 Mixed
## 6 ieu-b-… Schi… 5998 public 2022 Trube… PGC Male… 3.54e7 African A…
## # ℹ 15 more variables: unit <chr>, sample_size <int>, build <chr>,
## # ncontrol <int>, category <chr>, subcategory <chr>, ontology <chr>,
## # note <chr>, mr <int>, nsnp <int>, doi <chr>, coverage <chr>,
## # study_design <chr>, priority <int>, sd <dbl>
To extract instruments for a particular trait using a particular study, for example to obtain SNPs for body mass index using the Locke et al 2015 GIANT study, you specify the study ID as follows:
This returns a set of LD clumped SNPs that are GWAS significant for BMI
bmi2014_exp_dat <- extract_instruments(outcomes = 'ieu-a-2')
We now need to find a suitable GWAS for coronary heart disease. We can search the available studies:
ao[grepl("heart disease", ao$trait), ]
## # A tibble: 31 × 25
## id trait ncase group_name year author consortium sex pmid population
## <chr> <chr> <int> <chr> <int> <chr> <chr> <chr> <int> <chr>
## 1 finn… Majo… 21012 public 2021 NA NA Male… NA European
## 2 ukb-… Diag… 1195 public 2018 Ben E… MRC-IEU Male… NA European
## 3 ukb-… I25 … 302 public 2020 Pan-U… NA Male… NA African A…
## 4 ukb-… Diag… 9330 public 2018 Ben E… MRC-IEU Male… NA European
## 5 ieu-… Coro… 60801 public 2015 Nikpay CARDIoGRA… Male… 2.63e7 Mixed
## 6 finn… Seco… 428 public 2021 NA NA Male… NA European
## 7 ukb-… Diag… 5771 public 2018 Ben E… MRC-IEU Male… NA European
## 8 ukb-… Diag… 8755 public 2017 Neale Neale Lab Male… NA European
## 9 ukb-… I25 … 1205 public 2020 Pan-U… NA Male… NA South Asi…
## 10 finn… Othe… 62081 public 2021 NA NA Male… NA European
## # ℹ 21 more rows
## # ℹ 15 more variables: unit <chr>, sample_size <int>, build <chr>,
## # ncontrol <int>, category <chr>, subcategory <chr>, ontology <chr>,
## # note <chr>, mr <int>, nsnp <int>, doi <chr>, coverage <chr>,
## # study_design <chr>, priority <int>, sd <dbl>
The most recent CARDIOGRAM GWAS is ID number ieu-a-7. We can extract the BMI SNPs from this GWAS as follows:
chd_out_dat <- extract_outcome_data(
snps = bmi_exp_dat$SNP,
outcomes = 'ieu-a-7'
)
## Extracting data for 27 SNP(s) from 1 GWAS(s)
dat <- harmonise_data(
exposure_dat = bmi_exp_dat,
outcome_dat = chd_out_dat
)
## Harmonising exposure (hDWFkq) and Coronary heart disease || id:ieu-a-7 (ieu-a-7)
#> Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ieu-a-7 (ieu-a-7)
We use two methods, MR-Egger and inverse variance weighting.
res <- mr(dat, method_list = c("mr_egger_regression", "mr_ivw"))
## Analysing 'hDWFkq' on 'ieu-a-7'
res
## id.exposure id.outcome outcome exposure
## 1 hDWFkq ieu-a-7 Coronary heart disease || id:ieu-a-7 exposure
## 2 hDWFkq ieu-a-7 Coronary heart disease || id:ieu-a-7 exposure
## method nsnp b se pval
## 1 MR Egger 27 0.1138566 0.03292752 1.962323e-03
## 2 Inverse variance weighted 27 0.1139168 0.01611170 1.544411e-12
How does it work?
Inverse variance weighting. We use the same notation as we used in the lecture (Z: instrumental variable, A exposure, Y outcome), you can get the estimate from weighted linear regression of \(\hat{\beta}_{YZ}\) against \(\hat{\beta}_{AZ}\), using 1/se(\(\hat{\beta}_{YZ}^2\)) as weights. the slope is the IVW estimate.
ivw <- lm(dat$beta.outcome ~ dat$beta.exposure + 0, weights = 1/dat$se.outcome^2)
summary(ivw)$coeff
## Estimate Std. Error t value Pr(>|t|)
## dat$beta.exposure 0.1139168 0.0161117 7.070442 1.652727e-07
Egger regression is the same as above, except that it allows an intercept.
er <- lm(dat$beta.outcome ~ dat$beta.exposure, weights = 1/dat$se.outcome^2)
summary(er)$coeff
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.110681e-05 0.005263163 0.002110293 0.998332981
## dat$beta.exposure 1.138566e-01 0.032927518 3.457795399 0.001962323
mr_heterogeneity(dat)
## id.exposure id.outcome outcome exposure
## 1 hDWFkq ieu-a-7 Coronary heart disease || id:ieu-a-7 exposure
## 2 hDWFkq ieu-a-7 Coronary heart disease || id:ieu-a-7 exposure
## method Q Q_df Q_pval
## 1 MR Egger 39.68702 25 0.03139634
## 2 Inverse variance weighted 39.68703 26 0.04185786
mr_pleiotropy_test(dat)
## id.exposure id.outcome outcome exposure
## 1 hDWFkq ieu-a-7 Coronary heart disease || id:ieu-a-7 exposure
## egger_intercept se pval
## 1 1.110681e-05 0.005263163 0.998333
p1 <- mr_scatter_plot(res, dat)
p1
## $`hDWFkq.ieu-a-7`
##
## attr(,"split_type")
## [1] "data.frame"
## attr(,"split_labels")
## id.exposure id.outcome
## 1 hDWFkq ieu-a-7