Perform MR

We will follow the TwoSampleMR tutorial to perform a basic MR analysis.

In this example, we will perform MR of BMI against coronary heart disease. To do so we need to identify the SNPs that influence the BMI (get instruments), and then extract those SNPs from a GWAS on coronary heart disease (effects of instruments on outcome), then perform MR.

library(TwoSampleMR)

## TwoSampleMR version 0.5.9 
## [>] New: Option to use non-European LD reference panels for clumping etc
## [>] Some studies temporarily quarantined to verify effect allele
## [>] See news(package='TwoSampleMR') and https://gwas.mrcieu.ac.uk for further details

Get instruments for exposures

First read in data for exposure:

bmi2_file <- system.file("extdata/bmi.csv", package = "TwoSampleMR")
bmi_exp_dat <- read_exposure_data(
    filename = bmi2_file,
    sep = ",",
    snp_col = "rsid",
    beta_col = "effect",
    se_col = "SE",
    effect_allele_col = "a1",
    other_allele_col = "a2",
    eaf_col = "a1_freq",
    pval_col = "p-value",
    units_col = "Units",
    gene_col = "Gene",
    samplesize_col = "n"
)

## No phenotype name specified, defaulting to 'exposure'.

#> No phenotype name specified, defaulting to 'exposure'.
head(bmi_exp_dat)

##          SNP beta.exposure se.exposure effect_allele.exposure
## 1 rs10767664          0.19  0.03061224                      A
## 2 rs13078807          0.10  0.02040816                      G
## 3  rs1514175          0.07  0.02040816                      A
## 4  rs1558902          0.39  0.02040816                      A
## 5 rs10968576          0.11  0.02040816                      G
## 6  rs2241423          0.13  0.02040816                      G
##   other_allele.exposure eaf.exposure pval.exposure units.exposure gene.exposure
## 1                     T         0.78         5e-26          kg/m2          BDNF
## 2                     A         0.20         4e-11          kg/m2         CADM2
## 3                     G         0.43         8e-14          kg/m2        TNNI3K
## 4                     T         0.42        5e-120          kg/m2           FTO
## 5                     A         0.31         3e-13          kg/m2        LRRN6C
## 6                     A         0.78         1e-18          kg/m2       LBXCOR1
##   samplesize.exposure exposure mr_keep.exposure pval_origin.exposure
## 1              225238 exposure             TRUE             reported
## 2              221431 exposure             TRUE             reported
## 3              207641 exposure             TRUE             reported
## 4              222476 exposure             TRUE             reported
## 5              247166 exposure             TRUE             reported
## 6              227886 exposure             TRUE             reported
##   units.exposure_dat id.exposure data_source.exposure
## 1              kg/m2      hDWFkq             textfile
## 2              kg/m2      hDWFkq             textfile
## 3              kg/m2      hDWFkq             textfile
## 4              kg/m2      hDWFkq             textfile
## 5              kg/m2      hDWFkq             textfile
## 6              kg/m2      hDWFkq             textfile

To serve as valid instruments, they need to be independent. Thus we need to do clumping (those SNPs that have LD R-square above the specified threshold only the SNP with the lowest P-value will be retained).

bmi_exp_dat <- clump_data(bmi_exp_dat)

## API: public: http://gwas-api.mrcieu.ac.uk/

## Please look at vignettes for options on running this locally if you need to run many instances of this command.

## Clumping hDWFkq, 30 variants, using EUR population reference

## Removing 3 of 30 variants due to LD with other variants or absence from LD reference panel

This package can also obtain data from thousands of existing GWAS directly. A list of available GWAS

ao <- available_outcomes()
head(ao)

## # A tibble: 6 × 25
##   id      trait ncase group_name  year author consortium sex     pmid population
##   <chr>   <chr> <int> <chr>      <int> <chr>  <chr>      <chr>  <int> <chr>     
## 1 ieu-b-… Schi…  1234 public      2022 Trube… PGC        Male… 3.54e7 Hispanic …
## 2 ieu-b-… Schi… 52017 public      2022 Trube… PGC        Male… 3.54e7 European  
## 3 ieu-b-… Schi… 12305 public      2022 Trube… PGC        Male… 3.54e7 East Asian
## 4 ieu-b-… Schi… 64322 public      2022 Trube… PGC        Male… 3.54e7 Mixed     
## 5 ieu-b-… Schi… 76755 public      2022 Trube… PGC        Male… 3.54e7 Mixed     
## 6 ieu-b-… Schi…  5998 public      2022 Trube… PGC        Male… 3.54e7 African A…
## # ℹ 15 more variables: unit <chr>, sample_size <int>, build <chr>,
## #   ncontrol <int>, category <chr>, subcategory <chr>, ontology <chr>,
## #   note <chr>, mr <int>, nsnp <int>, doi <chr>, coverage <chr>,
## #   study_design <chr>, priority <int>, sd <dbl>

To extract instruments for a particular trait using a particular study, for example to obtain SNPs for body mass index using the Locke et al 2015 GIANT study, you specify the study ID as follows:

This returns a set of LD clumped SNPs that are GWAS significant for BMI

bmi2014_exp_dat <- extract_instruments(outcomes = 'ieu-a-2')

Get effects of intruments on outcomes

We now need to find a suitable GWAS for coronary heart disease. We can search the available studies:

ao[grepl("heart disease", ao$trait), ]

## # A tibble: 31 × 25
##    id    trait ncase group_name  year author consortium sex      pmid population
##    <chr> <chr> <int> <chr>      <int> <chr>  <chr>      <chr>   <int> <chr>     
##  1 finn… Majo… 21012 public      2021 NA     NA         Male… NA      European  
##  2 ukb-… Diag…  1195 public      2018 Ben E… MRC-IEU    Male… NA      European  
##  3 ukb-… I25 …   302 public      2020 Pan-U… NA         Male… NA      African A…
##  4 ukb-… Diag…  9330 public      2018 Ben E… MRC-IEU    Male… NA      European  
##  5 ieu-… Coro… 60801 public      2015 Nikpay CARDIoGRA… Male…  2.63e7 Mixed     
##  6 finn… Seco…   428 public      2021 NA     NA         Male… NA      European  
##  7 ukb-… Diag…  5771 public      2018 Ben E… MRC-IEU    Male… NA      European  
##  8 ukb-… Diag…  8755 public      2017 Neale  Neale Lab  Male… NA      European  
##  9 ukb-… I25 …  1205 public      2020 Pan-U… NA         Male… NA      South Asi…
## 10 finn… Othe… 62081 public      2021 NA     NA         Male… NA      European  
## # ℹ 21 more rows
## # ℹ 15 more variables: unit <chr>, sample_size <int>, build <chr>,
## #   ncontrol <int>, category <chr>, subcategory <chr>, ontology <chr>,
## #   note <chr>, mr <int>, nsnp <int>, doi <chr>, coverage <chr>,
## #   study_design <chr>, priority <int>, sd <dbl>

The most recent CARDIOGRAM GWAS is ID number ieu-a-7. We can extract the BMI SNPs from this GWAS as follows:

chd_out_dat <- extract_outcome_data(
    snps = bmi_exp_dat$SNP,
    outcomes = 'ieu-a-7'
)

## Extracting data for 27 SNP(s) from 1 GWAS(s)

Harmonise the exposure and outcome data

dat <- harmonise_data(
    exposure_dat = bmi_exp_dat, 
    outcome_dat = chd_out_dat
)

## Harmonising exposure (hDWFkq) and Coronary heart disease || id:ieu-a-7 (ieu-a-7)

#> Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ieu-a-7 (ieu-a-7)

Perform MR

We use two methods, MR-Egger and inverse variance weighting.

res <- mr(dat, method_list = c("mr_egger_regression", "mr_ivw"))

## Analysing 'hDWFkq' on 'ieu-a-7'

res

##   id.exposure id.outcome                              outcome exposure
## 1      hDWFkq    ieu-a-7 Coronary heart disease || id:ieu-a-7 exposure
## 2      hDWFkq    ieu-a-7 Coronary heart disease || id:ieu-a-7 exposure
##                      method nsnp         b         se         pval
## 1                  MR Egger   27 0.1138566 0.03292752 1.962323e-03
## 2 Inverse variance weighted   27 0.1139168 0.01611170 1.544411e-12

How does it work?

Inverse variance weighting. We use the same notation as we used in the lecture (Z: instrumental variable, A exposure, Y outcome), you can get the estimate from weighted linear regression of \(\hat{\beta}_{YZ}\) against \(\hat{\beta}_{AZ}\), using 1/se(\(\hat{\beta}_{YZ}^2\)) as weights. the slope is the IVW estimate.

ivw <- lm(dat$beta.outcome ~ dat$beta.exposure + 0, weights = 1/dat$se.outcome^2)
summary(ivw)$coeff

##                    Estimate Std. Error  t value     Pr(>|t|)
## dat$beta.exposure 0.1139168  0.0161117 7.070442 1.652727e-07

Egger regression is the same as above, except that it allows an intercept.

er <- lm(dat$beta.outcome ~ dat$beta.exposure, weights = 1/dat$se.outcome^2)
summary(er)$coeff

##                       Estimate  Std. Error     t value    Pr(>|t|)
## (Intercept)       1.110681e-05 0.005263163 0.002110293 0.998332981
## dat$beta.exposure 1.138566e-01 0.032927518 3.457795399 0.001962323

Integrating eQTL and GWAS: TWAS and coloc

Siming Zhao

2/3/2024

Before the class

Perform MR

Get instruments for exposures

Get effects of intruments on outcomes

Harmonise the exposure and outcome data

Perform MR

Analyze MR results