Last updated: 2019-07-29

Checks: 6 passed, 1 failed

Knit directory: cause/gwas_data/

This reproducible R Markdown analysis was created with workflowr (version 1.4.0.9000). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

The global environment had objects present when the code in the R Markdown file was run. These objects can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment. Use wflow_publish or wflow_build to ensure that the code is always run in an empty environment.

The following objects were defined in the global environment when these results were created:

Name Class Size
data environment 56 bytes
env environment 56 bytes

The command set.seed(20181014) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    sim_results/
    Ignored:    src/RcppExports.o
    Ignored:    src/cause.so
    Ignored:    src/log_likelihood_functions.o

Untracked files:
    Untracked:  analysis/figure/ldl_cad.Rmd/
    Untracked:  analysis/figure/little_test.Rmd/
    Untracked:  cause.Rcheck/
    Untracked:  docs/cause.bib
    Untracked:  docs/figure/cause_figure_1_standalone.pdf
    Untracked:  gwas_data/
    Untracked:  ll_v7_notes.Rmd
    Untracked:  tests/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
Rmd ded1693 Jean Morrison 2019-07-30 wflow_publish(“analysis/gwas_pairs.Rmd”)
Rmd ff51b98 Jean Morrison 2019-07-29 Working on GWAS pairs documenation
Rmd e8c10f6 Jean Morrison 2019-07-26 Start writing instructions
html 3b42836 Jean Morrison 2019-07-10 Build site.
Rmd e9682b2 Jean Morrison 2019-07-10 wflow_publish(“analysis/gwas_pairs.Rmd”)
html 286f4e9 Jean Morrison 2019-06-25 Build site.
Rmd 8f3b82e Jean Morrison 2019-06-25 wflow_publish(files = c(“analysis/about.Rmd”, “analysis/index.Rmd”, “analysis/ldl_cad.Rmd”, “analysis/license.Rmd”,

Introduction

In this page we walk through an analysis of pairs of 16 traits with publicly available GWAS data using CAUSE. These results are also presented in the paper (Section 2.3), so this page serves two functions. First, it allows an interested reader to replicate those results. Second, it is an example of how to set up a larger scale analysis using CAUSE, and it explains some of the technical differences from an analysis of a single pair of traits. In the paper, we present results for pairs of 20 traits. However, the blood pressure results from Ehret et al. (2011) (PMID: 21909115) must be obtained through dbGaP and so aren’t included here.

Set Up

We are using Snakemake and a conda environment to run this analysis. If you don’t have Miniconda or Anaconda installed you can either install one of them (recommended) or just make sure that you have Python 3, pandas, and Snakemake installed. We have chosen not to include R in the conda environment, so you will need to have R installed outside the environment, with the cause and tidyverse R packages installed and working. If you were able to work through the package tutorial you should be in good shape.

Snakemake is a tool for creating scalable workflows. In our case we are using it to perform the same set of data processing and analysis steps for all pairs of traits. We will walk through exactly what the Snakemake analysis is doing in the next sections.

The first setup step is to create the conda environment.

conda create -n cause_large python=3.6 snakemake

Next, create a working directory in which to analyze the data, and change to that directory using cd on Mac or Linux. For example:

mkdir gwas_pairs
cd gwas_pairs

The last step is to download the data and code that we will use. Inside of R, run

cause::setup_gwas_pairs()

This function sets up the analysis directory and is a shortcut for downloading all of the data and code we will use in this analysis. Downloading the data might take up to half an hour depending on your network speed. Here is what the function downloads:

  1. GWAS summary statistics, cleaned and formatted. We will talk about these more later on. These files go in the data/ subdirectory.

  2. LD data estimated from the 1000 Genomes CEU population. This goes in the ld/ subdirectory.

  3. Some R scripts that will be used to run the analysis. These are in the R/ subdirectory.

  4. A Snakemake file called pairs_snakemake.py.

In the rest of this document we will go through the Snakemake file in detail and explore the scripts that it calls. The analysis is designed to be run on a cluster and can be executed with a single command. First activate the conda environment.

source activate cause_large

The Snakemake command below assumes you are using a cluster with a Slurm workload manager. If your cluster uses something else you should edit the value that is given to the --cluster argument. You may need to edit this anyway to include necessary information. For example, if you are working on the University of Chicago Research Computing Center you will need to add --account and --partition arguments.

nohup snakemake -s pairs_snakemake.py --keep-going --jobs 96 --cluster "sbatch --output={params.log}_%A.out --error={params.log}_%A.err --cpus-per-task={params.cpus} --ntasks=1 --mem-per-cpu={params.mem} --time=1:00:00 --job-name={params.jobname}" > pairs.out &

Data format

Currently there is no standard format for GWAS summary statistics, so summary statistics downloaded from different sources are likely to have different formats and may not have effect alleles oriented the same way. Many analysis steps are faster and simpler if all of the data files are in the same format and have effect alleles oriented consistently.

The data files downloaded by setup_gwas_pairs() have been pre-formatted so they are ready to use. After running setup_gwas_pairs() you will find a file data/gwas_info.csv that gives information about each study and the original download source. Open up the file and take a look.

library(tidyverse)
library(cause)
info <- read_csv("data/gwas_info.csv")
Parsed with column specification:
cols(
  consortium = col_character(),
  trait = col_character(),
  full_trait_name = col_character(),
  PMID = col_double(),
  pub_sample_size = col_double(),
  pub_cases = col_double(),
  pub_controls = col_double(),
  first_author = col_character(),
  pub_year = col_double(),
  summary_statistics_link = col_character(),
  snp = col_character(),
  a1 = col_character(),
  a2 = col_character(),
  beta_hat = col_character(),
  se = col_character(),
  p_value = col_character(),
  sample_size = col_character(),
  regression_type = col_character()
)
head(info) %>% print(width = Inf)
# A tibble: 6 x 18
  consortium trait  full_trait_name       PMID pub_sample_size pub_cases
  <chr>      <chr>  <chr>                <dbl>           <dbl>     <dbl>
1 glg        tg     triglycerides     24097068          188577        NA
2 glg        ldl    ldl               24097068          188577        NA
3 glg        hdl    hdl               24097068          188577        NA
4 glg        tc     total_cholesterol 24097068          188577        NA
5 giant      height height            25282103          253288        NA
6 giant      bmi    body_mass_index   25673413          339224        NA
  pub_controls first_author pub_year
         <dbl> <chr>           <dbl>
1           NA Willer           2013
2           NA Willer           2013
3           NA Willer           2013
4           NA Willer           2013
5           NA Wood             2014
6           NA Locke            2015
  summary_statistics_link                                                  
  <chr>                                                                    
1 http://csg.sph.umich.edu/abecasis/public/lipids2013/jointGwasMc_TG.txt.gz
2 http://csg.sph.umich.edu/abecasis/public/lipids2013/jointGwasMc_LDL.txt.…
3 http://csg.sph.umich.edu/abecasis/public/lipids2013/jointGwasMc_HDL.txt.…
4 http://csg.sph.umich.edu/abecasis/public/lipids2013/jointGwasMc_TC.txt.gz
5 http://portals.broadinstitute.org/collaboration/giant/images/0/01/GIANT_…
6 http://portals.broadinstitute.org/collaboration/giant/images/1/15/SNP_gw…
  snp        a1      a2      beta_hat se    p_value sample_size
  <chr>      <chr>   <chr>   <chr>    <chr> <chr>   <chr>      
1 rsid       A1      A2      beta     se    P.value N          
2 rsid       A1      A2      beta     se    P-value N          
3 rsid       A1      A2      beta     se    P-value N          
4 rsid       A1      A2      beta     se    P-value N          
5 MarkerName Allele1 Allele2 b        SE    P.value N          
6 SNP        A1      A2      b        se    P.value N          
  regression_type
  <chr>          
1 linear         
2 linear         
3 linear         
4 linear         
5 linear         
6 linear         

For each study we have recorded a short string indicating the consortium or data set used (e.g., glg = Global Lipids Genetics Consortium), a short string indicating the trait, the full trait name, information about the source publication (PMID, first author, and publication year), the total sample size from the publication (with the number of cases and controls, if relevant), and the original download link. The remaining columns give the column names of important fields in the original data; we don’t need this information for our analysis here.

All of the formatted data files that have been downloaded are named data/{consortium}_{trait}_summary_statistics.tsv.gz. To convert the data from their original formats to a standard format, we used a summary statistics processing tool created by Joseph Marcus. This tool is a little more complicated than is required for CAUSE and uses a large external data reference. We hope to include a simpler data formatting function in the cause R package soon.

Let’s take a look at some of the formatted data. Use the following R commands:

dat <- read_tsv("data/ckdgen_egfrcrea_summary_statistics.tsv.gz", 
                col_type=list(col_character(),col_integer(),
                                 col_character(),col_character(),
                                 col_character(),col_double(),col_double(),
                                 col_double(),col_double(),
                                 col_double()))
head(dat) %>% print(width = Inf)
# A tibble: 6 x 10
  chrom    pos snp        ref_allele alt_allele beta_hat     se p_value
  <chr>  <int> <chr>      <chr>      <chr>         <dbl>  <dbl>   <dbl>
1 1     752566 rs3094315  G          A           -0.0013 0.0015   0.38 
2 1     754192 rs3131968  A          G           -0.0013 0.0019   0.5  
3 1     768448 rs12562034 G          A           -0.0014 0.0023   0.55 
4 1     775659 rs2905035  A          G           -0.0022 0.0019   0.24 
5 1     776546 rs12124819 A          G           -0.003  0.0018   0.092
6 1     779322 rs4040617  A          G            0.0019 0.0019   0.33 
  sample_size f_eur
        <dbl> <dbl>
1      114684 0.16 
2      102660 0.128
3      106355 0.908
4      104963 0.128
5      109251 0.771
6       99471 0.882

In our analysis we will exploit the following features of the data format:

  1. All the files have the same information fields in the same order and with the same column names.

  2. All of the alleles are oriented the same way. This means, for example, that if rs3094315 appears in another study, it will also have reference allele G and alternative allele A in that study.

In the future, we will update this section with instructions on how to achieve this using CAUSE functions. If your data are formatted with the same columns we have used (and also saved as a tsv), then you will be able to analyze them using our code, making only minimal changes to the pairs_snakemake.py file.
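
Until then, the sketch below illustrates the core of that harmonization task. This is a simplified illustration, not a CAUSE function: orient_to_reference is a hypothetical helper, ref is a hypothetical table giving the target orientation for each SNP, and strand flips and palindromic (A/T, G/C) alleles are ignored.

library(dplyr)

# Hypothetical helper: orient a study's effect estimates to a fixed
# reference orientation. `ref` has columns snp, ref_allele, and alt_allele
# giving the target orientation for each SNP.
orient_to_reference <- function(dat, ref){
  dat %>%
    inner_join(ref, by = "snp", suffix = c("", "_target")) %>%
    mutate(match = ref_allele == ref_allele_target &
                   alt_allele == alt_allele_target,
           flip  = ref_allele == alt_allele_target &
                   alt_allele == ref_allele_target) %>%
    filter(match | flip) %>%                  # drop allele mismatches
    mutate(beta_hat = ifelse(flip, -beta_hat, beta_hat),  # flip effect sign
           ref_allele = ref_allele_target,
           alt_allele = alt_allele_target) %>%
    select(-ref_allele_target, -alt_allele_target, -match, -flip)
}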

Analysis Steps

We will now walk through all the steps in the Snakemake file and explain what they do in detail. If your data is formatted in the same way as ours (see above) you should be able to use all of this code and will only need to change the preamble portion so that your file names match. It will be helpful to familiarize yourself with Snakemake syntax a little before you begin.

Preamble

In the preamble section we set up the analysis we want to do. First we list some directories.

import pandas as pd

data_dir = "data/" #where the data is
ld_dir = "ld/" #where the ld data is
cause_dir = "cause/" #where CAUSE results will go
mr_dir = "mr/" #where other MR method results will go

Next we create a table of all the trait pairs we’d like to analyze.

consortia = ["giant", "giant", "lu",
            "glg", "glg", "glg", "glg", 
            "ckdgen", "gefos",
            "egg", "egg", "egg",
            "vanderHarst", "diagram",
            "megastroke", "magic"]
            
traits = ["height", "bmi", "bfp",
          "tg", "ldl", "hdl", "tc",
          "egfrcrea",   "bone",
          "bl", "bw", "hc",
          "cad", "t2d",
          "as", "fg"]
          
tags = [consortia[i] + "_" + traits[i] for i in range(len(traits))]

tag_pairs = [(tag1, tag2) for tag1 in tags for tag2 in tags if tag1!=tag2]

We will refer to the string {consortium}_{trait} as a “tag” in the rest of this analysis. Finally, in Snakemake, the rule all lists all the files we should have at the end of the analysis. The rest of the file will explain how to produce these.

rule all:
    input: expand(mr_dir + '{tp[0]}__{tp[1]}_mr.RDS', tp = tag_pairs),
           expand(mr_dir + '{tp[0]}__{tp[1]}_mrpresso.RDS', tp = tag_pairs),
           expand(mr_dir + '{tp[0]}__{tp[1]}_mregger.RDS', tp = tag_pairs),
           expand(cause_dir + '{tp[0]}__{tp[1]}_cause.RDS', tp = tag_pairs)

Data overlap with Awk

We use Awk to write out temporary files that will speed up some of the later analysis steps.

rule data_overlap:
    input: file1 = data_dir + '{tag1}_summary_statistics.tsv.gz',
           file2 = data_dir + '{tag2}_summary_statistics.tsv.gz'
    output: out= temp(data_dir + "{tag1}__{tag2}_overlap.tsv.gz")
    params: log="tempov", mem="20G", cpus="1",
            jobname='dataov'
    shell: """
           snps() {{ gzip -cd "$@" | awk '{{ if ($7!="NA" && $7 > 0 && $6!="NA") print $3 }}' ;}}
           full() {{ gzip -cd "$@" | awk '{{ if ($7!="NA" && $7 > 0 && $6!="NA") print $0 }}' ;}}
           snps {input.file2} | awk 'NR==FNR{{F1[$0];next}}$3 in F1{{print}}' - <(full {input.file1}) | gzip > {output.out}
           """

This step takes as input two data files, one for tag1 (trait \(M\)) and one for tag2 (trait \(Y\)). It outputs a file that is simply the subset of the tag1 data for SNPs that are present in both the tag1 and tag2 GWAS. The temp() in the output: line tells Snakemake to delete these files when we are done with them. This step also filters out SNPs that have missing effect estimates or standard errors, or whose standard errors are non-positive.
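
For readers more comfortable in R than Awk, the rule is equivalent to something like the sketch below. The pair of file names is just an example; the real rule streams the files with Awk, which is much more memory efficient.

library(readr)
library(dplyr)

keep_usable <- function(file){
  # drop variants with a missing effect estimate, or with a missing or
  # non-positive standard error (columns 6 and 7 in the standard format)
  read_tsv(file) %>% filter(!is.na(beta_hat), !is.na(se), se > 0)
}

d1 <- keep_usable("data/glg_ldl_summary_statistics.tsv.gz")
d2 <- keep_usable("data/vanderHarst_cad_summary_statistics.tsv.gz")

# the output is the tag1 data restricted to SNPs present in both studies
overlap <- filter(d1, snp %in% d2$snp)
write_tsv(overlap, "data/glg_ldl__vanderHarst_cad_overlap.tsv.gz")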

LD Pruning

CAUSE estimates posteriors using a set of LD pruned variants. To maximize power, we LD prune preferentially choosing variants with low trait \(M\) \(p\)-values. The next two rules in the Snakemake file write a list of LD pruned variants for each trait pair.

First there is a rule that LD prunes a single chromosome:

rule ld_prune_one_chrom:
    input: data = data_dir + '{tag1}__{tag2}_overlap.tsv.gz',
           ld1 = ld_dir + 'chr{chrom}_AF0.05_0.1.RDS',
           ld2 = ld_dir + 'chr{chrom}_AF0.05_snpdata.RDS'
    output: out=temp(data_dir + "snps_{tag1}__{tag2}.pruned.{chrom}.RDS")
    params: log="ldprune", mem="10G", cpus="4",
            pval_thresh = "1e-3", r2_thresh = 0.1 ,
            jobname='ldprune_{chrom}', partition="broadwl"
    shell:   'Rscript R/ld_prune_one_chrom.R {input.data} {wildcards.chrom}  \
                   {params.pval_thresh} {params.r2_thresh} {input.ld1} {input.ld2} {output.out}'
                                 

This rule takes as input the overlap data set for trait \(M\) that we created previously, along with LD data. The output is a pruned list of SNPs on a given chromosome. The rule calls an R script, R/ld_prune_one_chrom.R, that reads in the data, removes duplicated SNPs, and then LD prunes using the function cause::ld_prune.
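
A minimal sketch of what R/ld_prune_one_chrom.R does is below. It assumes the cause::ld_prune interface shown in the package's LD pruning tutorial, and that the two LD RDS files hold a pairwise r^2 table and a per-variant information table, respectively. The downloaded LD files are already restricted to pairs with r2 above 0.1, which is why r2_thresh is read but not used directly in this sketch.

library(readr)
library(dplyr)
library(cause)

args <- commandArgs(trailingOnly = TRUE)
data_file     <- args[1]
chr           <- args[2]
pval_thresh   <- as.numeric(args[3])
r2_thresh     <- as.numeric(args[4])
ld_file       <- args[5]
snp_info_file <- args[6]
out_file      <- args[7]

# read the overlap data for one chromosome and remove duplicated SNPs
dat <- read_tsv(data_file) %>% filter(chrom == chr)
dup <- dat$snp[duplicated(dat$snp)]
dat <- filter(dat, !snp %in% dup)

ld <- readRDS(ld_file)
snp_info <- readRDS(snp_info_file)

# keep a mutually uncorrelated set of variants, preferentially retaining
# variants with low trait M p-values
pruned <- ld_prune(variants = dat,
                   ld = ld, total_ld_variants = snp_info$SNP,
                   pval_cols = "p_value", pval_thresh = pval_thresh)
saveRDS(pruned, out_file)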

The next rule concatenates the pruned lists for all chromosomes for a single pair into one file.

rule ld_prune_combine:
    input: fls = expand( data_dir + "snps_{{tag1}}__{{tag2}}.pruned.{chr}.RDS", chr = range(1, 23))
    output: out1 = data_dir + "snps_{tag1}__{tag2}.pruned.txt"
    params: log="ld_comb", mem="2G", cpus="1",
            jobname='combine', partition="broadwl"
    shell: "Rscript R/ld_cat.R {output.out1} {input.fls}"

CAUSE

The next step is to run CAUSE.

rule cause:
    input: file1 = data_dir + "{tag1}__{tag2}_overlap.tsv.gz",
           file2 = data_dir + '{tag2}__{tag1}_overlap.tsv.gz',
           snps = data_dir + 'snps_{tag1}__{tag2}.pruned.txt'
    output: params = cause_dir + '{tag1}__{tag2}_params.RDS',
            cause = cause_dir + '{tag1}__{tag2}_cause.RDS',
            data = data_dir + '{tag1}__{tag2}_data.RDS'
    params: log="cause", mem="5G", cpus="8",
            jobname='cause', seed = 100
    shell: 'Rscript R/cause.R {input.file1} {input.file2}  \
                   {input.snps} {output.params} \
                   {output.cause} {output.data} {params.seed}'

This rule takes as input the overlap tag1 and tag2 data sets and the list of pruned SNPs. It uses the R script R/cause.R, which we will go through below. We could have used the *_summary_statistics.tsv.gz files in place of the *_overlap.tsv.gz files; using the overlap files saves reading some data into memory and can be especially helpful when one GWAS has many more variants measured than the other. The R script performs four steps:

  1. Read in the data and filter out duplicated SNPs.
library(tidyverse)  # read_tsv, read_lines, filter, %>%
library(cause)      # new_cause_data, est_cause_params, cause

args <- commandArgs(trailingOnly=TRUE)
#Input files
data_file_1 <- args[1]
data_file_2 <- args[2]
snp_file_asc <- args[3]
#Output files
params_out <- args[4]
cause_out <- args[5]
data_out <- args[6]
seed <- as.numeric(args[7])

#if(is.na(seed)) seed <- 100


d1 <- read_tsv(data_file_1, col_type=list(col_character(),col_integer(),
                                 col_character(),col_character(),
                                 col_character(),col_double(),col_double(),
                                 col_double(),col_double(),
                                 col_double()))

dup1 <- d1$snp[duplicated(d1$snp)]
d1 <- d1 %>% filter(!snp %in% dup1)

d2 <- read_tsv(data_file_2, col_type=list(col_character(),col_integer(),
                                 col_character(),col_character(),
                                 col_character(),col_double(),col_double(),
                                 col_double(),col_double(),
                                 col_double()))

dup2 <- d2$snp[duplicated(d2$snp)]
d2 <- d2 %>% filter(!snp %in% dup2)
  2. Merge the data. Normally we would do this using cause::gwas_format_cause. However, this function performs steps to ensure that alleles are oriented in the same way, which are unnecessary here because the data have already been formatted uniformly. If we skip these steps, the merging procedure is about eight times faster. We save the merged data to use with the other MR methods.
X <- d1 %>%
     select(snp, beta_hat, se) %>%
     rename(beta_hat_1 = beta_hat, seb1 = se) %>%
     inner_join(., d2, by="snp") %>%
     rename(beta_hat_2 = beta_hat, seb2 = se,
           A1 = ref_allele, A2 = alt_allele) %>%
     select(snp, beta_hat_1, seb1, beta_hat_2, seb2, A1, A2) %>%
     new_cause_data(.)

saveRDS(X, file=data_out)
  3. Estimate CAUSE nuisance parameters using a random set of up to 1,000,000 variants.
set.seed(seed)
if(nrow(X) < 1e6){
    snps_grid <- X$snp
}else{
    snps_grid <- sample(X$snp, size=1e6, replace=FALSE)
}

params <- est_cause_params(X, snps_grid)
saveRDS(params, params_out)
  4. Run CAUSE.
snps_asc <- read_lines(snp_file_asc)
res <- cause(X = X, param_ests = params, variants = snps_asc,
             qalpha = 1, qbeta = 10, force = TRUE)
saveRDS(res, cause_out)

The force = TRUE argument ensures that CAUSE runs even if the parameter estimates didn’t converge. In our analysis this never happens.
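
Once a run finishes, each output file holds a cause object that can be inspected with the methods from the package tutorial, for example:

library(cause)

# example for the LDL -> CAD pair; the file name follows the
# {tag1}__{tag2}_cause.RDS naming scheme used above
res <- readRDS("cause/glg_ldl__vanderHarst_cad_cause.RDS")
summary(res)  # ELPD comparison of the sharing and causal models
plot(res)     # posterior distributions under both models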

Run other MR methods

The remaining rules in the Snakemake file run alternative MR methods: IVW regression, Egger regression, and MR-PRESSO. These rules all have the same format, each calling its own R script.

rule ivw:
    input: data = data_dir + '{tag1}__{tag2}_data.RDS'
    output: out = mr_dir + '{tag1}__{tag2}_mr.RDS',
    params: log="mr", mem="1G", cpus="1",
            jobname='mr_{tag1}__{tag2}'
    shell: 'Rscript R/ivw.R {input.data} 5e-8 {output.out} '

rule egger:
    input: data = data_dir + '{tag1}__{tag2}_data.RDS'
    output: out = mr_dir + '{tag1}__{tag2}_mregger.RDS',
    params: log="mr", mem="1G", cpus="1",
            jobname='mr_{tag1}__{tag2}'
    shell: 'Rscript R/mregger.R {input.data} 5e-8 {output.out} '

rule mrpresso:
    input: data = data_dir + '{tag1}__{tag2}_data.RDS'
    output: out = mr_dir + '{tag1}__{tag2}_mrpresso.RDS',
    params: log="mrp", mem="1G", cpus="1",
            jobname='mrp_{tag1}__{tag2}'
    shell: 'Rscript R/mrpresso.R {input.data} 5e-8 {output.out} '
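
The scripts themselves are not reproduced here, but the IVW estimator is simple enough to sketch. The code below is not the authors' R/ivw.R, just a minimal version matching the command-line interface above; a full analysis would also restrict to an LD pruned set of instruments.

# minimal IVW sketch: Rscript ivw_sketch.R <data.RDS> <pval thresh> <out.RDS>
args <- commandArgs(trailingOnly = TRUE)
X <- readRDS(args[1])               # merged data saved by R/cause.R
pval_thresh <- as.numeric(args[2])  # e.g. 5e-8
out_file <- args[3]

# instruments: variants genome-wide significant for trait M
p1 <- 2 * pnorm(-abs(X$beta_hat_1 / X$seb1))
X <- X[p1 < pval_thresh, ]

# IVW estimate: regression of trait Y effects on trait M effects through
# the origin, weighted by the inverse variance of the trait Y effects
fit <- lm(beta_hat_2 ~ 0 + beta_hat_1, data = X, weights = 1 / seb2^2)
saveRDS(summary(fit)$coefficients, file = out_file)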

Plotting and looking at results

Coming Soon!


sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cause_0.3.0.0178 forcats_0.4.0    stringr_1.4.0    dplyr_0.8.3     
 [5] purrr_0.3.2      readr_1.3.1      tidyr_0.8.3      tibble_2.1.3    
 [9] ggplot2_3.2.0    tidyverse_1.2.1 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1           lubridate_1.7.4      lattice_0.20-38     
 [4] utf8_1.1.4           assertthat_0.2.1     zeallot_0.1.0       
 [7] rprojroot_1.3-2      digest_0.6.20        foreach_1.4.4       
[10] truncnorm_1.0-8      R6_2.4.0             cellranger_1.1.0    
[13] backports_1.1.4      evaluate_0.14        httr_1.4.0          
[16] highr_0.8            pillar_1.4.2         rlang_0.4.0         
[19] lazyeval_0.2.2       pscl_1.5.2           readxl_1.3.1        
[22] rstudioapi_0.10      whisker_0.3-2        Matrix_1.2-17       
[25] rmarkdown_1.13       loo_2.1.0            munsell_0.5.0       
[28] mixsqp_0.1-97        broom_0.5.2          numDeriv_2016.8-1.1 
[31] compiler_3.6.1       modelr_0.1.4         xfun_0.8            
[34] pkgconfig_2.0.2      SQUAREM_2017.10-1    htmltools_0.3.6     
[37] tidyselect_0.2.5     gridExtra_2.3        workflowr_1.4.0.9000
[40] matrixStats_0.54.0   intervals_0.15.1     codetools_0.2-16    
[43] fansi_0.4.0          crayon_1.3.4         withr_2.1.2         
[46] MASS_7.3-51.4        grid_3.6.1           nlme_3.1-140        
[49] jsonlite_1.6         gtable_0.3.0         git2r_0.26.1        
[52] magrittr_1.5         scales_1.0.0         RcppParallel_4.4.3  
[55] cli_1.1.0            stringi_1.4.3        fs_1.3.1            
[58] doParallel_1.0.14    xml2_1.2.0           vctrs_0.2.0         
[61] generics_0.0.2       iterators_1.0.10     tools_3.6.1         
[64] glue_1.3.1           hms_0.5.0            parallel_3.6.1      
[67] yaml_2.2.0           colorspace_1.4-1     ashr_2.2-32         
[70] rvest_0.3.4          knitr_1.23           haven_2.1.1