Last updated: 2019-06-25

Checks: 7 0

Knit directory: misc/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.4.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(12345)

The command set.seed(12345) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 9bc3245

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.RData
    Ignored:    analysis/.Rhistory
    Ignored:    analysis/ALStruct_cache/
    Ignored:    data/.Rhistory
    Ignored:    data/pbmc/
    Ignored:    docs/figure/.DS_Store

Untracked files:
    Untracked:  .dropbox
    Untracked:  Icon
    Untracked:  _workflowr.yml
    Untracked:  analysis/GTEX-cogaps.Rmd
    Untracked:  analysis/SPCAvRP.rmd
    Untracked:  analysis/compare-transformed-models.Rmd
    Untracked:  analysis/cormotif.Rmd
    Untracked:  analysis/eQTL.perm.rand.pdf
    Untracked:  analysis/flash_test_tree.Rmd
    Untracked:  analysis/ieQTL.perm.rand.pdf
    Untracked:  analysis/m6amash.Rmd
    Untracked:  analysis/mash_bhat_z.Rmd
    Untracked:  analysis/mash_ieqtl_permutations.Rmd
    Untracked:  analysis/mixsqp.Rmd
    Untracked:  analysis/normalize.Rmd
    Untracked:  analysis/pbmc.Rmd
    Untracked:  analysis/poisson_transform.Rmd
    Untracked:  analysis/pseudodata.Rmd
    Untracked:  analysis/sc_bimodal.Rmd
    Untracked:  analysis/susie_en.Rmd
    Untracked:  analysis/susie_z_investigate.Rmd
    Untracked:  analysis/svd-timing.Rmd
    Untracked:  analysis/test-figure/
    Untracked:  analysis/test.Rpres
    Untracked:  analysis/test.md
    Untracked:  analysis/test_sparse.Rmd
    Untracked:  analysis/z.txt
    Untracked:  code/multivariate_testfuncs.R
    Untracked:  data/4matthew/
    Untracked:  data/4matthew2/
    Untracked:  data/E-MTAB-2805.processed.1/
    Untracked:  data/ENSG00000156738.Sim_Y2.RDS
    Untracked:  data/GDS5363_full.soft.gz
    Untracked:  data/GSE41265_allGenesTPM.txt
    Untracked:  data/Muscle_Skeletal.ACTN3.pm1Mb.RDS
    Untracked:  data/Thyroid.FMO2.pm1Mb.RDS
    Untracked:  data/bmass.HaemgenRBC2016.MAF01.Vs2.MergedDataSources.200kRanSubset.ChrBPMAFMarkerZScores.vs1.txt.gz
    Untracked:  data/bmass.HaemgenRBC2016.Vs2.NewSNPs.ZScores.hclust.vs1.txt
    Untracked:  data/bmass.HaemgenRBC2016.Vs2.PreviousSNPs.ZScores.hclust.vs1.txt
    Untracked:  data/finemap_data/fmo2.sim/b.txt
    Untracked:  data/finemap_data/fmo2.sim/dap_out.txt
    Untracked:  data/finemap_data/fmo2.sim/dap_out2.txt
    Untracked:  data/finemap_data/fmo2.sim/dap_out2_snp.txt
    Untracked:  data/finemap_data/fmo2.sim/dap_out_snp.txt
    Untracked:  data/finemap_data/fmo2.sim/data
    Untracked:  data/finemap_data/fmo2.sim/fmo2.sim.config
    Untracked:  data/finemap_data/fmo2.sim/fmo2.sim.k
    Untracked:  data/finemap_data/fmo2.sim/fmo2.sim.k4.config
    Untracked:  data/finemap_data/fmo2.sim/fmo2.sim.k4.snp
    Untracked:  data/finemap_data/fmo2.sim/fmo2.sim.ld
    Untracked:  data/finemap_data/fmo2.sim/fmo2.sim.snp
    Untracked:  data/finemap_data/fmo2.sim/fmo2.sim.z
    Untracked:  data/finemap_data/fmo2.sim/pos.txt
    Untracked:  data/logm.csv
    Untracked:  data/m.cd.RDS
    Untracked:  data/m.cdu.old.RDS
    Untracked:  data/m.new.cd.RDS
    Untracked:  data/m.old.cd.RDS
    Untracked:  data/mainbib.bib.old
    Untracked:  data/mat.csv
    Untracked:  data/mat.txt
    Untracked:  data/mat_new.csv
    Untracked:  data/matrix_lik.rds
    Untracked:  data/paintor_data/
    Untracked:  data/temp.txt
    Untracked:  data/y.txt
    Untracked:  data/y_f.txt
    Untracked:  data/zscore_jointLCLs_m6AQTLs_susie_eQTLpruned.rds
    Untracked:  data/zscore_jointLCLs_random.rds
    Untracked:  docs/figure/eigen.Rmd/
    Untracked:  docs/figure/fmo2.sim.Rmd/
    Untracked:  docs/figure/newVB.elbo.Rmd/
    Untracked:  docs/figure/poisson_transform.Rmd/
    Untracked:  docs/figure/rbc_zscore_mash2.Rmd/
    Untracked:  docs/figure/rbc_zscore_mash2_analysis.Rmd/
    Untracked:  docs/figure/rbc_zscores.Rmd/
    Untracked:  docs/figure/susie_en.Rmd/
    Untracked:  docs/trend_files/
    Untracked:  docs/z.txt
    Untracked:  explore_udi.R
    Untracked:  output/fit.varbvs.RDS
    Untracked:  output/glmnet.fit.RDS
    Untracked:  output/test.bv.txt
    Untracked:  output/test.gamma.txt
    Untracked:  output/test.hyp.txt
    Untracked:  output/test.log.txt
    Untracked:  output/test.param.txt
    Untracked:  output/test2.bv.txt
    Untracked:  output/test2.gamma.txt
    Untracked:  output/test2.hyp.txt
    Untracked:  output/test2.log.txt
    Untracked:  output/test2.param.txt
    Untracked:  output/test3.bv.txt
    Untracked:  output/test3.gamma.txt
    Untracked:  output/test3.hyp.txt
    Untracked:  output/test3.log.txt
    Untracked:  output/test3.param.txt
    Untracked:  output/test4.bv.txt
    Untracked:  output/test4.gamma.txt
    Untracked:  output/test4.hyp.txt
    Untracked:  output/test4.log.txt
    Untracked:  output/test4.param.txt
    Untracked:  output/test5.bv.txt
    Untracked:  output/test5.gamma.txt
    Untracked:  output/test5.hyp.txt
    Untracked:  output/test5.log.txt
    Untracked:  output/test5.param.txt

Unstaged changes:
    Modified:   analysis/_site.yml
    Deleted:    analysis/chunks.R
    Modified:   analysis/eigen.Rmd
    Modified:   analysis/fmo2.sim.Rmd
    Modified:   analysis/newVB.Rmd
    Modified:   analysis/selective_inference.Rmd
    Modified:   analysis/simple_transform_simulation.Rmd
    Modified:   analysis/wSVD.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File	Version	Author	Date	Message
Rmd	9bc3245	Matthew Stephens	2019-06-25	workflowr::wflow_publish(“analysis/selective_inference_toy.Rmd”)

Introduction

The aim here is to run selective inference on the toy example from Wang et al (2018) [https://www.biorxiv.org/content/10.1101/501114v1]

First load libraries:

library(selectiveInference)

Loading required package: glmnet

Loading required package: Matrix

Loading required package: foreach

Loaded glmnet 2.0-18

Loading required package: intervals


Attaching package: 'intervals'

The following object is masked from 'package:Matrix':

    expand

Loading required package: survival

Now simulate some data with \(x_1=x_2\) and \(x_3=x_4\) and effects at variables 1 and 3. (We do \(p=100\) rather than \(p=1000\) to reduce run-time)

set.seed(15)
n= 500
p = 100
x = matrix(rnorm(n*p),nrow=n,ncol=p)
x[,2] = x[,1] 
x[,4] = x[,3]
b = rep(0,p)
b[1]= 1
b[4] = 1
y = x %*% b + rnorm(n)

Unfortunately the method won’t allow duplicate columns:

# run forward stepwise,
  try(fsfit <- fs(x,y))

Error in checkargs.xy(x = x, y = y) : x cannot have duplicate columns

  try(larfit <- lar(x,y))

Error in checkargs.xy(x = x, y = y) : x cannot have duplicate columns

So we modify x to make it not quite identical

  x[,2] = x[,1] + rnorm(n,0,0.1) 
  x[,4] = x[,3] + rnorm(n,0,0.1)
  cor(x[,2],x[,1])

[1] 0.9955977

  cor(x[,4],x[,3])

[1] 0.9952243

Now run the forward selection, and compute sequential p-values and confidence intervals

  fsfit <- fs(x,y)
  
# compute sequential p-values and confidence intervals
# (sigma estimated from full model)
  out = fsInf(fsfit)
  out


Call:
fsInf(obj = fsfit)

Standard deviation of noise (specified or estimated) sigma = 0.995

Sequential testing results with alpha = 0.100
 Step Var   Coef Z-score P-value LowConfPt UpConfPt LowTailArea UpTailArea
    1   1  0.900  20.573   0.003     0.441    0.953       0.050      0.049
    2   3  0.956  21.494   0.000     0.716    1.397       0.050      0.050
    3  37  0.099   2.133   0.539    -0.439    0.156       0.050      0.050
    4  45  0.083   1.817   0.911      -Inf    0.204       0.000      0.050
    5   4 -0.838  -1.850   0.215      -Inf   10.125       0.000      0.050
    6  46  0.077   1.747   0.715    -3.695    1.192       0.050      0.050
    7  64 -0.077  -1.718   0.409    -3.906    2.666       0.050      0.050
    8  79  0.076   1.678   0.542    -3.007    2.405       0.050      0.050
    9  43  0.080   1.721   0.366    -1.354    2.373       0.050      0.050
   10  68 -0.075  -1.669   0.456      -Inf      Inf       0.000      0.000
   11  13  0.081   1.738   0.434      -Inf      Inf       0.000      0.000
   12  61 -0.075  -1.588   0.102      -Inf    0.580       0.000      0.050
   13  56 -0.064  -1.452   0.487    -3.159    3.055       0.050      0.050
   14  84  0.058   1.332   0.901      -Inf    0.601       0.000      0.050
   15  90  0.060   1.324   0.548    -3.950    3.120       0.050      0.050
   16  65 -0.057  -1.284   0.298      -Inf      Inf       0.000      0.000
   17  69 -0.056  -1.243   0.071      -Inf    0.613       0.000      0.050
   18  74 -0.053  -1.177   0.998     4.469      Inf       0.010      0.000
   19  18  0.052   1.194   0.084    -3.562      Inf       0.050      0.000
   20  70 -0.047  -1.097   0.229      -Inf      Inf       0.000      0.000
   21  49 -0.049  -1.075   0.092      -Inf    1.785       0.000      0.050
   22  88  0.055   1.118   0.918      -Inf    1.226       0.000      0.050
   23  25  0.049   1.055   0.060    -0.614      Inf       0.050      0.000
   24  27 -0.047  -1.055   0.504      -Inf      Inf       0.000      0.000
   25  96  0.051   1.120   0.861      -Inf      Inf       0.000      0.000
   26  39  0.048   1.029   0.908      -Inf    3.127       0.000      0.050
   27  40 -0.050  -1.041   0.005      -Inf   -4.787       0.000      0.011
   28  17 -0.045  -0.968   0.190      -Inf    3.100       0.000      0.050
   29  66 -0.041  -0.901   0.879    -2.066      Inf       0.050      0.000
   30  85  0.042   0.916   0.874      -Inf      Inf       0.000      0.000
   31   8 -0.044  -0.932   0.391      -Inf      Inf       0.000      0.000
   32  22 -0.044  -0.933   0.144      -Inf    3.515       0.000      0.050
   33  78  0.041   0.933   0.821      -Inf      Inf       0.000      0.000
   34  28 -0.041  -0.882   0.789    -4.262      Inf       0.050      0.000
   35   7  0.043   0.896   0.894      -Inf    2.807       0.000      0.050
   36  26  0.044   0.905   0.620      -Inf      Inf       0.000      0.000
   37  21 -0.041  -0.869   0.590      -Inf      Inf       0.000      0.000
   38  71  0.043   0.940   0.194      -Inf      Inf       0.000      0.000
   39  73  0.040   0.858   0.247      -Inf      Inf       0.000      0.000
   40  50 -0.041  -0.867   0.963     4.773      Inf       0.048      0.000
   41  99  0.041   0.876   0.022     4.638      Inf       0.033      0.000
   42  59  0.036   0.790   0.079    -1.322      Inf       0.050      0.000
   43  76  0.034   0.738   0.060    -0.458      Inf       0.050      0.000
   44  62 -0.037  -0.781   0.760    -3.209      Inf       0.050      0.000
   45  19  0.031   0.717   0.638      -Inf      Inf       0.000      0.000
   46  36 -0.033  -0.711   0.705      -Inf      Inf       0.000      0.000
   47   6  0.030   0.684   0.828      -Inf      Inf       0.000      0.000
   48  34  0.034   0.725   0.041     1.281      Inf       0.050      0.000
   49  57 -0.035  -0.740   0.915    -2.026      Inf       0.050      0.000
   50  42 -0.034  -0.754   0.865    -4.424      Inf       0.050      0.000
   51  14 -0.032  -0.692   0.212      -Inf    3.813       0.000      0.050
   52  53  0.030   0.665   0.301      -Inf      Inf       0.000      0.000
   53  83 -0.028  -0.627   0.704      -Inf      Inf       0.000      0.000
   54  35  0.029   0.620   0.680      -Inf      Inf       0.000      0.000
   55  12  0.027   0.594   0.850      -Inf    2.456       0.000      0.050
   56   2 -0.285  -0.587   0.877      -Inf      Inf       0.000      0.000
   57  38  0.031   0.617   0.632      -Inf      Inf       0.000      0.000
   58  41 -0.026  -0.562   0.987     0.322      Inf       0.000      0.000
   59  51  0.027   0.583   0.023    -0.330      Inf       0.000      0.000
   60  91  0.030   0.644   0.943      -Inf    0.654       0.000      0.050
   61  58  0.027   0.574   0.773      -Inf      Inf       0.000      0.000
   62  97  0.029   0.558   0.002     5.116      Inf       0.004      0.000
   63  87  0.025   0.523   0.552      -Inf      Inf       0.000      0.000
   64   5  0.025   0.509   0.108    -2.311      Inf       0.050      0.000
   65  63  0.022   0.485   0.489      -Inf      Inf       0.000      0.000
   66  47  0.022   0.455   0.927      -Inf    2.634       0.000      0.050
   67   9  0.021   0.459   0.642      -Inf      Inf       0.000      0.000
   68  31  0.021   0.448   0.064    -1.444      Inf       0.050      0.000
   69  89 -0.021  -0.426   0.721      -Inf      Inf       0.000      0.000
   70  32 -0.022  -0.419   0.428      -Inf      Inf       0.000      0.000
   71  86 -0.021  -0.422   0.188      -Inf      Inf       0.000      0.000
   72  48 -0.020  -0.402   0.492      -Inf      Inf       0.000      0.000
   73  20 -0.017  -0.360   0.163      -Inf      Inf       0.000      0.000
   74  67 -0.018  -0.373   0.202      -Inf      Inf       0.000      0.000
   75  44  0.017   0.348   0.281      -Inf      Inf       0.000      0.000
   76  52  0.016   0.329   0.407    -3.741      Inf       0.050      0.000
   77  11  0.015   0.309   0.697      -Inf      Inf       0.000      0.000
   78  23  0.014   0.308   0.357      -Inf      Inf       0.000      0.000
   79  94  0.014   0.313   0.402      -Inf      Inf       0.000      0.000
   80  30  0.014   0.287   0.339      -Inf      Inf       0.000      0.000
   81  95  0.013   0.264   0.269      -Inf      Inf       0.000      0.000
   82  33  0.012   0.251   0.553      -Inf      Inf       0.000      0.000
   83  15 -0.011  -0.226   0.534      -Inf      Inf       0.000      0.000
   84  98  0.011   0.235   0.305      -Inf      Inf       0.000      0.000
   85  72 -0.010  -0.205   0.119      -Inf    4.010       0.000      0.050
   86 100  0.008   0.161   0.797      -Inf      Inf       0.000      0.000
   87  81 -0.008  -0.161   0.107      -Inf    2.010       0.000      0.050
   88  24  0.007   0.135   0.771      -Inf      Inf       0.000      0.000
   89  10  0.006   0.123   0.440      -Inf      Inf       0.000      0.000
   90  60  0.006   0.120   0.592      -Inf      Inf       0.000      0.000
   91  75 -0.005  -0.113   0.153      -Inf    4.187       0.000      0.050
   92  16 -0.004  -0.086   0.629      -Inf      Inf       0.000      0.000
   93  54  0.004   0.078   0.996      -Inf   -5.340       0.000      0.004
   94  29  0.004   0.077   0.842      -Inf      Inf       0.000      0.000
   95  82  0.004   0.078   0.004     4.839      Inf       0.005      0.000
   96  80  0.000  -0.008   0.880      -Inf      Inf       0.000      0.000
   97  93  0.000   0.008   0.642      -Inf      Inf       0.000      0.000
   98  77  0.000  -0.007   0.185      -Inf      Inf       0.000      0.000
   99  55  0.000  -0.004   0.716      -Inf      Inf       0.000      0.000
  100  92  0.000  -0.003   0.443      -Inf      Inf       0.000      0.000

Estimated stopping point from ForwardStop rule = 2

Summary

In this example, selective inference selected variables 1 and 3, each with very small \(p\) values. Of course we know that variable 3 is a false selection, so it might seem bad that the p value is small. However you have to remember that the p value does not measure significance of the variable selection: it measures the significance of the coefficient of the selected variable, conditional on the selection event.

Put another way, selective inference is not really trying to assess uncertainty in which variables should be selected, and is certainly not trying to produce inferences of the form \[(b_1 \neq 0 \text{ OR } b_2 \neq 0) \text{ AND } (b_3 \neq 0 \text{ OR } b_4 \neq 0)\] which was the goal in Wang et al.

sessionInfo()

R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] selectiveInference_1.2.4 survival_2.44-1.1       
[3] intervals_0.15.1         glmnet_2.0-18           
[5] foreach_1.4.4            Matrix_1.2-17           

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1       knitr_1.23       whisker_0.3-2    magrittr_1.5    
 [5] workflowr_1.4.0  splines_3.6.0    lattice_0.20-38  stringr_1.4.0   
 [9] tools_3.6.0      grid_3.6.0       xfun_0.7         git2r_0.25.2    
[13] htmltools_0.3.6  iterators_1.0.10 yaml_2.2.0       rprojroot_1.3-2 
[17] digest_0.6.19    fs_1.3.1         codetools_0.2-16 glue_1.3.1      
[21] evaluate_0.14    rmarkdown_1.13   stringi_1.4.3    compiler_3.6.0  
[25] backports_1.1.4

selective_inference_toy

Matthew Stephens

2019-06-25

Introduction

Summary