
Last updated: 2019-10-23


Knit directory: mr-ash-workflow/

This reproducible R Markdown analysis was created with workflowr (version 1.3.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    ETA_1_lambda.dat
    Ignored:    ETA_1_parBayesB.dat
    Ignored:    analysis/ETA_1_lambda.dat
    Ignored:    analysis/ETA_1_parBayesB.dat
    Ignored:    analysis/mu.dat
    Ignored:    analysis/varE.dat
    Ignored:    mu.dat
    Ignored:    varE.dat

Untracked files:
    Untracked:  .DS_Store
    Untracked:  results/design.RDS

Unstaged changes:
    Modified:   analysis/Result10_Correlation.Rmd
    Modified:   analysis/Result13_LocalCorrelation.Rmd
    Modified:   analysis/Result7_LowdimIndepGauss.Rmd
    Modified:   analysis/Result8_RealGenotype.Rmd
    Modified:   results/subogdancandes.RDS

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
Rmd 9038d27 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html 61571b0 Youngseok Kim 2019-10-23 Build site.
Rmd 57610d4 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html dd02fcc Youngseok Kim 2019-10-23 Build site.
Rmd 75d105c Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html 32d0091 Youngseok Kim 2019-10-23 Build site.
Rmd 2c0dc4f Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html c180256 Youngseok Kim 2019-10-23 Build site.
Rmd dccf722 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html b8d5cb6 Youngseok Kim 2019-10-23 Build site.
Rmd 990045d Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html 4484e99 Youngseok Kim 2019-10-23 Build site.
Rmd 67178ce Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html 46a6d53 Youngseok Kim 2019-10-23 Build site.
Rmd 4c49871 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html 1a67418 Youngseok Kim 2019-10-23 Build site.
Rmd 84b1fd6 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html 782d931 Youngseok Kim 2019-10-23 Build site.
Rmd 9ea889c Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html a9435be Youngseok Kim 2019-10-23 Build site.
Rmd 859bd6c Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html c861138 Youngseok Kim 2019-10-23 Build site.
Rmd e9e17ff Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html 1458fd6 Youngseok Kim 2019-10-23 Build site.
Rmd 5f6d6e3 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html 1537db4 Youngseok Kim 2019-10-23 Build site.
Rmd 118da06 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html 2ebf6f7 Youngseok Kim 2019-10-23 Build site.
Rmd d8f6a69 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html 785909d Youngseok Kim 2019-10-23 Build site.
Rmd 1665384 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html 93b9710 Youngseok Kim 2019-10-23 Build site.
Rmd 9569eb7 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html 07742e3 Youngseok Kim 2019-10-23 Build site.
Rmd ebac94a Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html 0aec516 Youngseok Kim 2019-10-23 Build site.
Rmd b2eb7c2 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html b610554 Youngseok Kim 2019-10-23 Build site.
Rmd 23b1ac2 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html d63eb8c Youngseok Kim 2019-10-23 Build site.
Rmd a78ca10 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html dbd8cca Youngseok Kim 2019-10-23 Build site.
html 1adb699 Youngseok Kim 2019-10-23 Build site.
html 3d22ee3 Youngseok Kim 2019-10-23 Build site.
Rmd 49d71a9 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html c584369 Youngseok Kim 2019-10-23 Build site.
Rmd c7f8752 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html 5eeac95 Youngseok Kim 2019-10-23 Build site.
Rmd a7cde52 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html b835c0f Youngseok Kim 2019-10-23 Build site.
Rmd 9fec1c9 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html 554bd5d Youngseok Kim 2019-10-23 Build site.
Rmd 9f32d59 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html 75a4f55 Youngseok Kim 2019-10-23 Build site.
Rmd 6fa74a6 Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html f669a52 Youngseok Kim 2019-10-23 Build site.
Rmd a36da9c Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html cc6df17 Youngseok Kim 2019-10-23 Build site.
Rmd a0a663d Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html bf5aa73 Youngseok Kim 2019-10-23 Build site.
Rmd 56cc6bd Youngseok Kim 2019-10-23 wflow_publish(“analysis/index.Rmd”)
html a0bc78b Youngseok 2019-10-22 Build site.
Rmd a4de5b2 Youngseok 2019-10-22 wflow_publish(“analysis/index.Rmd”)
html 668d700 Youngseok 2019-10-22 Build site.
Rmd 4f8651a Youngseok 2019-10-22 update changepoint analysis
html 7cf4984 Youngseok 2019-10-21 Build site.
Rmd 6025590 Youngseok 2019-10-21 wflow_publish(“analysis/index.Rmd”)
html 7d6c1c8 Youngseok 2019-10-21 Build site.
Rmd 585044e Youngseok Kim 2019-10-21 update
Rmd 3bab6f7 Youngseok Kim 2019-10-20 update experiments
html 3bab6f7 Youngseok Kim 2019-10-20 update experiments
html 92022af Youngseok Kim 2019-10-17 Build site.
Rmd 897fd56 Youngseok Kim 2019-10-17 update index.html
html 4d0a59b Youngseok Kim 2019-10-17 Build site.
Rmd 6173f3e Youngseok Kim 2019-10-17 update index.html
html 2ca8e0c Youngseok Kim 2019-10-17 Build site.
Rmd 5d8193f Youngseok Kim 2019-10-17 update index.html
html 79e1aab Youngseok 2019-10-17 Build site.
html b091d16 Youngseok 2019-10-15 Build site.
Rmd 5a648ed Youngseok 2019-10-15 wflow_publish(“analysis/index.Rmd”)
html fd5131c Youngseok 2019-10-15 Build site.
Rmd c0d6ee9 Youngseok 2019-10-15 wflow_publish("analysis/*.Rmd")
Rmd 1e3484c Youngseok 2019-10-14 update for experiment with different p
html d92cdd7 Youngseok 2019-10-14 Build site.
Rmd 8dc7015 Youngseok 2019-10-14 wflow_publish(“analysis/index.Rmd”)
html a7a31a3 Youngseok 2019-10-14 Build site.
Rmd bee359c Youngseok 2019-10-14 wflow_publish(“analysis/index.Rmd”)
html 8bdd1d7 Youngseok 2019-10-14 Build site.
Rmd cb56b49 Youngseok 2019-10-14 wflow_publish(“analysis/index.Rmd”)
html 70efed9 Youngseok 2019-10-14 Build site.
Rmd 6167c5a Youngseok 2019-10-14 wflow_publish("analysis/*.Rmd")
html ac5fb62 Youngseok 2019-10-13 Build site.
Rmd 10c93e1 Youngseok 2019-10-13 wflow_publish(“analysis/Index.Rmd”)
html bd36a79 Youngseok 2019-10-13 Build site.
Rmd 2368046 Youngseok 2019-10-13 wflow_publish("analysis/*.Rmd")
Rmd 1207fdd Youngseok 2019-10-07 Start workflowr project.

Plots for the paper

Plots


Results for external comparison

This vignette plots results for the experiments comparing MR.ASH with the methods listed below (a brief fitting sketch follows the list).

  1. glmnet R package: Ridge, Lasso, E-NET
  2. ncvreg R package: SCAD, MCP
  3. L0Learn R package: L0Learn
  4. BGLR R package: BayesB, Blasso (Bayesian Lasso)
  5. susieR R package: SuSiE (Sum of Single Effects)
  6. varbvs R package: VarBVS (Variational Bayes Variable Selection)
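
Below is a minimal, hypothetical R sketch of how these methods might be fit, assuming a numeric design matrix X and response vector y; tuning arguments (the E-NET mixing parameter, MCMC chain lengths) are illustrative, not the exact settings used in these experiments.

```r
library(glmnet)   # Ridge, Lasso, E-NET
library(ncvreg)   # SCAD, MCP
library(L0Learn)  # best subset (L0)
library(BGLR)     # BayesB, Blasso
library(susieR)   # SuSiE
library(varbvs)   # VarBVS

fit.ridge  <- cv.glmnet(X, y, alpha = 0)
fit.lasso  <- cv.glmnet(X, y, alpha = 1)
fit.enet   <- cv.glmnet(X, y, alpha = 0.5)        # illustrative mixing parameter
fit.scad   <- cv.ncvreg(X, y, penalty = "SCAD")
fit.mcp    <- cv.ncvreg(X, y, penalty = "MCP")
fit.l0     <- L0Learn.cvfit(X, y, penalty = "L0")
fit.bayesb <- BGLR(y = y, ETA = list(list(X = X, model = "BayesB")),
                   nIter = 5000, burnIn = 1000, verbose = FALSE)  # model = "BL" gives Blasso
fit.susie  <- susie(X, y)
fit.varbvs <- varbvs(X, Z = NULL, y = y, verbose = FALSE)
```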

Performance measure

The prediction error we define here is \textrm{Pred.Err}(\hat\beta; y_{\rm test}, X_{\rm test}) = \textrm{RMSE}(\hat\beta; y_{\rm test}, X_{\rm test}) / \sigma, where y_{\rm test} and X_{\rm test} are test data sampled in the same way as the training data, and \textrm{RMSE}(\hat\beta; y, X) = \sqrt{\frac{1}{n}\|y - X\hat\beta\|_2^2} is the root-mean-squared error. If \hat\beta is fairly accurate, then we expect \textrm{RMSE} to be close to \sigma. Therefore, on average, \textrm{Pred.Err} \geq 1, and the smaller the better.
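
For concreteness, here is a minimal R sketch of this measure, assuming held-out test data X.test and y.test and the true residual standard deviation sigma (all names are illustrative).

```r
# Prediction error: RMSE on test data, normalized by the true sigma.
pred.err <- function(beta.hat, X.test, y.test, sigma) {
  rmse <- sqrt(mean((y.test - X.test %*% beta.hat)^2))
  rmse / sigma  # close to 1 when beta.hat is accurate; >= 1 on average
}
```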

Default setting

Throughout these studies, the default setting is as follows (see the simulation sketch after the list):

  1. The number of samples n = 500 and the number of coefficients p = 2000.
  2. The number of nonzero coefficients (sparsity) s = 20.
  3. Design: IndepGauss, i.e. X_{ij} \sim N(0,1)
  4. Signal shape: PointNormal, i.e. \beta_j \sim N(0,1) for j \in S and \beta_j = 0 for j \notin S, where S is the index set of nonzero coefficients, sampled uniformly at random from \{1,\cdots,p\}. We define s = |S| to be the sparsity level; the smaller s is, the sparser the signal \beta.

  5. PVE (Proportion of Variance Explained): 0.5. Given X and \beta, we sample y = X\beta + \epsilon, where \epsilon \sim N(0,\sigma^2 I_n). PVE is defined by {\rm PVE} = \frac{\textrm{Var}(X\beta)}{\textrm{Var}(X\beta) + \sigma^2}, where \textrm{Var}(a) denotes the sample variance of a, calculated using the R function var. To achieve PVE = 0.5, we set \sigma^2 = \textrm{Var}(X\beta).
  6. Update order: increasing, i.e. the coordinates are updated in the order (1,\cdots,p) (1:p in R notation).
  7. Initialization: null, i.e. the coefficient estimate is initialized to the zero vector.
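
For reference, here is a minimal R sketch of this default data-generating setting; the variable names are illustrative.

```r
set.seed(1)
n <- 500; p <- 2000; s <- 20; pve <- 0.5

X <- matrix(rnorm(n * p), n, p)  # IndepGauss design: X_ij ~ N(0,1)
S <- sample(p, s)                # nonzero index set, sampled uniformly at random
beta <- rep(0, p)
beta[S] <- rnorm(s)              # PointNormal signal: beta_j ~ N(0,1) on S

# Choose sigma^2 so that PVE = Var(X beta) / (Var(X beta) + sigma^2) = 0.5;
# at the default PVE = 0.5 this reduces to sigma^2 = Var(X beta).
sigma2 <- var(drop(X %*% beta)) * (1 - pve) / pve
y <- drop(X %*% beta) + rnorm(n, sd = sqrt(sigma2))
```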

MR.ASH and the comparison methods

Some special cases

  1. Ridge case.

The ridge case applies the default setting except that s = p (all coefficients are nonzero). We observe that the relative performance depends on the value of p (compared to n), so we consider p = 50, 100, 200, 500, 1000, 2000 with n = 500 fixed.

  • Result:

  • Discussion: The message here is that MR.ASH performs reasonably well, and it is the only penalized linear regression method that outperforms cross-validated Ridge. However, we note that Blasso (Bayesian Lasso), which uses a non-sparse Laplace prior on \beta, performs best, and that BayesB, which uses a spike-and-slab prior on \beta, also performs better than MR.ASH.
  2. Nonconvex case.

This experiment compares the flexibility of convex/non-convex shrinkage operators (E-NET, SCAD, MCP, L0Learn) with the MR.ASH shrinkage operator. The default setting applies except for the signal shape and PVE, which is set to 0.99.

We set PVE = 0.99 because, at this level, the cross-validated solution of each method (E-NET, SCAD or MCP) achieves nearly the minimum prediction error on its solution path; i.e., the cross-validated solution is nearly the best that the method can achieve. We have checked this by experiment, so we conclude that cross-validation is fairly accurate here.

  • Result:

  • Discussion: MR.ASH is clearly the overall winner, outperforming all the other methods. The figure above illustrates the flexibility of the MR.ASH Gaussian mixture prior.

Important parameters for high-dimensional regression settings

  1. High-dimensionality p.

This experiment compares the performance of the methods as p varies, with everything else held fixed.

  • Result:

  • Discussion: MR.ASH and VarBVS perform the best. Theory (e.g., Slope meets Lasso: improved oracle bounds and optimality) says that the prediction error scales as \sqrt{s\log(p/s)/n}, so we expect it to increase log-linearly in p; one can check this in the figure above.

Also, the computation time of MR.ASH scales linearly in \log p. In contrast, BayesB and Blasso do not scale well for large p.
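
As a quick illustration, the theoretical rate can be tabulated directly; the p grid below is illustrative (borrowed from the ridge case above), with the default s.

```r
# Theoretical prediction-error rate sqrt(s * log(p/s) / n).
n <- 500; s <- 20
p <- c(50, 100, 200, 500, 1000, 2000)
round(sqrt(s * log(p / s) / n), 3)
# The rate grows roughly linearly in log(p), matching the log-linear trend above.
```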

  2. Sparsity s.
  • Result:

  • Discussion: MR.ASH does well in sparse settings, and VARBVS shows very similar performance to MR.ASH. BayesB performs reasonably well in all cases, hence is well balanced. Blasso does very well in non-sparse settings.
| Method   | MR.ASH | VARBVS | BayesB   | Blasso     | SuSiE  | E-NET  | Lasso | Ridge | SCAD | MCP  | L0Learn |
|----------|--------|--------|----------|------------|--------|--------|-------|-------|------|------|---------|
| s = 1    | 1.01   | 1.01   | 1.01     | 1.06       | 1.01   | 1.03   | 1.04  | 1.40  | 1.01 | 1.01 | 1.02    |
| s = 5    | 1.02   | 1.02   | 1.02     | 1.11       | 1.02   | 1.08   | 1.08  | 1.43  | 1.03 | 1.02 | 1.03    |
| s = 20   | 1.06   | 1.06   | 1.07     | 1.20       | 1.09   | 1.13   | 1.14  | 1.41  | 1.08 | 1.07 | 1.07    |
| s = 100  | 1.28   | 1.28   | 1.28     | 1.33       | 1.35   | 1.30   | 1.31  | 1.39  | 1.27 | 1.27 | 1.34    |
| s = 500  | 1.38   | 1.39   | 1.37     | 1.37       | 1.40   | 1.40   | 1.40  | 1.40  | 1.40 | 1.40 | 1.43    |
| s = 2000 | 1.42   | 1.43   | 1.40     | 1.39       | 1.43   | 1.43   | 1.44  | 1.43  | 1.44 | 1.44 | 1.45    |
| Note     | sparse | sparse | balanced | non-sparse | sparse | sparse |       |       |      |      |         |
  3. PVE.
  • Result:

  • Discussion: MR.ASH outperforms the other methods for all PVEs. However, this experimental setting is favorable to MR.ASH, so, to be safe, we conclude only that the relative performance does not depend very much on PVE, except that Ridge, SuSiE and Blasso suffer at high PVE.

Performance on different design settings

  1. LowdimIndepGauss

  2. RealGenotype

  3. Changepoint

  4. EquicorrGauss

| Method                            | MR.ASH | MR.ASH.order | MR.ASH.init | VARBVS | BayesB | Blasso | SuSiE | E-NET | Lasso | Ridge | SCAD  | MCP   | L0Learn |
|-----------------------------------|--------|--------------|-------------|--------|--------|--------|-------|-------|-------|-------|-------|-------|---------|
| LowdimIndepGauss + SuBogdanCandes | 1.012  | 1.012        | 1.012       | 1.012  | 2.315  | 46.19  | 1.012 | 94.60 | 96.82 | 2680  | 1.012 | 1.012 | 1.230   |
| RealGenotype + PointNormal        | 1.160  | 1.145        | 1.154       | 1.177  | 1.258  | 1.189  | 1.157 | 1.180 | 1.199 | 1.254 | 1.154 | 1.162 | 1.194   |
| EquiCorrGauss + PointNormal       | 1.119  | 1.126        | 1.080       | 1.142  | 1.219  | 1.144  | 1.154 | 1.097 | 1.103 | 1.189 | 1.078 | 1.080 | 1.101   |
| LocalCorrGauss + PointNormal      | 1.092  | 1.083        | 1.076       | 1.096  | 1.144  | 1.193  | 1.098 | 1.119 | 1.128 | 1.270 | 1.078 | 1.076 | 1.082   |
| Changepoint + BigSuddenChange     | 2.885  | 1.450        | 1.103       | 5.086  | 1.890  | 3.636  | 9.518 | 3.991 | 5.320 | 9.810 | 1.765 | 1.956 | 2.346   |
| Changepoint + OneSmallChange      | 2.195  | 1.168        | 1.272       | 1.742  | 8.864  | 4.442  | 1.203 | 3.776 | 4.255 | 6.622 | 1.478 | 2.246 | 3.004   |
| Computation time                  | 2.218  | 3.166        | 5.944       | 3.445  | 3.583  | 4.001  | 1.808 | 51.50 | 4.670 | 5.567 | 7.538 | 6.884 | 10.95   |

Results for internal comparison

MR.ASH update order

A list of the update orders we consider is as follows (a sketch of how two of them can be computed appears after the list).

  1. random: In each outer-loop iteration, we sample a permutation map \xi:\{1,\cdots,p\} \to \{1,\cdots,p\} uniformly at random and run the inner-loop iterations in the order given by \xi.
  2. increasing: (1,\cdots,p), i.e. in each outer loop iteration, we update q_1,q_2,\cdots,q_p in this order.
  3. lasso.pathorder: we update q_j prior to q_{j'} when \beta_j appears earlier than \beta_{j'} in the lasso path. If ties occur, then increasing order applies to those ties.
  4. scad.pathorder: we update q_j prior to q_{j'} when \beta_j appears earlier than \beta_{j'} in the scad path. If ties occur, then increasing order applies to those ties.
  5. lasso.absorder: we update q_j prior to q_{j'} when the lasso solution satisfies |\hat\beta_j| \geq |\hat\beta_{j'}|. If ties occur, then increasing order applies to those ties.
  6. univar.absorder: we update q_j prior to q_{j'} when the univariate linear regression solution satisfies |\hat\beta_j| \geq |\hat\beta_{j'}|. If ties occur, then increasing order applies to those ties.
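
As a hypothetical sketch, two of these orders can be computed as follows, assuming a centered design matrix X and response y; the glmnet-based fit and all names are illustrative.

```r
library(glmnet)

# lasso.absorder: sort by |cross-validated lasso estimate|, decreasing;
# ties (e.g. exact zeros) fall back to increasing index order.
cvfit <- cv.glmnet(X, y)
beta.lasso <- as.numeric(as.matrix(coef(cvfit, s = "lambda.min")))[-1]  # drop intercept
lasso.absorder <- order(-abs(beta.lasso), seq_along(beta.lasso))

# univar.absorder: sort by |univariate least-squares coefficient|.
beta.univar <- drop(crossprod(X, y)) / colSums(X^2)
univar.absorder <- order(-abs(beta.univar), seq_along(beta.univar))
```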

MR.ASH initialization

A list of the initializations we consider is as follows.

Parametrization

An analysis of the update rule for \sigma^2 under different parametrizations of the model.


Our story is:

  1. The MR.ASH shrinkage operator (derived from a normal mixture prior) is more flexible than other popular shrinkage operators, each of which is a fixed parametric family in \lambda (and \gamma, where present); see the sketch after this list.

  2. VEB (variational empirical Bayes) can do better than cross-validation.

Brief Summary of Comparison Methods

L0Learn

The L0Learn R package provides fast coordinate descent algorithms for best subset regression.

Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms

SCAD, MCP

The ncvreg R package provides a fast coordinate descent algorithm for non-convex penalized linear regression with the well-known SCAD and MCP penalty functions.

Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection