
Last updated: 2019-10-23


Knit directory: mr-ash-workflow/

This reproducible R Markdown analysis was created with workflowr (version 1.3.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    ETA_1_lambda.dat
    Ignored:    ETA_1_parBayesB.dat
    Ignored:    analysis/ETA_1_lambda.dat
    Ignored:    analysis/ETA_1_parBayesB.dat
    Ignored:    analysis/mu.dat
    Ignored:    analysis/varE.dat
    Ignored:    mu.dat
    Ignored:    varE.dat

Untracked files:
    Untracked:  .DS_Store
    Untracked:  backup
    Untracked:  results/equicorr.RDS
    Untracked:  results/localcorr.RDS
    Untracked:  results/localcorr.REDS
    Untracked:  results/sparsesignal.RDS

Unstaged changes:
    Modified:   analysis/Result5_SparseSignal.Rmd
    Modified:   analysis/Result6_PVE.Rmd
    Modified:   code/sim_wrapper.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

| File | Version | Author | Date | Message |
|------|---------|--------|------|---------|
| Rmd | a0a663d | Youngseok Kim | 2019-10-23 | wflow_publish("analysis/index.Rmd") |
| html | bf5aa73 | Youngseok Kim | 2019-10-23 | Build site. |
| Rmd | 56cc6bd | Youngseok Kim | 2019-10-23 | wflow_publish("analysis/index.Rmd") |
| html | a0bc78b | Youngseok | 2019-10-22 | Build site. |
| Rmd | a4de5b2 | Youngseok | 2019-10-22 | wflow_publish("analysis/index.Rmd") |
| html | 668d700 | Youngseok | 2019-10-22 | Build site. |
| Rmd | 4f8651a | Youngseok | 2019-10-22 | update changepoint analysis |
| html | 7cf4984 | Youngseok | 2019-10-21 | Build site. |
| Rmd | 6025590 | Youngseok | 2019-10-21 | wflow_publish("analysis/index.Rmd") |
| html | 7d6c1c8 | Youngseok | 2019-10-21 | Build site. |
| Rmd | 585044e | Youngseok Kim | 2019-10-21 | update |
| Rmd | 3bab6f7 | Youngseok Kim | 2019-10-20 | update experiments |
| html | 3bab6f7 | Youngseok Kim | 2019-10-20 | update experiments |
| html | 92022af | Youngseok Kim | 2019-10-17 | Build site. |
| Rmd | 897fd56 | Youngseok Kim | 2019-10-17 | update index.html |
| html | 4d0a59b | Youngseok Kim | 2019-10-17 | Build site. |
| Rmd | 6173f3e | Youngseok Kim | 2019-10-17 | update index.html |
| html | 2ca8e0c | Youngseok Kim | 2019-10-17 | Build site. |
| Rmd | 5d8193f | Youngseok Kim | 2019-10-17 | update index.html |
| html | 79e1aab | Youngseok | 2019-10-17 | Build site. |
| html | b091d16 | Youngseok | 2019-10-15 | Build site. |
| Rmd | 5a648ed | Youngseok | 2019-10-15 | wflow_publish("analysis/index.Rmd") |
| html | fd5131c | Youngseok | 2019-10-15 | Build site. |
| Rmd | c0d6ee9 | Youngseok | 2019-10-15 | wflow_publish("analysis/*.Rmd") |
| Rmd | 1e3484c | Youngseok | 2019-10-14 | update for experiment with different p |
| html | d92cdd7 | Youngseok | 2019-10-14 | Build site. |
| Rmd | 8dc7015 | Youngseok | 2019-10-14 | wflow_publish("analysis/index.Rmd") |
| html | a7a31a3 | Youngseok | 2019-10-14 | Build site. |
| Rmd | bee359c | Youngseok | 2019-10-14 | wflow_publish("analysis/index.Rmd") |
| html | 8bdd1d7 | Youngseok | 2019-10-14 | Build site. |
| Rmd | cb56b49 | Youngseok | 2019-10-14 | wflow_publish("analysis/index.Rmd") |
| html | 70efed9 | Youngseok | 2019-10-14 | Build site. |
| Rmd | 6167c5a | Youngseok | 2019-10-14 | wflow_publish("analysis/*.Rmd") |
| html | ac5fb62 | Youngseok | 2019-10-13 | Build site. |
| Rmd | 10c93e1 | Youngseok | 2019-10-13 | wflow_publish("analysis/Index.Rmd") |
| html | bd36a79 | Youngseok | 2019-10-13 | Build site. |
| Rmd | 2368046 | Youngseok | 2019-10-13 | wflow_publish("analysis/*.Rmd") |
| Rmd | 1207fdd | Youngseok | 2019-10-07 | Start workflowr project. |

Plots for the paper

Plots


Results for external comparison

This vignette plots results for the experiments comparing MR.ASH with the methods listed below; illustrative calls for fitting these methods are sketched after the list.

  1. glmnet R package: Ridge, Lasso, E-NET
  2. ncvreg R package: SCAD, MCP
  3. L0Learn R package: L0Learn
  4. BGLR R package: BayesB, Blasso (Bayesian Lasso)
  5. susieR R package: SuSiE (Sum of Single Effects)
  6. varbvs R package: VarBVS (Variational Bayes Variable Selection)
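
As a rough illustration, the sketch below shows typical calls for fitting these methods in R. It is only a sketch: it assumes a design matrix X and response y are already in memory, and the tuning settings (alpha for E-NET, number of MCMC iterations for BGLR, L for SuSiE, and so on) are illustrative rather than the exact values used in the experiments.

```r
# Illustrative calls only; X is an n-by-p matrix, y a length-n response.
library(glmnet); library(ncvreg); library(L0Learn)
library(BGLR);   library(susieR); library(varbvs)

fit.ridge  <- cv.glmnet(X, y, alpha = 0)          # Ridge
fit.lasso  <- cv.glmnet(X, y, alpha = 1)          # Lasso
fit.enet   <- cv.glmnet(X, y, alpha = 0.5)        # E-NET (one choice of alpha)
fit.scad   <- cv.ncvreg(X, y, penalty = "SCAD")   # SCAD
fit.mcp    <- cv.ncvreg(X, y, penalty = "MCP")    # MCP
fit.l0     <- L0Learn.cvfit(X, y, penalty = "L0") # L0Learn
fit.bayesb <- BGLR(y, ETA = list(list(X = X, model = "BayesB")),
                   nIter = 5000, burnIn = 1000, verbose = FALSE)   # BayesB
fit.blasso <- BGLR(y, ETA = list(list(X = X, model = "BL")),
                   nIter = 5000, burnIn = 1000, verbose = FALSE)   # Bayesian Lasso
fit.susie  <- susie(X, y, L = 10)                 # SuSiE
fit.varbvs <- varbvs(X, NULL, y, verbose = FALSE) # VarBVS
```

(The ETA_1_*.dat, mu.dat and varE.dat files in the ignored-files list above are presumably side effects of BGLR, which writes its samples to disk.)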

Performance measure

The prediction error we define here is \textrm{Pred.Err}(\hat\beta; y_{\rm test}, X_{\rm test}) = \textrm{RMSE}(\hat\beta; y_{\rm test}, X_{\rm test}) / \sigma, where y_{\rm test} and X_{\rm test} are test data sampled in the same way as the training data. If \hat\beta is fairly accurate, then we expect \textrm{RMSE} to be close to \sigma. Therefore, on average, \textrm{Pred.Err} \geq 1, and smaller is better.

Here \textrm{RMSE}(\hat\beta; y_{\rm test}, X_{\rm test}) = \sqrt{\tfrac{1}{n_{\rm test}} \|y_{\rm test} - X_{\rm test}\hat\beta\|_2^2} is the root-mean-squared error on the test data, and \sigma is the residual standard deviation used to simulate the data.
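
A minimal sketch of this measure in R, assuming a fitted coefficient vector betahat, test data Xtest and ytest, and the true residual standard deviation sigma from the simulation (all names are illustrative):

```r
# Prediction error = RMSE on test data divided by the true noise s.d.
pred.err <- function(betahat, Xtest, ytest, sigma) {
  rmse <- sqrt(mean((ytest - Xtest %*% betahat)^2))
  rmse / sigma
}
```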

Default setting

Throughout the studies, the default setting is as follows (a short simulation sketch implementing these defaults appears after the list).

  1. The number of samples n = 500 and the number of coefficients p = 2000.
  2. The number of nonzero coefficients (sparsity): s = 20.
  3. Design: IndepGauss, i.e. X_{ij} \sim N(0,1)
  4. Signal shape: PointNormal, i.e. \beta_j = \left\{ \begin{array}{ll} \textrm{a normal random draw} \sim N(0,1), & j \in S \\ 0, & j \notin S \end{array} \right. where S is the index set of nonzero coefficients, sampled uniformly at random from \{1,\cdots,p\}. We define s = |S| as the sparsity level; the smaller s is, the sparser we say the signal \beta is.

  5. PVE (Proportion of Variance Explained): 0.5. Given X and \beta, we sample y = X\beta + \epsilon, where \epsilon \sim N(0,\sigma^2 I_n). The PVE is defined by {\rm PVE} = \frac{\textrm{Var}(X\beta)}{\textrm{Var}(X\beta) + \sigma^2}, where \textrm{Var}(a) denotes the sample variance of a, computed with the R function var. To obtain PVE = 0.5, we set \sigma^2 = \textrm{Var}(X\beta).
  6. Update order: increasing, i.e. the coordinates are updated in the order (1,\cdots,p) (1:p in R).
  7. Initialization: null, i.e. a vector of zeros.
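
A minimal sketch of this default simulation in R (variable names are illustrative; the actual experiments presumably go through code/sim_wrapper.R):

```r
set.seed(1)
n <- 500; p <- 2000; s <- 20
X <- matrix(rnorm(n * p), n, p)          # IndepGauss design: X_ij ~ N(0,1)
beta <- rep(0, p)
S <- sample(p, s)                        # nonzero index set, uniform at random
beta[S] <- rnorm(s)                      # PointNormal signal
sigma2 <- drop(var(X %*% beta))          # PVE = 0.5  <=>  sigma^2 = Var(X beta)
y <- drop(X %*% beta + rnorm(n, sd = sqrt(sigma2)))
```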

MR.ASH and the comparison methods

Some special cases

  1. Ridge case.

The ridge case is the default setting with s = p, i.e. all coefficients are nonzero. We observe that the relative performance depends on the value of p (compared to n), so we consider p = 50, 100, 200, 500, 1000, 2000 while keeping n = 500 fixed.

The message here is that MR.ASH performs reasonably well; it is the only penalized linear regression method that outperforms Ridge with cross-validation. However, let us mention that Blasso (Bayesian Lasso), which places a non-sparse Laplace prior on \beta, performs best. Also, BayesB, which places a spike-and-slab prior on \beta, performs better than MR.ASH.

  2. Nonconvex case.

This experiment compares the flexibility of convex/non-convex shrinkage operators (E-NET, SCAD, MCP, L0Learn) with the MR.ASH shrinkage operator. The default setting applies except for the signal shape and PVE, which is set to 0.99.

The reason we set PVE = 0.99 is that, in this setting, the cross-validated solution of each method (E-NET, SCAD or MCP) achieves a nearly minimal prediction error along its solution path, i.e. the cross-validated solution is nearly the best the method can achieve. Thus we conclude that cross-validation is fairly accurate here, so any remaining performance differences reflect the shrinkage operators rather than the tuning.
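
The sketch below illustrates that check for SCAD (via ncvreg): compare the prediction error of the cross-validated solution with the best prediction error achievable along the whole solution path. X, y, Xtest, ytest and sigma are assumed to come from a simulation like the one sketched above.

```r
library(ncvreg)
fit   <- ncvreg(X, y, penalty = "SCAD")      # full SCAD solution path
cvfit <- cv.ncvreg(X, y, penalty = "SCAD")   # cross-validated lambda

# Prediction error (RMSE / sigma) at every lambda on the path, and at the
# cross-validated lambda; the two should be close if CV is accurate.
path.err <- apply(predict(fit, X = Xtest), 2,
                  function(yhat) sqrt(mean((ytest - yhat)^2)) / sigma)
cv.err <- sqrt(mean((ytest - predict(cvfit, X = Xtest))^2)) / sigma
c(best.on.path = min(path.err), cross.validated = cv.err)
```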

Important parameters for high-dimensional regression settings

  3. High-dimensionality p.

This experiment compares the performance of the methods as p varies while everything else is held fixed.

(3-1) Result:

(3-2) Discussion: theory (e.g., Slope meets Lasso: improved oracle bounds and optimality) says that the prediction error scales as \sqrt{s\log(p/s)/n}, so for fixed s and n the error grows only logarithmically in p.
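
As a quick sanity check of what this rate implies, the snippet below evaluates it for the default s = 20 and n = 500 over an illustrative grid of p:

```r
s <- 20; n <- 500
p <- c(50, 100, 200, 500, 1000, 2000)
round(sqrt(s * log(p / s) / n), 3)   # grows only logarithmically in p
```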

  4. Sparsity s.

  5. PVE.

Performance on different design settings

  1. LowdimIndepGauss

  2. RealGenotype

  3. Changepoint

  4. EquicorrGauss


Results for internal comparison

MR.ASH update order

A list of the update orders we consider is as follows; a sketch showing how some of these orders can be computed appears after the list.

  1. random: In each outer loop iteration, we sample a permutation map \xi:\{1,\cdots,p\} \to \{1,\cdots,p\} uniformly at random and run inner loop iterations with the order based on \xi.
  2. increasing: (1,\cdots,p), i.e. in each outer loop iteration, we update q_1,q_2,\cdots,q_p in this order.
  3. lasso.pathorder: we update q_j prior to q_{j'} when \beta_j appears earlier than \beta_{j'} in the lasso path. If ties occur, then increasing order applies to those ties.
  4. scad.pathorder: we update q_j prior to q_{j'} when \beta_j appears earlier than \beta_{j'} in the scad path. If ties occur, then increasing order applies to those ties.
  5. lasso.absorder: we update q_j prior to q_{j'} when the lasso solution satisfies |\hat\beta_j| \geq |\hat\beta_{j'}|. If ties occur, then increasing order applies to those ties.
  6. univar.absorder: we update q_j prior to q_{j'} when the univariate linear regression solution satisfies |\hat\beta_j| \geq |\hat\beta_{j'}|. If ties occur, then increasing order applies to those ties.
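
A sketch of how two of these orders could be computed (illustrative only; the actual experiments may construct them differently, and X, y are assumed to be in memory):

```r
library(glmnet)

# univar.absorder: sort by |univariate OLS coefficient|, decreasing;
# ties are broken by increasing index via the second argument to order().
bhat.univ <- drop(crossprod(X, y)) / colSums(X^2)
univar.absorder <- order(-abs(bhat.univ), seq_along(bhat.univ))

# lasso.pathorder: sort variables by the lambda index at which they first
# enter the lasso path; variables that never enter go last.
fit <- glmnet(X, y)
B <- as.matrix(coef(fit))[-1, ]                      # p x nlambda, drop intercept
entry <- apply(B != 0, 1, function(z) if (any(z)) which(z)[1] else Inf)
lasso.pathorder <- order(entry, seq_along(entry))    # ties -> increasing index
```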

MR.ASH initialization

A list of the initializations we consider is as follows.

Parametrization

An analysis of the update rule for \sigma^2, based on different parametrizations of the model.


Our story is

  1. the MR.ASH shrinkage operator (derived from a normal-mixture prior) is more flexible than other popular shrinkage operators, which take a parametric form indexed by \lambda (and \gamma, if present).

  2. VEB can do better than cross-validation.