Last updated: 2019-10-23
Checks: 2 passed, 0 failed
Knit directory: mr-ash-workflow/
This reproducible R Markdown analysis was created with workflowr (version 1.3.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: ETA_1_lambda.dat
Ignored: ETA_1_parBayesB.dat
Ignored: analysis/ETA_1_lambda.dat
Ignored: analysis/ETA_1_parBayesB.dat
Ignored: analysis/mu.dat
Ignored: analysis/varE.dat
Ignored: mu.dat
Ignored: varE.dat
Untracked files:
Untracked: .DS_Store
Untracked: backup
Untracked: results/equicorr.RDS
Untracked: results/localcorr.RDS
Untracked: results/localcorr.REDS
Untracked: results/sparsesignal.RDS
Unstaged changes:
Modified: analysis/Result5_SparseSignal.Rmd
Modified: analysis/Result6_PVE.Rmd
Modified: code/sim_wrapper.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the R Markdown and HTML files. If you have configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.
File | Version | Author | Date | Message
---|---|---|---|---
Rmd | a0a663d | Youngseok Kim | 2019-10-23 | wflow_publish("analysis/index.Rmd")
html | bf5aa73 | Youngseok Kim | 2019-10-23 | Build site.
Rmd | 56cc6bd | Youngseok Kim | 2019-10-23 | wflow_publish("analysis/index.Rmd")
html | a0bc78b | Youngseok | 2019-10-22 | Build site.
Rmd | a4de5b2 | Youngseok | 2019-10-22 | wflow_publish("analysis/index.Rmd")
html | 668d700 | Youngseok | 2019-10-22 | Build site.
Rmd | 4f8651a | Youngseok | 2019-10-22 | update changepoint analysis
html | 7cf4984 | Youngseok | 2019-10-21 | Build site.
Rmd | 6025590 | Youngseok | 2019-10-21 | wflow_publish("analysis/index.Rmd")
html | 7d6c1c8 | Youngseok | 2019-10-21 | Build site.
Rmd | 585044e | Youngseok Kim | 2019-10-21 | update
Rmd | 3bab6f7 | Youngseok Kim | 2019-10-20 | update experiments
html | 3bab6f7 | Youngseok Kim | 2019-10-20 | update experiments
html | 92022af | Youngseok Kim | 2019-10-17 | Build site.
Rmd | 897fd56 | Youngseok Kim | 2019-10-17 | update index.html
html | 4d0a59b | Youngseok Kim | 2019-10-17 | Build site.
Rmd | 6173f3e | Youngseok Kim | 2019-10-17 | update index.html
html | 2ca8e0c | Youngseok Kim | 2019-10-17 | Build site.
Rmd | 5d8193f | Youngseok Kim | 2019-10-17 | update index.html
html | 79e1aab | Youngseok | 2019-10-17 | Build site.
html | b091d16 | Youngseok | 2019-10-15 | Build site.
Rmd | 5a648ed | Youngseok | 2019-10-15 | wflow_publish("analysis/index.Rmd")
html | fd5131c | Youngseok | 2019-10-15 | Build site.
Rmd | c0d6ee9 | Youngseok | 2019-10-15 | wflow_publish("analysis/*.Rmd")
Rmd | 1e3484c | Youngseok | 2019-10-14 | update for experiment with different p
html | d92cdd7 | Youngseok | 2019-10-14 | Build site.
Rmd | 8dc7015 | Youngseok | 2019-10-14 | wflow_publish("analysis/index.Rmd")
html | a7a31a3 | Youngseok | 2019-10-14 | Build site.
Rmd | bee359c | Youngseok | 2019-10-14 | wflow_publish("analysis/index.Rmd")
html | 8bdd1d7 | Youngseok | 2019-10-14 | Build site.
Rmd | cb56b49 | Youngseok | 2019-10-14 | wflow_publish("analysis/index.Rmd")
html | 70efed9 | Youngseok | 2019-10-14 | Build site.
Rmd | 6167c5a | Youngseok | 2019-10-14 | wflow_publish("analysis/*.Rmd")
html | ac5fb62 | Youngseok | 2019-10-13 | Build site.
Rmd | 10c93e1 | Youngseok | 2019-10-13 | wflow_publish("analysis/Index.Rmd")
html | bd36a79 | Youngseok | 2019-10-13 | Build site.
Rmd | 2368046 | Youngseok | 2019-10-13 | wflow_publish("analysis/*.Rmd")
Rmd | 1207fdd | Youngseok | 2019-10-07 | Start workflowr project.
This vignette plots the results of experiments comparing MR.ASH with the methods listed below.
- glmnet R package: Ridge, Lasso, E-NET
- ncvreg R package: SCAD, MCP
- L0Learn R package: L0Learn
- BGLR R package: BayesB, Blasso (Bayesian Lasso)
- susieR R package: SuSiE (Sum of Single Effects)
- varbvs R package: VarBVS (Variational Bayes Variable Selection)

The prediction error we define here is

\textrm{Pred.Err}(\hat\beta;\, y_{\rm test}, X_{\rm test}) = \frac{\textrm{RMSE}(\hat\beta;\, y_{\rm test}, X_{\rm test})}{\sigma} = \frac{\|y_{\rm test} - X_{\rm test}\hat\beta\|_2}{\sqrt{n_{\rm test}}\,\sigma},

where \textrm{RMSE} is the root-mean-squared error and y_{\rm test}, X_{\rm test} are test data sampled in the same way as the training data. If \hat\beta is fairly accurate, then we expect \textrm{RMSE} to be close to \sigma; therefore, on average, \textrm{Pred.Err} \geq 1, and smaller is better.
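To make the metric concrete, here is a minimal R sketch; pred_err is a hypothetical helper rather than a function from any of the packages above, and it assumes the true residual standard deviation \sigma is known, as it is in these simulations.

```r
# Minimal sketch of the prediction error defined above. `pred_err` is a
# hypothetical helper (not from any package); it assumes the true residual
# standard deviation `sigma` is known, as it is in these simulations.
pred_err <- function(betahat, X_test, y_test, sigma) {
  rmse <- sqrt(mean((y_test - X_test %*% betahat)^2))
  rmse / sigma  # >= 1 on average; smaller is better
}
```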
Throughout the studies, the default setting is:

- Design matrix: IndepGauss, i.e. X_{ij} \sim N(0,1).
- Signal shape: PointNormal, i.e.

  \beta_j = \left\{ \begin{array}{ll} \textrm{a normal random sample} \sim N(0,1), & j \in S \\ 0, & j \notin S \end{array} \right.

  where S is an index set of nonzero coefficients, sampled uniformly at random from \{1,\cdots,p\}. We define s = |S| as the sparsity level; the smaller s is, the sparser we say the signal \beta is.
- Residual variance: we set \sigma^2 = \textrm{Var}(X\beta), so that the signal and the noise contribute equally to \textrm{Var}(y).
- Update order: increasing, i.e. (1,\cdots,p) = 1:p in this order.
- Initialization: null, i.e. a vector of zeros.
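The following R sketch draws one dataset from this default setting; simulate_default and its argument defaults are illustrative names, not code from this repository.

```r
# Illustrative sketch of the default data-generating process; the function
# name and argument defaults are assumptions, not code from this repository.
simulate_default <- function(n = 500, p = 200, s = 20) {
  X <- matrix(rnorm(n * p), n, p)     # IndepGauss design: X_ij ~ N(0,1)
  beta <- rep(0, p)
  S <- sample(p, s)                   # nonzero index set, uniform at random
  beta[S] <- rnorm(s)                 # PointNormal signal on S
  sigma <- sqrt(c(var(X %*% beta)))   # sigma^2 = Var(X beta)
  y <- drop(X %*% beta) + sigma * rnorm(n)
  list(X = X, y = y, beta = beta, sigma = sigma)
}
```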
In the ridge case, the default setting applies except that s = p, i.e. every coefficient is nonzero. We observe that the relative performance of the methods depends on the value of p (compared to n), so we consider p = 50, 100, 200, 500, 1000, 2000 with n = 500 fixed.

The message here is that MR.ASH performs reasonably well: among the penalized linear regression methods, it is the only one that outperforms cross-validated Ridge. However, let us mention that Blasso (Bayesian Lasso), which uses a non-sparse Laplace prior on \beta, performs best, and BayesB, which uses a spike-and-slab prior on \beta, also performs better than MR.ASH.
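For reference, here is a hedged sketch of the cross-validated Ridge baseline via glmnet; simulate_default refers to the illustrative simulator above, with s = p making the signal fully dense as in this experiment.

```r
library(glmnet)

# Hedged sketch of the cross-validated Ridge baseline: alpha = 0 selects the
# ridge penalty in glmnet. `simulate_default` is the illustrative simulator
# sketched above; s = p makes the signal fully dense, as in the ridge case.
dat <- simulate_default(n = 500, p = 2000, s = 2000)
cvfit <- cv.glmnet(dat$X, dat$y, alpha = 0)
betahat <- as.numeric(coef(cvfit, s = "lambda.min"))[-1]  # drop the intercept
```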
This experiment compares the flexibility of the convex/non-convex shrinkage operators (E-NET, SCAD, MCP, L0Learn) with the MR.ASH shrinkage operator. The default setting applies except for the signal shape and PVE = 0.99.

The reason we set PVE = 0.99 is that, at this level, the cross-validated solution of each method (E-NET, SCAD or MCP) achieves a nearly minimal prediction error on its solution path, i.e. the cross-validated solution is nearly the best the method can achieve. Thus we conclude that the cross-validation is fairly accurate.
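As a concrete check, a sketch like the following compares the cross-validated solution with the best point on the solution path, here for the lasso via glmnet; the train/test data and sigma are assumed to come from a simulator such as simulate_default above.

```r
library(glmnet)

# Hedged sketch of the check described above, here for the lasso (E-NET, SCAD
# and MCP are analogous via glmnet/ncvreg). The train/test data and `sigma`
# are assumed to come from a simulator such as `simulate_default` above.
fit <- glmnet(X_train, y_train)
path_err <- apply(predict(fit, newx = X_test), 2,
                  function(yhat) sqrt(mean((y_test - yhat)^2)) / sigma)
cvfit <- cv.glmnet(X_train, y_train)
cv_err <- sqrt(mean((y_test - predict(cvfit, newx = X_test,
                                      s = "lambda.min"))^2)) / sigma
cv_err - min(path_err)  # near zero when cross-validation is nearly optimal
```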
This experiment compares the performance of the methods when p varies while everything else is fixed.

(3-1) Result:

(3-2) Discussion: theory (e.g. Slope meets Lasso: improved oracle bounds and optimality) says that the prediction error scales as \sqrt{s\log(p/s)/n}, so with s and n fixed we expect the error to grow only slowly in p.
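For intuition, evaluating this rate on the grid of p used here (with n = 500 fixed and, purely for illustration, s = 20) shows how slowly the bound grows in p:

```r
# Worked evaluation of the rate sqrt(s * log(p/s) / n) on the grid of p used
# in these experiments, with n = 500 fixed and an illustrative s = 20.
p <- c(50, 100, 200, 500, 1000, 2000)
n <- 500
s <- 20
round(sqrt(s * log(p / s) / n), 3)
```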
A list of the update orders we consider is as follows (a sketch of computing the path- and magnitude-based orders appears after the list).

- random: in each outer-loop iteration, we sample a permutation map \xi : \{1,\cdots,p\} \to \{1,\cdots,p\} uniformly at random and run the inner-loop iterations in the order given by \xi.
- increasing: (1,\cdots,p), i.e. in each outer-loop iteration, we update q_1, q_2, \cdots, q_p in this order.
- lasso.pathorder: we update q_j prior to q_{j'} when \beta_j appears earlier than \beta_{j'} in the lasso path; ties are broken by the increasing order.
- scad.pathorder: we update q_j prior to q_{j'} when \beta_j appears earlier than \beta_{j'} in the SCAD path; ties are broken by the increasing order.
- lasso.absorder: we update q_j prior to q_{j'} when the lasso solution satisfies |\hat\beta_j| \geq |\hat\beta_{j'}|; ties are broken by the increasing order.
- univar.absorder: we update q_j prior to q_{j'} when the univariate linear regression solution satisfies |\hat\beta_j| \geq |\hat\beta_{j'}|; ties are broken by the increasing order.
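Here is a hedged R sketch of one way to compute two of these orders from a glmnet fit; the variable names are illustrative, and scad.pathorder would use an ncvreg SCAD path in the same way.

```r
library(glmnet)

# Hedged sketch of computing two of the orders above from a glmnet fit; the
# variable names are illustrative, and scad.pathorder would use an ncvreg
# SCAD path in the same way.
fit <- glmnet(X, y)
B <- as.matrix(fit$beta)  # p x nlambda coefficient path

# lasso.pathorder: rank by the first lambda index at which a coefficient
# becomes nonzero; coefficients that never enter (Inf) and ties fall back to
# the increasing order via the second sort key.
entry <- apply(B != 0, 1, function(z) if (any(z)) which(z)[1] else Inf)
lasso.pathorder <- order(entry, seq_len(nrow(B)))

# lasso.absorder: rank by |betahat_j| in a lasso solution (here the last
# point on the path), largest first, ties again broken by increasing order.
lasso.absorder <- order(-abs(B[, ncol(B)]), seq_len(nrow(B)))
```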
A list of the initializations we consider is as follows.
An analysis of the update rule for \sigma^2, based on different parametrizations of the model.
Our story is:

- The MR.ASH shrinkage operator (derived from a normal-mixture prior) is more flexible than other popular shrinkage operators, each of which is a parametric family in \lambda (and \gamma, if present).
- VEB can do better than cross-validation.