
Last updated: 2019-10-23


Knit directory: mr-ash-workflow/

This reproducible R Markdown analysis was created with workflowr (version 1.3.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    ETA_1_lambda.dat
    Ignored:    ETA_1_parBayesB.dat
    Ignored:    analysis/ETA_1_lambda.dat
    Ignored:    analysis/ETA_1_parBayesB.dat
    Ignored:    analysis/mu.dat
    Ignored:    analysis/varE.dat
    Ignored:    mu.dat
    Ignored:    varE.dat

Untracked files:
    Untracked:  .DS_Store
    Untracked:  backup
    Untracked:  results/equicorr.RDS
    Untracked:  results/localcorr.RDS
    Untracked:  results/localcorr.REDS
    Untracked:  results/sparsesignal.RDS

Unstaged changes:
    Modified:   analysis/Result5_SparseSignal.Rmd
    Modified:   analysis/Result6_PVE.Rmd
    Modified:   code/sim_wrapper.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

| File | Version | Author | Date | Message |
|------|---------|--------|------|---------|
| Rmd | a0a663d | Youngseok Kim | 2019-10-23 | wflow_publish("analysis/index.Rmd") |
| html | bf5aa73 | Youngseok Kim | 2019-10-23 | Build site. |
| Rmd | 56cc6bd | Youngseok Kim | 2019-10-23 | wflow_publish("analysis/index.Rmd") |
| html | a0bc78b | Youngseok | 2019-10-22 | Build site. |
| Rmd | a4de5b2 | Youngseok | 2019-10-22 | wflow_publish("analysis/index.Rmd") |
| html | 668d700 | Youngseok | 2019-10-22 | Build site. |
| Rmd | 4f8651a | Youngseok | 2019-10-22 | update changepoint analysis |
| html | 7cf4984 | Youngseok | 2019-10-21 | Build site. |
| Rmd | 6025590 | Youngseok | 2019-10-21 | wflow_publish("analysis/index.Rmd") |
| html | 7d6c1c8 | Youngseok | 2019-10-21 | Build site. |
| Rmd | 585044e | Youngseok Kim | 2019-10-21 | update |
| Rmd | 3bab6f7 | Youngseok Kim | 2019-10-20 | update experiments |
| html | 3bab6f7 | Youngseok Kim | 2019-10-20 | update experiments |
| html | 92022af | Youngseok Kim | 2019-10-17 | Build site. |
| Rmd | 897fd56 | Youngseok Kim | 2019-10-17 | update index.html |
| html | 4d0a59b | Youngseok Kim | 2019-10-17 | Build site. |
| Rmd | 6173f3e | Youngseok Kim | 2019-10-17 | update index.html |
| html | 2ca8e0c | Youngseok Kim | 2019-10-17 | Build site. |
| Rmd | 5d8193f | Youngseok Kim | 2019-10-17 | update index.html |
| html | 79e1aab | Youngseok | 2019-10-17 | Build site. |
| html | b091d16 | Youngseok | 2019-10-15 | Build site. |
| Rmd | 5a648ed | Youngseok | 2019-10-15 | wflow_publish("analysis/index.Rmd") |
| html | fd5131c | Youngseok | 2019-10-15 | Build site. |
| Rmd | c0d6ee9 | Youngseok | 2019-10-15 | wflow_publish("analysis/*.Rmd") |
| Rmd | 1e3484c | Youngseok | 2019-10-14 | update for experiment with different p |
| html | d92cdd7 | Youngseok | 2019-10-14 | Build site. |
| Rmd | 8dc7015 | Youngseok | 2019-10-14 | wflow_publish("analysis/index.Rmd") |
| html | a7a31a3 | Youngseok | 2019-10-14 | Build site. |
| Rmd | bee359c | Youngseok | 2019-10-14 | wflow_publish("analysis/index.Rmd") |
| html | 8bdd1d7 | Youngseok | 2019-10-14 | Build site. |
| Rmd | cb56b49 | Youngseok | 2019-10-14 | wflow_publish("analysis/index.Rmd") |
| html | 70efed9 | Youngseok | 2019-10-14 | Build site. |
| Rmd | 6167c5a | Youngseok | 2019-10-14 | wflow_publish("analysis/*.Rmd") |
| html | ac5fb62 | Youngseok | 2019-10-13 | Build site. |
| Rmd | 10c93e1 | Youngseok | 2019-10-13 | wflow_publish("analysis/Index.Rmd") |
| html | bd36a79 | Youngseok | 2019-10-13 | Build site. |
| Rmd | 2368046 | Youngseok | 2019-10-13 | wflow_publish("analysis/*.Rmd") |
| Rmd | 1207fdd | Youngseok | 2019-10-07 | Start workflowr project. |

Plots for the paper

Plots


Results for external comparison

This vignette plots results for the experiments comparing MR.ASH with the methods listed below; illustrative calls for fitting these methods are sketched after the list.

  1. glmnet R package: Ridge, Lasso, E-NET
  2. ncvreg R package: SCAD, MCP
  3. L0Learn R package: L0Learn
  4. BGLR R package: BayesB, Blasso (Bayesian Lasso)
  5. susieR R package: SuSiE (Sum of Single Effects)
  6. varbvs R package: VarBVS (Variational Bayes Variable Selection)
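
As a rough illustration, the sketch below shows typical calls for fitting these methods in R. It is only a sketch: it assumes a design matrix X and response y are already in memory, and the tuning settings (alpha for E-NET, number of MCMC iterations for BGLR, L for SuSiE, and so on) are illustrative rather than the exact values used in the experiments.

```r
# Illustrative calls only; X is an n-by-p matrix, y a length-n response.
library(glmnet); library(ncvreg); library(L0Learn)
library(BGLR);   library(susieR); library(varbvs)

fit.ridge  <- cv.glmnet(X, y, alpha = 0)          # Ridge
fit.lasso  <- cv.glmnet(X, y, alpha = 1)          # Lasso
fit.enet   <- cv.glmnet(X, y, alpha = 0.5)        # E-NET (one choice of alpha)
fit.scad   <- cv.ncvreg(X, y, penalty = "SCAD")   # SCAD
fit.mcp    <- cv.ncvreg(X, y, penalty = "MCP")    # MCP
fit.l0     <- L0Learn.cvfit(X, y, penalty = "L0") # L0Learn
fit.bayesb <- BGLR(y, ETA = list(list(X = X, model = "BayesB")),
                   nIter = 5000, burnIn = 1000, verbose = FALSE)   # BayesB
fit.blasso <- BGLR(y, ETA = list(list(X = X, model = "BL")),
                   nIter = 5000, burnIn = 1000, verbose = FALSE)   # Bayesian Lasso
fit.susie  <- susie(X, y, L = 10)                 # SuSiE
fit.varbvs <- varbvs(X, NULL, y, verbose = FALSE) # VarBVS
```

(The ETA_1_*.dat, mu.dat and varE.dat files in the ignored-files list above are presumably side effects of BGLR, which writes its samples to disk.)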

Performance measure

The prediction error we define here is \textrm{Pred.Err}(\hat\beta; y_{\rm test}, X_{\rm test}) = \textrm{RMSE}(\hat\beta; y_{\rm test}, X_{\rm test}) / \sigma, where y_{\rm test} and X_{\rm test} are test data sampled in the same way as the training data. If \hat\beta is fairly accurate, then we expect \textrm{RMSE} to be close to \sigma. Therefore, on average, \textrm{Pred.Err} \geq 1, and smaller is better.

Here \textrm{RMSE}(\hat\beta; y_{\rm test}, X_{\rm test}) = \sqrt{\tfrac{1}{n_{\rm test}} \|y_{\rm test} - X_{\rm test}\hat\beta\|_2^2} is the root-mean-squared error on the test data, and \sigma is the residual standard deviation used to simulate the data.
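
A minimal sketch of this measure in R, assuming a fitted coefficient vector betahat, test data Xtest and ytest, and the true residual standard deviation sigma from the simulation (all names are illustrative):

```r
# Prediction error = RMSE on test data divided by the true noise s.d.
pred.err <- function(betahat, Xtest, ytest, sigma) {
  rmse <- sqrt(mean((ytest - Xtest %*% betahat)^2))
  rmse / sigma
}
```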

Default setting

Throughout the studies, the default setting is as follows (a short simulation sketch implementing these defaults appears after the list).

  1. The number of samples n = 500 and the number of coefficients p = 2000.
  2. The number of nonzero coefficients (sparsity): s = 20.
  3. Design: IndepGauss, i.e. X_{ij} \sim N(0,1)
  4. Signal shape: PointNormal, i.e. \beta_j = \left\{ \begin{array}{ll} \textrm{a normal random draw} \sim N(0,1), & j \in S \\ 0, & j \notin S \end{array} \right. where S is the index set of nonzero coefficients, sampled uniformly at random from \{1,\cdots,p\}. We define s = |S| as the sparsity level; the smaller s is, the sparser we say the signal \beta is.

  5. PVE (Proportion of Variance Explained): 0.5. Given X and \beta, we sample y = X\beta + \epsilon, where \epsilon \sim N(0,\sigma^2 I_n). The PVE is defined by {\rm PVE} = \frac{\textrm{Var}(X\beta)}{\textrm{Var}(X\beta) + \sigma^2}, where \textrm{Var}(a) denotes the sample variance of a, computed with the R function var. To obtain PVE = 0.5, we set \sigma^2 = \textrm{Var}(X\beta).
  6. Update order: increasing, i.e. the coordinates are updated in the order (1,\cdots,p) (1:p in R).
  7. Initialization: null, i.e. a vector of zeros.
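
A minimal sketch of this default simulation in R (variable names are illustrative; the actual experiments presumably go through code/sim_wrapper.R):

```r
set.seed(1)
n <- 500; p <- 2000; s <- 20
X <- matrix(rnorm(n * p), n, p)          # IndepGauss design: X_ij ~ N(0,1)
beta <- rep(0, p)
S <- sample(p, s)                        # nonzero index set, uniform at random
beta[S] <- rnorm(s)                      # PointNormal signal
sigma2 <- drop(var(X %*% beta))          # PVE = 0.5  <=>  sigma^2 = Var(X beta)
y <- drop(X %*% beta + rnorm(n, sd = sqrt(sigma2)))
```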

MR.ASH and the comparison methods

Some special cases

  1. Ridge case.

The ridge case is the default setting with s = p, i.e. all coefficients are nonzero. We observe that the relative performance depends on the value of p (compared to n), so we consider p = 50, 100, 200, 500, 1000, 2000 while keeping n = 500 fixed.

The message here is that MR.ASH performs reasonably well; it is the only penalized linear regression method that outperforms Ridge with cross-validation. However, let us mention that Blasso (Bayesian Lasso), which places a non-sparse Laplace prior on \beta, performs best. Also, BayesB, which places a spike-and-slab prior on \beta, performs better than MR.ASH.

  2. Nonconvex case.

This experiment compares the flexibility of convex/non-convex shrinkage operators (E-NET, SCAD, MCP, L0Learn) with the MR.ASH shrinkage operator. The default setting applies except for the signal shape and PVE, which is set to 0.99.

The reason we set PVE = 0.99 is that, in this setting, the cross-validated solution of each method (E-NET, SCAD or MCP) achieves a nearly minimal prediction error along its solution path, i.e. the cross-validated solution is nearly the best the method can achieve. Thus we conclude that cross-validation is fairly accurate here, so any remaining performance differences reflect the shrinkage operators rather than the tuning.
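
The sketch below illustrates that check for SCAD (via ncvreg): compare the prediction error of the cross-validated solution with the best prediction error achievable along the whole solution path. X, y, Xtest, ytest and sigma are assumed to come from a simulation like the one sketched above.

```r
library(ncvreg)
fit   <- ncvreg(X, y, penalty = "SCAD")      # full SCAD solution path
cvfit <- cv.ncvreg(X, y, penalty = "SCAD")   # cross-validated lambda

# Prediction error (RMSE / sigma) at every lambda on the path, and at the
# cross-validated lambda; the two should be close if CV is accurate.
path.err <- apply(predict(fit, X = Xtest), 2,
                  function(yhat) sqrt(mean((ytest - yhat)^2)) / sigma)
cv.err <- sqrt(mean((ytest - predict(cvfit, X = Xtest))^2)) / sigma
c(best.on.path = min(path.err), cross.validated = cv.err)
```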

Important parameters for high-dimensional regression settings

  3. High-dimensionality p.

This experiment compares the performance of the methods as p varies while everything else is held fixed.

(3-1) Result:

(3-2) Discussion: theory (e.g., Slope meets Lasso: improved oracle bounds and optimality) says that the prediction error scales as \sqrt{s\log(p/s)/n}, so for fixed s and n the error grows only logarithmically in p.
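
As a quick sanity check of what this rate implies, the snippet below evaluates it for the default s = 20 and n = 500 over an illustrative grid of p:

```r
s <- 20; n <- 500
p <- c(50, 100, 200, 500, 1000, 2000)
round(sqrt(s * log(p / s) / n), 3)   # grows only logarithmically in p
```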

  4. Sparsity s.

  5. PVE.

Performance on different design settings

  1. LowdimIndepGauss

  2. RealGenotype

  3. Changepoint

  4. EquicorrGauss


Results for internal comparison

MR.ASH update order

A list of the update orders we consider is as follows; a sketch showing how some of these orders can be computed appears after the list.

  1. random: In each outer loop iteration, we sample a permutation map \xi:\{1,\cdots,p\} \to \{1,\cdots,p\} uniformly at random and run inner loop iterations with the order based on \xi.
  2. increasing: (1,\cdots,p), i.e. in each outer loop iteration, we update q_1,q_2,\cdots,q_p in this order.
  3. lasso.pathorder: we update q_j prior to q_{j'} when \beta_j appears earlier than \beta_{j'} in the lasso path. If ties occur, then increasing order applies to those ties.
  4. scad.pathorder: we update q_j prior to q_{j'} when \beta_j appears earlier than \beta_{j'} in the scad path. If ties occur, then increasing order applies to those ties.
  5. lasso.absorder: we update q_j prior to q_{j'} when the lasso solution satisfies |\hat\beta_j| \geq |\hat\beta_{j'}|. If ties occur, then increasing order applies to those ties.
  6. univar.absorder: we update q_j prior to q_{j'} when the univariate linear regression solution satisfies |\hat\beta_j| \geq |\hat\beta_{j'}|. If ties occur, then increasing order applies to those ties.
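
A sketch of how two of these orders could be computed (illustrative only; the actual experiments may construct them differently, and X, y are assumed to be in memory):

```r
library(glmnet)

# univar.absorder: sort by |univariate OLS coefficient|, decreasing;
# ties are broken by increasing index via the second argument to order().
bhat.univ <- drop(crossprod(X, y)) / colSums(X^2)
univar.absorder <- order(-abs(bhat.univ), seq_along(bhat.univ))

# lasso.pathorder: sort variables by the lambda index at which they first
# enter the lasso path; variables that never enter go last.
fit <- glmnet(X, y)
B <- as.matrix(coef(fit))[-1, ]                      # p x nlambda, drop intercept
entry <- apply(B != 0, 1, function(z) if (any(z)) which(z)[1] else Inf)
lasso.pathorder <- order(entry, seq_along(entry))    # ties -> increasing index
```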

MR.ASH initialization

A list of the initializations we consider is as follows.

Parametrization

An analysis of the update rule for \sigma^2, based on different parametrizations of the model.


Our story is

  1. the MR.ASH shrinkage operator (derived from a normal-mixture prior) is more flexible than other popular shrinkage operators, which take a parametric form indexed by \lambda (and \gamma, if present).

  2. VEB can do better than cross-validation.