In this short analysis, we compare the prediction accuracy of several linear regression methods in the four simulation examples described in Zou & Hastie (2005). The methods compared are:
ridge regression;
the Lasso;
the Elastic Net;
Sum of Single Effects regression (SuSiE), described here;
variational inference for Bayesian variable selection, or “varbvs”, described here; and
“varbvsmix”, an elaboration of varbvs that replaces the single normal prior with a mixture-of-normals.
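To make the notion of prediction accuracy concrete, here is a minimal sketch that simulates one training set and one test set and computes the mean squared error of the predictions. It assumes the settings of the first example in Zou & Hastie (2005): 8 predictors, coefficients (3, 1.5, 0, 0, 2, 0, 0, 0), residual standard deviation 3, and correlation 0.5^|i-j| between predictors i and j. The simulate module in the DSC may use different settings. Ordinary least squares is used only as a stand-in for the methods listed above, and the function name simulate_data is hypothetical.

```r
set.seed(1)

# Settings assumed from example 1 of Zou & Hastie (2005); the DSC's
# own simulate module may differ.
b     <- c(3,1.5,0,0,2,0,0,0)
sigma <- 3
p     <- length(b)
S     <- 0.5^abs(outer(1:p,1:p,"-"))  # corr(i,j) = 0.5^|i-j|
R     <- chol(S)                        # S = t(R) %*% R

# Simulate n samples with correlated predictors (hypothetical helper).
simulate_data <- function (n) {
  X <- matrix(rnorm(n*p),n,p) %*% R
  y <- drop(X %*% b + sigma*rnorm(n))
  data.frame(y = y,X)
}

train <- simulate_data(20)
test  <- simulate_data(200)

# Ordinary least squares stands in for the methods compared here.
fit <- lm(y ~ .,train)
mse <- mean((test$y - predict(fit,test))^2)
print(mse)
```

Repeating this over many simulated data sets, and over each method, gives the kind of table of errors analyzed below.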
Load a few packages and custom functions used in the analysis below.
library(dscrutils)
library(ggplot2)
library(cowplot)
source("../code/plots.R")
Here we use function “dscquery” from the dscrutils package to extract the results of the DSC we are interested in—the mean squared error in the predictions from each method and in each simulation scenario. The “dsc” data frame should contain results for 480 pipelines—6 methods times 4 scenarios times 20 data sets simulated in each scenario.
library(dscrutils)
methods <- c("ridge","lasso","elastic_net","susie","varbvs","varbvsmix")
dsc <- dscquery("../dsc/linreg",c("simulate.scenario","fit","mse.err"),
verbose = FALSE)
dsc <- transform(dsc,fit = factor(fit,methods))
nrow(dsc)
# [1] 480
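The row count alone does not show that the results are balanced. A minimal sketch of a stricter check follows, using a hypothetical stand-in data frame ("design") in place of the real query results; the "dataset" column is an assumption made for illustration and is not part of the dscquery output.

```r
methods <- c("ridge","lasso","elastic_net","susie","varbvs","varbvsmix")

# Hypothetical stand-in for the dscquery output: one row per pipeline.
design <- expand.grid(fit = methods,simulate.scenario = 1:4,
                      dataset = 1:20,stringsAsFactors = FALSE)
design <- transform(design,fit = factor(fit,methods))
nrow(design)
# [1] 480

# Each combination of method and scenario should have 20 data sets.
counts <- table(design$fit,design$simulate.scenario)
print(counts)
stopifnot(all(counts == 20))
```

Applying the same table call to the "dsc" data frame would reveal any pipelines that failed or are missing.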
If you did not run the DSC, you can replace the dscquery call above with this line, which loads previously saved results:
dsc <- read.csv("../output/linreg_mse.csv")
Let’s save this table to a CSV file in case it is useful later.
write.csv(dsc,"../output/linreg_mse.csv",row.names = FALSE,quote = FALSE)
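One detail to watch when reloading the CSV file: the ordering of the factor levels in "fit" is not stored in the file, so it has to be set again after reading. Here is a minimal sketch of the round trip, using a small hypothetical table ("dat") and a temporary file rather than the real results.

```r
methods <- c("ridge","lasso","elastic_net","susie","varbvs","varbvsmix")

# Hypothetical stand-in for the table of results; the mse.err values
# are placeholders.
dat <- data.frame(simulate.scenario = rep(1:4,each = 6),
                  fit               = factor(rep(methods,4),methods),
                  mse.err           = seq(1,24))

# Write the table to a temporary CSV file, then read it back.
csvfile <- tempfile(fileext = ".csv")
write.csv(dat,csvfile,row.names = FALSE,quote = FALSE)
dat2 <- read.csv(csvfile)

# read.csv does not recover the order of the levels, so set it again.
dat2 <- transform(dat2,fit = factor(fit,methods))
levels(dat2$fit)
```

Without the last step, "fit" is read either as a character column or as a factor with alphabetically sorted levels, depending on the version of R.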
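Plot the distribution of the mean squared error for each method, separately for each of the four simulation scenarios.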
p1 <- mse.boxplot(subset(dsc,simulate.scenario == 1))
p2 <- mse.boxplot(subset(dsc,simulate.scenario == 2))
p3 <- mse.boxplot(subset(dsc,simulate.scenario == 3))
p4 <- mse.boxplot(subset(dsc,simulate.scenario == 4))
plot_grid(p1,p2,p3,p4,labels = paste("Simulation",1:4),vjust = 0)
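A numeric summary can complement the plots. The sketch below computes the median error for each scenario and method with base R. Since the real results are not reproduced here, it uses a hypothetical stand-in data frame ("dat") with randomly generated placeholder values, so the numbers it prints say nothing about the methods themselves.

```r
set.seed(1)
methods <- c("ridge","lasso","elastic_net","susie","varbvs","varbvsmix")

# Hypothetical stand-in for the query results. The mse.err values are
# random placeholders, not real results.
dat <- expand.grid(fit = methods,simulate.scenario = 1:4,
                   dataset = 1:20,stringsAsFactors = FALSE)
dat$fit     <- factor(dat$fit,methods)
dat$mse.err <- rexp(nrow(dat))

# Median MSE, with one row per scenario and one column per method.
med <- with(dat,tapply(mse.err,list(simulate.scenario,fit),median))
print(round(med,3))
```

Replacing "dat" with the "dsc" data frame should give the corresponding summary of the actual results.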
These are the versions of R and the packages that were used to generate these results.
sessionInfo()
# R version 3.4.3 (2017-11-30)
# Platform: x86_64-apple-darwin15.6.0 (64-bit)
# Running under: macOS High Sierra 10.13.6
#
# Matrix products: default
# BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
#
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] cowplot_0.9.4 ggplot2_3.1.0 dscrutils_0.3.5 rmarkdown_1.10
#
# loaded via a namespace (and not attached):
# [1] Rcpp_1.0.0 knitr_1.20 magrittr_1.5 tidyselect_0.2.5
# [5] munsell_0.4.3 colorspace_1.4-0 R6_2.2.2 rlang_0.3.1
# [9] dplyr_0.8.0.1 stringr_1.3.1 plyr_1.8.4 tools_3.4.3
# [13] grid_3.4.3 gtable_0.2.0 withr_2.1.2 htmltools_0.3.6
# [17] assertthat_0.2.0 yaml_2.2.0 lazyeval_0.2.1 rprojroot_1.3-2
# [21] digest_0.6.17 tibble_2.1.1 crayon_1.3.4 purrr_0.2.5
# [25] glue_1.3.0 evaluate_0.11 labeling_0.3 stringi_1.2.4
# [29] compiler_3.4.3 pillar_1.3.1 scales_0.5.0 backports_1.1.2
# [33] pkgconfig_2.0.2