Last updated: 2018-07-27
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date 
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
 ✔ Environment: empty 
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
 ✔ Seed: 
set.seed(20180714) 
The command set.seed(20180714) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
 ✔ Session information: recorded 
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
 ✔ Repository version: f497f36 
wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    docs/.DS_Store
    Ignored:    docs/figure/.DS_Store
Untracked files:
    Untracked:  data/greedy19.rds
Here, I return to a question I asked in a previous investigation: does the choice of init_fn affect the final objective attained?
I perform a very simple (if time-consuming) experiment. I fit FLASH to the GTEx dataset from previous investigations using init_fn = udv_si with ten different seeds, and compare the final objective in each case to the objective I get using init_fn = udv_svd.
I pre-run the code and load the results from file. I am using warmstarts to avoid the problem described here (at present, this functionality is only available in branch trackObj).
# devtools::install_github("stephenslab/flashr", ref="trackObj")
devtools::load_all("/Users/willwerscheid/GitHub/flashr")
# devtools::install_github("stephenslab/ebnm")
devtools::load_all("/Users/willwerscheid/GitHub/ebnm")
gtex <- readRDS(gzcon(url("https://github.com/stephenslab/gtexresults/blob/master/data/MatrixEQTLSumStats.Portable.Z.rds?raw=TRUE")))
strong <- t(gtex$strong.z)
niter <- 10
obj.udv_si <- rep(0, niter)
nfactor.udv_si <- rep(0, niter)
for (i in 1:niter) {
  res <- flash_add_greedy(strong, Kmax=50, init_fn="udv_si",
                                warmstart=TRUE, verbose=TRUE, seed=i)
  obj.udv_si[i] <- flash_get_objective(strong, res$f)
  nfactor.udv_si[i] <- flash_get_nfactors(res$f)
}
res2 <- flash_add_greedy(strong, Kmax=50, init_fn="udv_svd",
                                warmstart=TRUE, verbose=TRUE, seed=i)
obj.udv_svd <- flash_get_objective(strong, res2$f)
nfactor.udv_svd <- flash_get_nfactors(res2$f)
all_res <- list(obj.udv_si = obj.udv_si,
                obj.udv_svd = obj.udv_svd,
                nfactor.udv_si = nfactor.udv_si)
                nfactor.udv_svd = nfactor.udv_svd)
saveRDS(all_res, "../data/init_fn2/res.rds")all_res <- readRDS("./data/init_fn2/res.rds")Results are as follows.
obj.diff <- all_res$obj.udv_si - all_res$obj.udv_svd
col <- c("lightgreen", "skyblue", "royalblue", "purple4")
plot.col <- col[all_res$nfactor.udv_si - 22]
plot(1:10, obj.diff, 
     xlab="seed", ylab="difference in objective", 
     xlim=c(1, 11), ylim=c(-1200, 200),
     pch=19, col=plot.col,
     main="Obj. attained for udv_si (relative to udv_svd)")
abline(0, 0, lty=2)
legend("topright", as.character(23:26), pch=19, col=col,
       title="# factors")
| Version | Author | Date | 
|---|---|---|
| 974e1db | Jason Willwerscheid | 2018-07-27 | 
So, six seeds yield an objective that is much worse than the objective attained using udv_svd, and four seeds yield an objective that is as good or slightly better. The latter all include 25 factor/loading pairs, which is also the number of factor/loading pairs given by udv_svd.
These results lend further support to my previous conclusion that we should make udv_svd the default initialization function when there is no missing data.
sessionInfo()R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
loaded via a namespace (and not attached):
 [1] workflowr_1.0.1   Rcpp_0.12.17      digest_0.6.15    
 [4] rprojroot_1.3-2   R.methodsS3_1.7.1 backports_1.1.2  
 [7] git2r_0.21.0      magrittr_1.5      evaluate_0.10.1  
[10] stringi_1.1.6     whisker_0.3-2     R.oo_1.21.0      
[13] R.utils_2.6.0     rmarkdown_1.8     tools_3.4.3      
[16] stringr_1.3.0     yaml_2.1.17       compiler_3.4.3   
[19] htmltools_0.3.6   knitr_1.20       This reproducible R Markdown analysis was created with workflowr 1.0.1