Flashier features

Last updated: 2019-07-02

Knit directory: FLASHvestigations/

This reproducible R Markdown analysis was created with workflowr (version 1.2.0). The reproducibility checks that were applied when the results were created are described below, followed by the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20180714)

The command set.seed(20180714) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Repository version: 52e6cfe

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    code/.DS_Store
    Ignored:    code/flashier_bench/.DS_Store
    Ignored:    data/flashier_bench/

Untracked files:
    Untracked:  code/ebspca.R
    Untracked:  code/flashier_bench/.test2.swp
    Untracked:  code/lowrank.R

Unstaged changes:
    Modified:   analysis/index.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File	Version	Author	Date	Message
Rmd	52e6cfe	Jason Willwerscheid	2019-07-02	wflow_publish("analysis/flashier_features.Rmd")
html	9ecf083	Jason Willwerscheid	2019-01-15	Build site.
Rmd	b6e2dc2	Jason Willwerscheid	2019-01-15	workflowr::wflow_publish("analysis/flashier_features.Rmd")
html	7429ab8	Jason Willwerscheid	2019-01-12	Build site.
Rmd	429cae7	Jason Willwerscheid	2019-01-12	workflowr::wflow_publish("analysis/flashier_features.Rmd")

Handles sparse matrices (of class Matrix), tensors (3-dimensional arrays), and low-rank matrix representations (as returned by, for example, svd, irlba, rsvd, and softImpute).
Implements a full range of variance structures, including “kronecker” and “noisy.” In general, the estimated residual variance can be an arbitrary rank-one matrix or tensor.
For simple variance structures (including “constant” and “by row”/“by column”), no \(n \times p\) matrix is ever formed; in particular, the matrix of residuals is never computed explicitly. This yields a large improvement in memory usage and runtime for very large data matrices. (Benchmarking results are here.)
Speeds up backfits by dropping factors once they are no longer improving the objective very much: instead of updating every factor at each iteration, only the factors that are still changing are updated. Other backfit options are also available. The “montaigne” option always goes after the factor that most recently produced the largest improvement. While it produces much rougher fits, it can greatly reduce the number of backfit iterations.
Simplifies the user interface. Everything is done via a single function with a small number of parameters, and those parameters are more user-friendly. In particular, a new prior.family parameter replaces the less friendly ebnm.fn and ebnm.param.
In contrast, the “workhorse” function gives many more options. One that I especially like allows the user to write an arbitrary function whose output will be displayed during optimization, so that the progress of optimization can be inspected however the user likes.
Uses a home-grown initialization function rather than softImpute. The new function is much faster than softImpute for large matrices (see the benchmarking results).
Instead of sampling the full \(LF'\) matrix, the sampler now just samples \(L\) and \(F\) separately. This reduces memory usage by a factor on the order of \(\min(n, p)\) when the number of factors is small. (With large data matrices, the flashr sampler is basically useless because every sample takes up as much memory as the data matrix itself.)
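
To illustrate the point above about never forming an \(n \times p\) matrix of residuals: for a constant variance structure, the quantity needed is a sum of squared residuals, and that sum can be assembled from \(n \times K\) and \(K \times K\) products. The following is a minimal base R sketch under simplifying assumptions (point estimates only, no posterior second moments, no missing data). It is not the flashier implementation, and the names Y, L_hat, and F_hat are purely illustrative.

```r
set.seed(1)
n <- 200; p <- 50; K <- 3

# Illustrative loadings, factors, and data (not taken from any real analysis).
L_hat <- matrix(rnorm(n * K), n, K)
F_hat <- matrix(rnorm(p * K), p, K)
Y <- tcrossprod(L_hat, F_hat) + matrix(rnorm(n * p), n, p)

# Direct approach: explicitly forms the n x p matrix of residuals.
rss_direct <- sum((Y - tcrossprod(L_hat, F_hat))^2)

# Low-memory approach: expand ||Y - L F'||^2 into three terms.
# sum(Y^2) depends only on the data and can be computed once up front;
# the other two terms only involve n x K and K x K matrices.
sum_y2 <- sum(Y^2)
cross_term <- sum((Y %*% F_hat) * L_hat)
fitted_norm <- sum(crossprod(L_hat) * crossprod(F_hat))
rss_lowmem <- sum_y2 - 2 * cross_term + fitted_norm

# The two computations should agree up to floating-point error.
all.equal(rss_direct, rss_lowmem)
```

The same expansion should carry over to a sparse Y, since the only operations applied to Y are an elementwise square and a product with a thin matrix.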
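
The backfitting strategy described above (update only the factors that are still changing) can be sketched with a toy example. This is an assumption-laden illustration rather than flashier's code: plain rank-one least-squares updates stand in for the empirical Bayes updates, the objective is the negative residual sum of squares, and the function name backfit_with_dropout, the tolerance tol, and the rule of never revisiting a dropped factor are all hypothetical choices made for the sketch.

```r
# Toy backfit: rank-one least-squares updates stand in for the real
# empirical Bayes updates. A factor is dropped from the active set once
# an update improves the objective by less than `tol`.
backfit_with_dropout <- function(Y, L, Fm, tol = 1e-3, max_iter = 100) {
  objective <- function(L, Fm) -sum((Y - tcrossprod(L, Fm))^2)
  active <- rep(TRUE, ncol(L))
  n_updates <- 0
  for (iter in seq_len(max_iter)) {
    if (!any(active)) break
    for (k in which(active)) {
      old_obj <- objective(L, Fm)
      # Residuals with factor k removed (formed explicitly here for clarity).
      R <- Y - tcrossprod(L[, -k, drop = FALSE], Fm[, -k, drop = FALSE])
      L[, k] <- R %*% Fm[, k] / sum(Fm[, k]^2)
      Fm[, k] <- crossprod(R, L[, k]) / sum(L[, k]^2)
      n_updates <- n_updates + 1
      if (objective(L, Fm) - old_obj < tol) active[k] <- FALSE
    }
  }
  list(L = L, Fm = Fm, objective = objective(L, Fm), n_updates = n_updates)
}

# Simulated data and a random starting point (illustrative only).
set.seed(1)
n <- 100; p <- 40; K <- 4
Y <- tcrossprod(matrix(rnorm(n * K), n, K), matrix(rnorm(p * K), p, K)) +
  matrix(rnorm(n * p), n, p)
L0 <- matrix(rnorm(n * K), n, K)
F0 <- matrix(rnorm(p * K), p, K)
initial_obj <- -sum((Y - tcrossprod(L0, F0))^2)

fit <- backfit_with_dropout(Y, L0, F0)
```

Comparing fit$n_updates with the number of updates a full sweep over all K factors would need for the same number of iterations gives a rough sense of the savings.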
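
The memory argument for the sampler can be checked directly. A full \(LF'\) sample stores \(np\) numbers, whereas a factored sample with \(K\) factors stores \(K(n + p)\), so the ratio is \(np / (K(n + p))\), which lies between \(\min(n, p) / (2K)\) and \(\min(n, p) / K\). The base R sketch below measures this for one simulated sample; the dimensions and object names are arbitrary assumptions, and nothing here calls the flashier or flashr samplers.

```r
set.seed(1)
n <- 2000; p <- 500; K <- 5

# One hypothetical posterior sample, stored in factored form.
L_sample <- matrix(rnorm(n * K), n, K)
F_sample <- matrix(rnorm(p * K), p, K)

# The same sample stored as the full n x p matrix LF'.
LF_sample <- tcrossprod(L_sample, F_sample)

bytes_full <- as.numeric(object.size(LF_sample))
bytes_factored <- as.numeric(object.size(L_sample)) +
  as.numeric(object.size(F_sample))

# Expected to be close to n * p / (K * (n + p)).
memory_ratio <- bytes_full / bytes_factored
```

If the full matrix is ever needed for a particular sample, it can be rebuilt on demand with tcrossprod(L_sample, F_sample) and discarded afterwards.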

sessionInfo()

R version 3.5.3 (2019-03-11)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] workflowr_1.2.0 Rcpp_1.0.1      digest_0.6.18   rprojroot_1.3-2
 [5] backports_1.1.3 git2r_0.25.2    magrittr_1.5    evaluate_0.13  
 [9] stringi_1.4.3   fs_1.2.7        whisker_0.3-2   rmarkdown_1.12 
[13] tools_3.5.3     stringr_1.4.0   glue_1.3.1      xfun_0.6       
[17] yaml_2.2.0      compiler_3.5.3  htmltools_0.3.6 knitr_1.22

Jason Willwerscheid

1/12/2019