Flashier features

Last updated: 2019-01-12

workflowr checks:

✔ R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
✔ Seed: set.seed(20180714)

The command set.seed(20180714) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: 429cae7
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
```
Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    docs/.DS_Store
    Ignored:    docs/figure/.DS_Store

Untracked files:
    Untracked:  analysis/gd_notes.Rmd

Unstaged changes:
    Modified:   analysis/brain.Rmd
    Modified:   analysis/index.Rmd
```
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

Past versions:

File	Version	Author	Date	Message
Rmd	429cae7	Jason Willwerscheid	2019-01-12	workflowr::wflow_publish("analysis/flashier_features.Rmd")

Handles sparse matrices (of class Matrix) and tensors (3-dimensional arrays).
Simplifies the user interface. Everything is done via a single function with a small number of parameters, which are also more intuitive than before. In particular, a new prior.type parameter replaces the less friendly ebnm.fn and ebnm.param.
In contrast, the “workhorse” function gives many more options. One that I especially like allows the user to write an arbitrary function whose output will be displayed during optimization (allowing the user to inspect the progress of optimization however they like).
Implements a full range of variance structures, including “kronecker” and “noisy.” In general, the estimated residual variance can be an arbitrary rank-one matrix or tensor.
For simple variance structures (including “constant” and “by row”/“by column”), no \(n \times p\) matrix is ever formed (so, for example, a matrix of residuals is never explicitly formed). This yields a large improvement in memory usage and runtime for very large data matrices. (Benchmarking results are here.)
Uses a home-grown initialization function rather than softImpute. The new function is much faster than softImpute for large matrices and deals with fixed elements in a very natural manner.
Includes new options for speeding up backfits. The “dropout” option drops individual factors once they are no longer improving the objective very much (so, instead of updating every factor each iteration, only factors that are still changing are updated). The “montaigne” option takes this a step further and goes after the factor that most recently produced the largest improvement. This produces much rougher fits, but can greatly reduce the number of backfit iterations.
Instead of sampling the full \(LF'\) matrix, the sampler now just samples \(L\) and \(F\) separately. With \(K\) factors, each sample then stores \(K(n + p)\) values rather than \(np\), which reduces memory usage by a factor on the order of \(\min(n, p) / K\). (With large data matrices, the flashr sampler is basically useless because every sample takes up as much memory as the data matrix itself.)
Includes a new nonmissing.thresh parameter to better deal with missing data. See here for an example.
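
The point above about never forming an \(n \times p\) matrix for simple variance structures can be illustrated with a small identity. The sketch below is not flashier's code: it uses point estimates only, relies on the Matrix package for a sparse test matrix, and all object names (Y, Lmat, Fmat) are made up for the illustration. It computes the sum of squared residuals from products that are at most \(n \times k\), \(p \times k\), or \(k \times k\), and then checks the result against the explicit dense calculation.

```r
# Illustrative only; not flashier's implementation.
library(Matrix)

set.seed(1)
n <- 500; p <- 300; k <- 5

# Sparse data matrix with roughly 1% nonzero entries.
Y <- rsparsematrix(n, p, density = 0.01)
Lmat <- matrix(rnorm(n * k), n, k)   # hypothetical loadings
Fmat <- matrix(rnorm(p * k), p, k)   # hypothetical factors

# ||Y - LF'||^2 = ||Y||^2 - 2 tr(L'YF) + tr((L'L)(F'F)).
YF <- as.matrix(Y %*% Fmat)          # n x k; no n x p dense matrix needed
sse_implicit <- sum(Y@x^2) - 2 * sum(Lmat * YF) +
  sum(crossprod(Lmat) * crossprod(Fmat))

# Check against the explicit n x p residual matrix (feasible here only
# because the example is small).
sse_explicit <- sum((as.matrix(Y) - tcrossprod(Lmat, Fmat))^2)
all.equal(sse_implicit, sse_explicit)
```

The implicit version touches only the nonzero entries of Y, which is where the memory and runtime savings would come from on very large sparse inputs.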
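
To make the “dropout” idea above concrete, here is a toy backfit in base R. It is only a sketch under strong assumptions and not flashier's algorithm: each factor is a plain rank-one least-squares term, the objective is the sum of squared residuals rather than flash's objective, and the names (active, tol, obj_trace) and the tolerance value are hypothetical.

```r
# Toy "dropout"-style backfit; NOT flashier's implementation.
set.seed(1)
n <- 60; p <- 40; k <- 3
Y <- tcrossprod(matrix(rnorm(n * k), n, k), matrix(rnorm(p * k), p, k)) +
  matrix(rnorm(n * p, sd = 0.5), n, p)

Lmat <- matrix(rnorm(n * k), n, k)   # random starting loadings
Fmat <- matrix(rnorm(p * k), p, k)   # random starting factors
sse <- function(Lm, Fm) sum((Y - tcrossprod(Lm, Fm))^2)

active <- seq_len(k)                 # factors that are still being updated
tol <- 1e-6                          # hypothetical relative tolerance
obj <- sse(Lmat, Fmat)
obj_trace <- obj

for (iter in 1:500) {
  if (length(active) == 0) break
  for (j in active) {
    # Partial residuals with factor j removed.
    R <- Y - tcrossprod(Lmat[, -j, drop = FALSE], Fmat[, -j, drop = FALSE])
    # Least-squares updates of loading j, then factor j.
    Lmat[, j] <- R %*% Fmat[, j] / sum(Fmat[, j]^2)
    Fmat[, j] <- crossprod(R, Lmat[, j]) / sum(Lmat[, j]^2)
    new_obj <- sse(Lmat, Fmat)
    # Drop the factor once an update barely improves the objective.
    if (obj - new_obj < tol * abs(obj)) active <- setdiff(active, j)
    obj <- new_obj
    obj_trace <- c(obj_trace, obj)
  }
}
```

Because a dropped factor is never revisited, later changes to the other factors cannot be absorbed by it. This is one way to see why such shortcuts trade fit quality for fewer updates.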
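
The memory argument for sampling \(L\) and \(F\) separately can be checked with base R alone. The sketch below does not call the flashier sampler; L_samp and F_samp are stand-ins for a single posterior sample, and the dimensions are arbitrary. With \(k\) factors, the stored values drop from \(np\) to \(k(n + p)\).

```r
# Illustrative only; these matrices stand in for one posterior sample.
set.seed(1)
n <- 2000; p <- 500; k <- 5

L_samp <- matrix(rnorm(n * k), n, k)
F_samp <- matrix(rnorm(p * k), p, k)

# Storage needed to keep L and F versus the full n x p product LF'.
bytes_separate <- as.numeric(object.size(L_samp)) +
  as.numeric(object.size(F_samp))
bytes_product <- as.numeric(object.size(tcrossprod(L_samp, F_samp)))

# Observed ratio, and the ratio predicted by counting stored values.
ratio_observed <- bytes_product / bytes_separate
ratio_expected <- n * p / (k * (n + p))
```

The two ratios differ only by the small fixed overhead that R attaches to each matrix object.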

Session information

```
sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] workflowr_1.0.1   Rcpp_1.0.0        digest_0.6.18
 [4] rprojroot_1.3-2   R.methodsS3_1.7.1 backports_1.1.2
 [7] magrittr_1.5      git2r_0.21.0      evaluate_0.12
[10] stringi_1.2.4     whisker_0.3-2     R.oo_1.21.0
[13] R.utils_2.6.0     rmarkdown_1.10    tools_3.4.3
[16] stringr_1.3.1     xfun_0.4          yaml_2.2.0
[19] compiler_3.4.3    htmltools_0.3.6   knitr_1.20.22
```

This reproducible R Markdown analysis was created with workflowr 1.0.1

Jason Willwerscheid
