Last updated: 2020-11-07

Checks: 2 passed, 0 failed

Knit directory: FLASHvestigations/

This reproducible R Markdown analysis was created with workflowr (version 1.2.0). The Report tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    code/.DS_Store
    Ignored:    code/flashier_bench/.DS_Store
    Ignored:    data/.DS_Store
    Ignored:    data/flashier_bench/
    Ignored:    data/metabo3_gwas_mats.RDS
    Ignored:    output/jean/

Untracked files:
    Untracked:  analysis/ashr_grid_refinement.Rmd
    Untracked:  analysis/batting_order.Rmd
    Untracked:  ashr_grid.R
    Untracked:  code/fasfunction.R
    Untracked:  code/nnmf.R
    Untracked:  code/wals.R
    Untracked:  data/BR_teams_2019.csv
    Untracked:  data/FG_teams_2019.csv
    Untracked:  data/batting_order.rds
    Untracked:  data/cole.rds
    Untracked:  data/odorizzi.rds
    Untracked:  data/oll_budget.txt
    Untracked:  data/oll_standings.tsv
    Untracked:  data/pitcher.rds
    Untracked:  data/pitcher2.rds
    Untracked:  data/pitcher_all.rds
    Untracked:  mlb2.R
    Untracked:  mlb_standings.txt
    Untracked:  ottoneu.R
    Untracked:  output/ashr_grid/
    Untracked:  phoible.R

Unstaged changes:
    Modified:   analysis/ebnm_npmle3.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
Rmd 6befda1 Jason Willwerscheid 2020-11-07 wflow_publish(“analysis/index.Rmd”)
html 2a53750 Jason Willwerscheid 2020-04-29 Build site.
Rmd 6b7a773 Jason Willwerscheid 2020-04-29 wflow_publish(“analysis/index.Rmd”)
html 91d648d Jason Willwerscheid 2019-11-20 Build site.
Rmd 4d9d7ed Jason Willwerscheid 2019-11-20 workflowr::wflow_publish(“analysis/index.Rmd”)
html d8f07e0 Jason Willwerscheid 2019-09-04 Build site.
Rmd 3670881 Jason Willwerscheid 2019-09-04 wflow_publish(“analysis/index.Rmd”)
html 8bb888f Jason Willwerscheid 2019-08-29 Build site.
Rmd e131938 Jason Willwerscheid 2019-08-29 wflow_publish(“analysis/index.Rmd”)
html b4333e6 Jason Willwerscheid 2019-07-23 Build site.
Rmd 21d2aa6 Jason Willwerscheid 2019-07-23 wflow_publish(“analysis/index.Rmd”)
html b80e139 Jason Willwerscheid 2019-07-15 Build site.
Rmd f41500c Jason Willwerscheid 2019-07-15 wflow_publish(“analysis/index.Rmd”)
html d607a5d Jason Willwerscheid 2019-07-15 Build site.
Rmd 0a61f3c Jason Willwerscheid 2019-07-15 wflow_publish(“analysis/index.Rmd”)
html 4f1396a Jason Willwerscheid 2019-07-15 Build site.
html 3c68fec Jason Willwerscheid 2019-07-15 Build site.
Rmd 9076e41 Jason Willwerscheid 2019-07-15 get organized
html 4efc666 Jason Willwerscheid 2019-07-12 Build site.
Rmd 926cc1c Jason Willwerscheid 2019-07-12 analysis/squarem_notes.Rmd
html f9a3167 Jason Willwerscheid 2019-07-08 Build site.
Rmd 239862c Jason Willwerscheid 2019-07-08 wflow_publish(c(“analysis/parallel_v2.Rmd”, “analysis/index.Rmd”))
html 6772e65 Jason Willwerscheid 2019-07-02 Build site.
Rmd 5d3db89 Jason Willwerscheid 2019-07-02 wflow_publish(“analysis/index.Rmd”)
html d2b435e Jason Willwerscheid 2019-05-19 Build site.
Rmd cdfcea9 Jason Willwerscheid 2019-05-19 wflow_publish(“analysis/index.Rmd”)
html 872fae5 Jason Willwerscheid 2019-02-26 Build site.
Rmd 314e7ea Jason Willwerscheid 2019-02-26 wflow_publish(“analysis/index.Rmd”)
html 6a701fa Jason Willwerscheid 2019-02-25 Build site.
Rmd 3c6502f Jason Willwerscheid 2019-02-25 wflow_publish(“analysis/index.Rmd”)
html 4693348 Jason Willwerscheid 2019-02-19 Build site.
Rmd 223898e Jason Willwerscheid 2019-02-19 wflow_publish(“analysis/index.Rmd”)
html 57be68c Jason Willwerscheid 2019-02-13 Build site.
Rmd 0b17dcf Jason Willwerscheid 2019-02-13 wflow_publish(“analysis/index.Rmd”)
html ba4e575 Jason Willwerscheid 2019-01-28 Build site.
Rmd d5bdcbc Jason Willwerscheid 2019-01-28 wflow_publish(“analysis/index.Rmd”)
html 8d151ff Jason Willwerscheid 2019-01-15 Build site.
Rmd b421808 Jason Willwerscheid 2019-01-15 workflowr::wflow_publish(“analysis/index.Rmd”)
html a276f8c Jason Willwerscheid 2018-12-06 Build site.
Rmd 54fd17d Jason Willwerscheid 2018-12-06 wflow_publish(“analysis/index.Rmd”)
html fa6eeb2 Jason Willwerscheid 2018-12-05 Build site.
Rmd 403051a Jason Willwerscheid 2018-12-05 wflow_publish(“analysis/index.Rmd”)
html 7aba87b Jason Willwerscheid 2018-12-04 Build site.
Rmd 6f64dea Jason Willwerscheid 2018-12-04 wflow_publish(“analysis/index.Rmd”)
html a0e8cba Jason Willwerscheid 2018-12-04 Build site.
Rmd e33776b Jason Willwerscheid 2018-12-04 workflowr::wflow_publish(“analysis/index.Rmd”)
html c013c93 Jason Willwerscheid 2018-11-06 Build site.
Rmd b93993d Jason Willwerscheid 2018-11-06 wflow_publish(c(“analysis/index.Rmd”))
html 5095faf Jason Willwerscheid 2018-11-06 Build site.
Rmd 8b95d79 Jason Willwerscheid 2018-11-06 wflow_publish(c(“analysis/index.Rmd”))
html 7ca650c Jason Willwerscheid 2018-11-06 Build site.
Rmd 83c6dd5 Jason Willwerscheid 2018-11-06 wflow_publish(c(“analysis/matrix_ops.Rmd”, “analysis/index.Rmd”))
html 195021c Jason Willwerscheid 2018-11-05 Build site.
Rmd 3d91e0d Jason Willwerscheid 2018-11-05 wflow_publish(“analysis/index.Rmd”)
html 5f7219f Jason Willwerscheid 2018-10-23 Build site.
Rmd a57f09e Jason Willwerscheid 2018-10-23 workflowr::wflow_publish(“analysis/index.Rmd”)
html 0201367 Jason Willwerscheid 2018-10-02 Build site.
Rmd 6e4d240 Jason Willwerscheid 2018-10-02 wflow_publish(“analysis/index.Rmd”)
html 1c6fd5c Jason Willwerscheid 2018-09-28 Build site.
Rmd 43dbd89 Jason Willwerscheid 2018-09-28 wflow_publish(“analysis/index.Rmd”)
html 16d164d Jason Willwerscheid 2018-09-25 Build site.
Rmd c416b56 Jason Willwerscheid 2018-09-25 wflow_publish(“analysis/index.Rmd”)
html 0d3e65d Jason Willwerscheid 2018-09-21 Build site.
Rmd 382f0f7 Jason Willwerscheid 2018-09-21 wflow_publish(“analysis/index.Rmd”)
html 23eb8cb Jason Willwerscheid 2018-09-20 Build site.
Rmd 73535b4 Jason Willwerscheid 2018-09-20 workflowr::wflow_publish(“analysis/index.Rmd”)
html 61c0e2e Jason Willwerscheid 2018-09-18 Build site.
Rmd fc79d31 Jason Willwerscheid 2018-09-18 wflow_publish(“analysis/index.Rmd”)
html dc193d9 Jason Willwerscheid 2018-09-12 Build site.
Rmd 5e96e40 Jason Willwerscheid 2018-09-12 wflow_publish(“analysis/index.Rmd”)
html efbae90 Jason Willwerscheid 2018-09-11 Build site.
Rmd 734b172 Jason Willwerscheid 2018-09-11 wflow_publish(“analysis/index.Rmd”)
html eb78de4 Jason Willwerscheid 2018-08-29 Build site.
Rmd 9302ecc Jason Willwerscheid 2018-08-29 wflow_publish(“analysis/index.Rmd”)
html 10a535b Jason Willwerscheid 2018-08-29 Build site.
Rmd 47f0b23 Jason Willwerscheid 2018-08-29 wflow_publish(“analysis/index.Rmd”)
html 07ffbc2 Jason Willwerscheid 2018-08-23 Build site.
Rmd c73018d Jason Willwerscheid 2018-08-23 wflow_publish(“analysis/index.Rmd”)
html 9e39a5f Jason Willwerscheid 2018-08-22 Build site.
Rmd ac09d4b Jason Willwerscheid 2018-08-22 wflow_publish(“analysis/index.Rmd”)
html b0fbbe0 Jason Willwerscheid 2018-08-22 Build site.
Rmd 4fe662c Jason Willwerscheid 2018-08-22 wflow_publish(“analysis/index.Rmd”)
html 02be9e9 Jason Willwerscheid 2018-08-17 Build site.
Rmd 6e0036f Jason Willwerscheid 2018-08-17 wflow_publish(“analysis/index.Rmd”)
html a6d55b6 Jason Willwerscheid 2018-08-16 Build site.
Rmd c972927 Jason Willwerscheid 2018-08-16 wflow_publish(c(“analysis/index.Rmd”, “analysis/parallel2.Rmd”))
html 10d46c0 Jason Willwerscheid 2018-08-15 Build site.
Rmd acd7f0c Jason Willwerscheid 2018-08-15 wflow_publish(c(“analysis/index.Rmd”))
html 86ef31a Jason Willwerscheid 2018-08-12 Build site.
Rmd 73aa339 Jason Willwerscheid 2018-08-12 wflow_publish(“analysis/index.Rmd”)
html 5a2e2a9 Jason Willwerscheid 2018-08-12 Build site.
Rmd 6a9bf1c Jason Willwerscheid 2018-08-12 wflow_publish(c(“analysis/index.Rmd”, “analysis/parallel.Rmd”))
html 9246c68 Jason Willwerscheid 2018-08-12 Build site.
Rmd eaba09b Jason Willwerscheid 2018-08-12 wflow_publish(c(“analysis/index.Rmd”, “analysis/parallel.Rmd”))
html cfa8e83 Jason Willwerscheid 2018-07-27 Build site.
Rmd f497f36 Jason Willwerscheid 2018-07-27 wflow_publish(c(“analysis/warmstart2.Rmd”,
html 974e1db Jason Willwerscheid 2018-07-27 Build site.
Rmd 2d81ef4 Jason Willwerscheid 2018-07-27 wflow_publish(c(“analysis/index.Rmd”, “analysis/init_fn2.Rmd”))
html 9fc59e7 Jason Willwerscheid 2018-07-26 Build site.
Rmd 21cbf89 Jason Willwerscheid 2018-07-26 wflow_publish(c(“analysis/index.Rmd”))
html 8f471cb Jason Willwerscheid 2018-07-26 Build site.
Rmd ff417b6 Jason Willwerscheid 2018-07-26 wflow_publish(c(“analysis/warmstart.Rmd”, “analysis/index.Rmd”))
html aae4dfc Jason Willwerscheid 2018-07-25 Build site.
Rmd 6417d4e Jason Willwerscheid 2018-07-25 wflow_publish(c(“analysis/index.Rmd”, “analysis/init_fn.Rmd”))
html 3983695 Jason Willwerscheid 2018-07-25 Build site.
Rmd 566c202 Jason Willwerscheid 2018-07-25 wflow_publish(“analysis/index.Rmd”)
html dd91b19 Jason Willwerscheid 2018-07-20 Build site.
Rmd 032edd6 Jason Willwerscheid 2018-07-20 wflow_publish(c(“analysis/index.Rmd”, “analysis/flash_em.Rmd”,
html f995dbb Jason Willwerscheid 2018-07-20 manual commits to remove licence
html c9f10b0 Jason Willwerscheid 2018-07-20 Build site.
Rmd 4fc94bd Jason Willwerscheid 2018-07-20 wflow_publish(c(“analysis/index.Rmd”, “analysis/flash_em.Rmd”))
html 1de2af0 Jason Willwerscheid 2018-07-20 Build site.
Rmd 9ac7bb0 Jason Willwerscheid 2018-07-20 wflow_publish(“analysis/index.Rmd”)
html f98f61f Jason Willwerscheid 2018-07-20 Build site.
Rmd 8e383cd Jason Willwerscheid 2018-07-20 wflow_publish(c(“analysis/index.Rmd”))
html 58b3ea4 Jason Willwerscheid 2018-07-19 Build site.
Rmd eb264b1 Jason Willwerscheid 2018-07-19 wflow_publish(“analysis/index.Rmd”)
html 36a6312 Jason Willwerscheid 2018-07-19 Build site.
Rmd a9f8cc6 Jason Willwerscheid 2018-07-19 wflow_publish(“analysis/index.Rmd”)
html c420802 Jason Willwerscheid 2018-07-17 Build site.
Rmd 38942eb Jason Willwerscheid 2018-07-17 wflow_publish(“analysis/index.Rmd”)
html e2e1286 Jason Willwerscheid 2018-07-16 Build site.
Rmd 55f42ca Jason Willwerscheid 2018-07-16 wflow_publish(“analysis/index.Rmd”)
html de8f823 Jason Willwerscheid 2018-07-16 Build site.
Rmd f5379fa Jason Willwerscheid 2018-07-16 wflow_publish(“analysis/index.Rmd”)
html 7db12a1 Jason Willwerscheid 2018-07-16 Build site.
html 873da40 Jason Willwerscheid 2018-07-16 Build site.
html 58e136a Jason Willwerscheid 2018-07-15 Build site.
Rmd 82c7cdd Jason Willwerscheid 2018-07-15 wflow_publish(c(“analysis/index.Rmd”, “analysis/about.Rmd”))
html 76eacb7 Jason Willwerscheid 2018-07-15 Build site.
Rmd f6928cc Jason Willwerscheid 2018-07-15 wflow_publish(c(“analysis/index.Rmd”, “analysis/about.Rmd”))
Rmd 56d7ded Jason Willwerscheid 2018-07-14 Start workflowr project.

Flashiest

New features included in the flashier implementation of EBMF.

Investigation 21. Benchmarking flashier.

Investigation 27. Using low-rank approximations to the data as input to flashier.

Investigation 28. Performing backfitting updates in parallel.

Investigation 29. Accelerating backfits via extrapolation.

Under Construction

Investigation 18 and accompanying notes explore stochastic approaches to fitting FLASH objects to very large datasets.

  • Notes on fitting FLASH models when \(n\) is manageable and \(p\) is very large.

  • Investigation 18. I implement the idea described in the previous notes and test it on data from the GTEx project.

Investigation 22 was meant as an illustration of how to apply flashier to tensor data, but ended up highlighting some interesting issues that arise with missing data.

Investigation 30. A comparison of different variance structures and initialization methods using a GWAS dataset collated by Jean Morrison.

Investigation 31. A simulation study examining the effect of non-Gaussian noise on the number of factors selected by flash.

Investigation 32. Approximating the NPMLE using ebnm.

Reference Material

Investigation 4 and accompanying notes describe a way to compute the FLASH objective directly (rather than using the indirect method implemented in flashr).

  • Notes on computing the FLASH objective function. I derive an explicit expression for the KL divergence between prior and posterior.

  • Notes towards an alternate algorithm for optimizing the FLASH objective, using the explicit expression derived in the previous notes.

  • Investigation 4. The alternate algorithm agrees with flashr with respect to both the objective and fit obtained.
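As a minimal illustration of the kind of closed-form KL term the notes above are after, consider the simplest conjugate case (an assumption made here, not necessarily a prior family treated in the notes): a normal prior \(g = N(0, s^2)\) and a normal posterior \(q = N(m, v)\). Then \(\mathrm{KL}(q \,\|\, g) = \tfrac{1}{2}\left[\log(s^2/v) + (v + m^2)/s^2 - 1\right]\). The hypothetical helpers below evaluate this expression and check it against a Monte Carlo estimate of \(E_q[\log q(x) - \log g(x)]\).

```python
import math
import random

def kl_normal_posterior_prior(m, v, s2):
    """Closed-form KL(q || g) for posterior q = N(m, v) and prior g = N(0, s2)."""
    return 0.5 * (math.log(s2 / v) + (v + m * m) / s2 - 1.0)

def kl_monte_carlo(m, v, s2, n=100_000, seed=1):
    """Monte Carlo estimate of E_q[log q(x) - log g(x)], used as a sanity check."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(m, math.sqrt(v))
        log_q = -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        log_g = -0.5 * math.log(2 * math.pi * s2) - x * x / (2 * s2)
        total += log_q - log_g
    return total / n

exact = kl_normal_posterior_prior(m=0.8, v=0.3, s2=1.5)
approx = kl_monte_carlo(m=0.8, v=0.3, s2=1.5)
print(round(exact, 4), round(approx, 4))
```

In a variational objective of this type, the KL term is subtracted from the expected log likelihood, so having it in closed form is what allows the objective to be evaluated directly.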

Abandoned, For Now

I’m no longer pursuing acceleration via SQUAREM/DAAREM. My reasons are detailed in these notes.

  • Investigation 10. SQUAREM does poorly on FLASH backfits. DAAREM (a more recent algorithm by one of the authors of SQUAREM) does better, but offers smaller performance gains than parallelization.
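For readers unfamiliar with SQUAREM, the sketch below shows one bare-bones variant of its extrapolation cycle applied to a generic fixed-point map \(F\): take two plain updates, form the first and second differences \(r = F(x) - x\) and \(v = F(F(x)) - 2F(x) + x\), extrapolate to \(x + 2\alpha r + \alpha^2 v\) with steplength \(\alpha = \lVert r \rVert / \lVert v \rVert\) (clamped below at 1, which recovers \(F(F(x))\)), and finish with one stabilizing update. The map here is a hypothetical, slowly contracting linear map standing in for a backfitting sweep; it is not flashier's update, and the step bounds and monotonicity safeguards that practical implementations use are omitted.

```python
import math

def fixed_point_map(x):
    """Hypothetical slowly contracting linear map with fixed point (1, 2)."""
    return [0.95 * x[0] + 0.05 * 1.0, 0.90 * x[1] + 0.10 * 2.0]

def norm(u):
    return math.sqrt(sum(ui * ui for ui in u))

def plain_iteration(F, x, tol=1e-8, max_evals=10_000):
    """Repeatedly apply F until successive iterates differ by less than tol."""
    evals = 0
    while evals < max_evals:
        x_new = F(x)
        evals += 1
        if norm([a - b for a, b in zip(x_new, x)]) < tol:
            return x_new, evals
        x = x_new
    return x, evals

def squarem(F, x, tol=1e-8, max_evals=10_000):
    """Squared extrapolation: two F evaluations, one extrapolation, one stabilizing F."""
    evals = 0
    while evals < max_evals:
        x1 = F(x)
        x2 = F(x1)
        evals += 2
        r = [a - b for a, b in zip(x1, x)]                  # first difference
        if norm(r) < tol:
            return x1, evals
        v = [c - 2 * a + b for c, a, b in zip(x2, x1, x)]   # second difference
        if norm(v) == 0.0:
            return x2, evals
        alpha = max(1.0, norm(r) / norm(v))                 # alpha = 1 gives x2
        x_extrap = [b + 2 * alpha * ri + alpha ** 2 * vi
                    for b, ri, vi in zip(x, r, v)]
        x = F(x_extrap)                                     # stabilization step
        evals += 1
    return x, evals

x_plain, n_plain = plain_iteration(fixed_point_map, [0.0, 0.0])
x_sq, n_sq = squarem(fixed_point_map, [0.0, 0.0])
print(n_plain, n_sq)
```

On a toy linear map the gain is dramatic; on real backfits, where the update is nonlinear and each evaluation is expensive, the gain can be much smaller, which is consistent with the conclusion of Investigation 10.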

flashier added the ability to specify the order in which factors are updated. The order makes very little difference when factor loadings are nearly orthogonal (as they usually are).

  • Investigation 11. The order in which factor/loading pairs are updated (during backfitting) makes some difference, but not much.

We became particularly interested in fitting models with a known error covariance matrix when we were working with GTEx data. The approach is much less promising for the applications I’m currently interested in.

Investigations 14 and 16-17 were some very early experiments with count data. While the first made the usual FLASH assumption that errors are Gaussian, the second and third explicitly attempted to model the entries as count or binary data. So far, the latter approaches haven’t worked terribly well, primarily due to the difficulty of choosing a good point to expand the log likelihood around.

  • Investigation 14. An example of how to use nonnegative ASH priors to obtain a nonnegative matrix factorization.

  • Investigation 16. Instead of directly fitting FLASH, I fit count data via a Gaussian approximation to the Poisson log likelihood…

  • Investigation 17. … then I fit binary data via an approximation to the binomial log likelihood.
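The difficulty of choosing an expansion point can be made concrete. A second-order Taylor expansion of a log likelihood in the natural parameter \(\eta\) around a point \(\eta_0\) gives a Gaussian pseudo-likelihood \(\ell(\eta) \approx \text{const} - \tfrac{1}{2} w (\eta - z)^2\), with precision \(w = -\ell''(\eta_0)\) and pseudo-datum \(z = \eta_0 + \ell'(\eta_0)/w\). The sketch below computes \(z\) and \(w\) for the Poisson case (log link) and the Bernoulli case (logit link). It illustrates the general idea only; the links, expansion points, and other details used in Investigations 16 and 17 may differ, and all function names are hypothetical.

```python
import math

def poisson_pseudo_data(y, eta0):
    """Gaussian approximation to l(eta) = y * eta - exp(eta), expanded at eta0."""
    mu0 = math.exp(eta0)
    w = mu0                      # -l''(eta0)
    z = eta0 + (y - mu0) / w     # one Newton step from eta0
    return z, w

def bernoulli_pseudo_data(y, eta0):
    """Same idea for l(eta) = y * eta - log(1 + exp(eta))."""
    p0 = 1.0 / (1.0 + math.exp(-eta0))
    w = p0 * (1.0 - p0)
    z = eta0 + (y - p0) / w
    return z, w

def poisson_loglik(y, eta):
    return y * eta - math.exp(eta)

def quad_approx(loglik, pseudo, y, eta0, eta):
    """Quadratic approximation to loglik(y, .) at eta0, evaluated at eta."""
    z, w = pseudo(y, eta0)
    const = loglik(y, eta0) + 0.5 * w * (eta0 - z) ** 2   # matches loglik at eta0
    return const - 0.5 * w * (eta - z) ** 2

y, eta0 = 3, math.log(2.5)
for h in (0.01, 1.5):
    exact = poisson_loglik(y, eta0 + h)
    approx = quad_approx(poisson_loglik, poisson_pseudo_data, y, eta0, eta0 + h)
    print(h, exact - approx)
```

The approximation is excellent near \(\eta_0\) but degrades quickly away from it (for the Poisson case the error grows like \(e^{\eta_0}(e^h - 1 - h - h^2/2)\)), so a poor choice of \(\eta_0\) distorts exactly the entries that matter.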

Historical Curiosities

An early set of notes identified key ways to reduce the memory footprint of flashr. The good ideas were implemented in flashier. Not all of the ideas were good.

The bug causing the problem described in Investigations 1-3 was fixed in version 0.1-13 of package ebnm.

  • Investigation 1. The FLASH objective function can behave very erratically.

  • Investigation 2. This problem only occurs when using ebnm_pn, not ebnm_ash.

  • Investigation 3. The objective can continue to get worse as loadings are repeatedly updated. Nonetheless, convergence takes place (from above!).

Since flashier uses a home-grown initialization function, Investigations 5a-b and 13 are no longer relevant.

  • Investigation 5a. An argument for changing the default init_fn to udv_si_svd when there is missing data and udv_svd otherwise. Based on an analysis of GTEx data.

  • Investigation 5b. More evidence supporting the recommendations in Investigation 5a.

  • Investigation 13. A counterargument. Results in Investigations 5a-b probably depend on the fact that \(n\) is small (\(n = 44\)). For large \(n\), setting init_fn to udv_si is best.

Investigations 6 and 7 dealt with warmstarts. They are largely obsolete: ashr now uses mixsqp rather than Rmosek, and ebnm now calls nlm rather than optim.

  • Investigation 6. Poor optim results can produce large decreases in the objective function. We should use warmstarts when ebnm_fn = ebnm_pn.

  • Investigation 7. The advantages of warmstarts are not nearly as compelling when ebnm_fn = ebnm_ash.

Investigations 8 and 9 were concerned with parallel backfitting updates, which are better covered by Investigation 28.

The changes tested here were implemented in version 0.6-2 of flashr.

The changes tested here were implemented in version 2.2-29 of ashr.

  • Investigation 23. I benchmark the rewritten my_etruncnorm and my_vtruncnorm functions in package ashr against their counterparts in package truncnorm.
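For reference, functions like my_etruncnorm and my_vtruncnorm compute the mean and variance of a normal distribution truncated to an interval \([a, b]\). With \(\alpha = (a - \mu)/\sigma\), \(\beta = (b - \mu)/\sigma\), and \(Z = \Phi(\beta) - \Phi(\alpha)\), the textbook formulas are \(E = \mu + \sigma(\phi(\alpha) - \phi(\beta))/Z\) and \(\mathrm{Var} = \sigma^2\left[1 + (\alpha\phi(\alpha) - \beta\phi(\beta))/Z - ((\phi(\alpha) - \phi(\beta))/Z)^2\right]\). The hypothetical helper below implements them naively with the Python standard library. It is not the ashr code, and it makes no attempt at the numerical care needed far in the tails, where \(Z\) loses all precision.

```python
import math

def _pdf(x):
    """Standard normal density, taking the limit 0 at +/- infinity."""
    return 0.0 if math.isinf(x) else math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def _cdf(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def _x_pdf(x):
    """x * pdf(x), taking the limit 0 at +/- infinity."""
    return 0.0 if math.isinf(x) else x * _pdf(x)

def truncnorm_moments(a, b, mean=0.0, sd=1.0):
    """Mean and variance of N(mean, sd^2) truncated to [a, b] (naive formulas)."""
    alpha = (a - mean) / sd
    beta = (b - mean) / sd
    z = _cdf(beta) - _cdf(alpha)          # probability mass of [a, b]
    if z <= 0.0:
        raise ValueError("interval has (numerically) zero probability")
    ratio = (_pdf(alpha) - _pdf(beta)) / z
    m = mean + sd * ratio
    v = sd ** 2 * (1.0 + (_x_pdf(alpha) - _x_pdf(beta)) / z - ratio ** 2)
    return m, v

print(truncnorm_moments(0.0, math.inf))   # half-normal: sqrt(2/pi), 1 - 2/pi
```

Calling truncnorm_moments(40.0, math.inf) raises a ValueError even though the true truncated mean is finite (slightly above 40), because both CDF values round to 1. This is the kind of case where a careful implementation has to work on the log scale or with Mills ratios.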

Investigations 19a-b, 20, and 24-26 were early experiments on large single-cell RNA datasets. I’ve since created a repository dedicated to studying applications of FLASH to scRNA data.

  • Investigation 19a. An analysis of the smaller “droplet” dataset from Montoro et al.

  • Investigation 19b. I redo my analysis of the “droplet” dataset, but this time I follow the authors’ preprocessing steps. Results are, I think, of much lower quality.

  • Investigation 20. An analysis of the larger “pulse-seq” dataset from Montoro et al.

  • Investigation 24. I propose a new approach to factorizing count data that uses adaptive shrinkage to estimate the rate matrix.

  • Investigation 25. I compare three different data transformations, three approaches to handling the heteroskedasticity of the log1p transformation, and two approaches to dealing with row- and column-specific scaling.

  • Investigation 26. I compare FLASH fits of the “droplet” dataset in Montoro et al. using three different data transformations.
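The heteroskedasticity of the log1p transformation mentioned in Investigation 25 is easy to see under an assumption made here purely for illustration: that counts are Poisson. By the delta method, if \(Y \sim \text{Poisson}(\lambda)\) then \(\mathrm{Var}[\log(1 + Y)] \approx \lambda / (1 + \lambda)^2\), so the noise variance after transformation depends on the mean. The hypothetical helpers below compare that approximation with the exact variance, obtained by summing over the Poisson pmf.

```python
import math

def delta_var_log1p(lam):
    """Delta-method approximation to Var[log(1 + Y)] for Y ~ Poisson(lam)."""
    return lam / (1.0 + lam) ** 2

def exact_var_log1p(lam):
    """Var[log(1 + Y)] for Y ~ Poisson(lam), summing the pmf over a wide range.
    Intended for moderate lam; exp(-lam) underflows for lam beyond about 700."""
    kmax = int(lam + 20 * math.sqrt(lam) + 50)
    pmf = math.exp(-lam)          # P(Y = 0)
    m1 = m2 = 0.0
    for k in range(kmax + 1):
        t = math.log1p(k)
        m1 += pmf * t
        m2 += pmf * t * t
        pmf *= lam / (k + 1)      # P(Y = k + 1) from P(Y = k)
    return m2 - m1 * m1

for lam in (0.1, 1.0, 10.0, 100.0):
    print(lam, round(exact_var_log1p(lam), 4), round(delta_var_log1p(lam), 4))
```

For large \(\lambda\) the variance shrinks roughly like \(1/\lambda\), so entries with large counts are far less noisy on the log1p scale than entries with small counts, which a homoskedastic Gaussian noise model does not capture.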