Last updated: 2019-07-15
Knit directory: FLASHvestigations/
This reproducible R Markdown analysis was created with workflowr (version 1.2.0).
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: code/.DS_Store
Ignored: code/flashier_bench/.DS_Store
Ignored: data/flashier_bench/
Untracked files:
Untracked: code/ebspca.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 9076e41 | Jason Willwerscheid | 2019-07-15 | get organized |
html | 4efc666 | Jason Willwerscheid | 2019-07-12 | Build site. |
Rmd | 926cc1c | Jason Willwerscheid | 2019-07-12 | analysis/squarem_notes.Rmd |
html | f9a3167 | Jason Willwerscheid | 2019-07-08 | Build site. |
Rmd | 239862c | Jason Willwerscheid | 2019-07-08 | wflow_publish(c(“analysis/parallel_v2.Rmd”, “analysis/index.Rmd”)) |
html | 6772e65 | Jason Willwerscheid | 2019-07-02 | Build site. |
Rmd | 5d3db89 | Jason Willwerscheid | 2019-07-02 | wflow_publish(“analysis/index.Rmd”) |
html | d2b435e | Jason Willwerscheid | 2019-05-19 | Build site. |
Rmd | cdfcea9 | Jason Willwerscheid | 2019-05-19 | wflow_publish(“analysis/index.Rmd”) |
html | 872fae5 | Jason Willwerscheid | 2019-02-26 | Build site. |
Rmd | 314e7ea | Jason Willwerscheid | 2019-02-26 | wflow_publish(“analysis/index.Rmd”) |
html | 6a701fa | Jason Willwerscheid | 2019-02-25 | Build site. |
Rmd | 3c6502f | Jason Willwerscheid | 2019-02-25 | wflow_publish(“analysis/index.Rmd”) |
html | 4693348 | Jason Willwerscheid | 2019-02-19 | Build site. |
Rmd | 223898e | Jason Willwerscheid | 2019-02-19 | wflow_publish(“analysis/index.Rmd”) |
html | 57be68c | Jason Willwerscheid | 2019-02-13 | Build site. |
Rmd | 0b17dcf | Jason Willwerscheid | 2019-02-13 | wflow_publish(“analysis/index.Rmd”) |
html | ba4e575 | Jason Willwerscheid | 2019-01-28 | Build site. |
Rmd | d5bdcbc | Jason Willwerscheid | 2019-01-28 | wflow_publish(“analysis/index.Rmd”) |
html | 8d151ff | Jason Willwerscheid | 2019-01-15 | Build site. |
Rmd | b421808 | Jason Willwerscheid | 2019-01-15 | workflowr::wflow_publish(“analysis/index.Rmd”) |
html | a276f8c | Jason Willwerscheid | 2018-12-06 | Build site. |
Rmd | 54fd17d | Jason Willwerscheid | 2018-12-06 | wflow_publish(“analysis/index.Rmd”) |
html | fa6eeb2 | Jason Willwerscheid | 2018-12-05 | Build site. |
Rmd | 403051a | Jason Willwerscheid | 2018-12-05 | wflow_publish(“analysis/index.Rmd”) |
html | 7aba87b | Jason Willwerscheid | 2018-12-04 | Build site. |
Rmd | 6f64dea | Jason Willwerscheid | 2018-12-04 | wflow_publish(“analysis/index.Rmd”) |
html | a0e8cba | Jason Willwerscheid | 2018-12-04 | Build site. |
Rmd | e33776b | Jason Willwerscheid | 2018-12-04 | workflowr::wflow_publish(“analysis/index.Rmd”) |
html | c013c93 | Jason Willwerscheid | 2018-11-06 | Build site. |
Rmd | b93993d | Jason Willwerscheid | 2018-11-06 | wflow_publish(c(“analysis/index.Rmd”)) |
html | 5095faf | Jason Willwerscheid | 2018-11-06 | Build site. |
Rmd | 8b95d79 | Jason Willwerscheid | 2018-11-06 | wflow_publish(c(“analysis/index.Rmd”)) |
html | 7ca650c | Jason Willwerscheid | 2018-11-06 | Build site. |
Rmd | 83c6dd5 | Jason Willwerscheid | 2018-11-06 | wflow_publish(c(“analysis/matrix_ops.Rmd”, “analysis/index.Rmd”)) |
html | 195021c | Jason Willwerscheid | 2018-11-05 | Build site. |
Rmd | 3d91e0d | Jason Willwerscheid | 2018-11-05 | wflow_publish(“analysis/index.Rmd”) |
html | 5f7219f | Jason Willwerscheid | 2018-10-23 | Build site. |
Rmd | a57f09e | Jason Willwerscheid | 2018-10-23 | workflowr::wflow_publish(“analysis/index.Rmd”) |
html | 0201367 | Jason Willwerscheid | 2018-10-02 | Build site. |
Rmd | 6e4d240 | Jason Willwerscheid | 2018-10-02 | wflow_publish(“analysis/index.Rmd”) |
html | 1c6fd5c | Jason Willwerscheid | 2018-09-28 | Build site. |
Rmd | 43dbd89 | Jason Willwerscheid | 2018-09-28 | wflow_publish(“analysis/index.Rmd”) |
html | 16d164d | Jason Willwerscheid | 2018-09-25 | Build site. |
Rmd | c416b56 | Jason Willwerscheid | 2018-09-25 | wflow_publish(“analysis/index.Rmd”) |
html | 0d3e65d | Jason Willwerscheid | 2018-09-21 | Build site. |
Rmd | 382f0f7 | Jason Willwerscheid | 2018-09-21 | wflow_publish(“analysis/index.Rmd”) |
html | 23eb8cb | Jason Willwerscheid | 2018-09-20 | Build site. |
Rmd | 73535b4 | Jason Willwerscheid | 2018-09-20 | workflowr::wflow_publish(“analysis/index.Rmd”) |
html | 61c0e2e | Jason Willwerscheid | 2018-09-18 | Build site. |
Rmd | fc79d31 | Jason Willwerscheid | 2018-09-18 | wflow_publish(“analysis/index.Rmd”) |
html | dc193d9 | Jason Willwerscheid | 2018-09-12 | Build site. |
Rmd | 5e96e40 | Jason Willwerscheid | 2018-09-12 | wflow_publish(“analysis/index.Rmd”) |
html | efbae90 | Jason Willwerscheid | 2018-09-11 | Build site. |
Rmd | 734b172 | Jason Willwerscheid | 2018-09-11 | wflow_publish(“analysis/index.Rmd”) |
html | eb78de4 | Jason Willwerscheid | 2018-08-29 | Build site. |
Rmd | 9302ecc | Jason Willwerscheid | 2018-08-29 | wflow_publish(“analysis/index.Rmd”) |
html | 10a535b | Jason Willwerscheid | 2018-08-29 | Build site. |
Rmd | 47f0b23 | Jason Willwerscheid | 2018-08-29 | wflow_publish(“analysis/index.Rmd”) |
html | 07ffbc2 | Jason Willwerscheid | 2018-08-23 | Build site. |
Rmd | c73018d | Jason Willwerscheid | 2018-08-23 | wflow_publish(“analysis/index.Rmd”) |
html | 9e39a5f | Jason Willwerscheid | 2018-08-22 | Build site. |
Rmd | ac09d4b | Jason Willwerscheid | 2018-08-22 | wflow_publish(“analysis/index.Rmd”) |
html | b0fbbe0 | Jason Willwerscheid | 2018-08-22 | Build site. |
Rmd | 4fe662c | Jason Willwerscheid | 2018-08-22 | wflow_publish(“analysis/index.Rmd”) |
html | 02be9e9 | Jason Willwerscheid | 2018-08-17 | Build site. |
Rmd | 6e0036f | Jason Willwerscheid | 2018-08-17 | wflow_publish(“analysis/index.Rmd”) |
html | a6d55b6 | Jason Willwerscheid | 2018-08-16 | Build site. |
Rmd | c972927 | Jason Willwerscheid | 2018-08-16 | wflow_publish(c(“analysis/index.Rmd”, “analysis/parallel2.Rmd”)) |
html | 10d46c0 | Jason Willwerscheid | 2018-08-15 | Build site. |
Rmd | acd7f0c | Jason Willwerscheid | 2018-08-15 | wflow_publish(c(“analysis/index.Rmd”)) |
html | 86ef31a | Jason Willwerscheid | 2018-08-12 | Build site. |
Rmd | 73aa339 | Jason Willwerscheid | 2018-08-12 | wflow_publish(“analysis/index.Rmd”) |
html | 5a2e2a9 | Jason Willwerscheid | 2018-08-12 | Build site. |
Rmd | 6a9bf1c | Jason Willwerscheid | 2018-08-12 | wflow_publish(c(“analysis/index.Rmd”, “analysis/parallel.Rmd”)) |
html | 9246c68 | Jason Willwerscheid | 2018-08-12 | Build site. |
Rmd | eaba09b | Jason Willwerscheid | 2018-08-12 | wflow_publish(c(“analysis/index.Rmd”, “analysis/parallel.Rmd”)) |
html | cfa8e83 | Jason Willwerscheid | 2018-07-27 | Build site. |
Rmd | f497f36 | Jason Willwerscheid | 2018-07-27 | wflow_publish(c(“analysis/warmstart2.Rmd”, |
html | 974e1db | Jason Willwerscheid | 2018-07-27 | Build site. |
Rmd | 2d81ef4 | Jason Willwerscheid | 2018-07-27 | wflow_publish(c(“analysis/index.Rmd”, “analysis/init_fn2.Rmd”)) |
html | 9fc59e7 | Jason Willwerscheid | 2018-07-26 | Build site. |
Rmd | 21cbf89 | Jason Willwerscheid | 2018-07-26 | wflow_publish(c(“analysis/index.Rmd”)) |
html | 8f471cb | Jason Willwerscheid | 2018-07-26 | Build site. |
Rmd | ff417b6 | Jason Willwerscheid | 2018-07-26 | wflow_publish(c(“analysis/warmstart.Rmd”, “analysis/index.Rmd”)) |
html | aae4dfc | Jason Willwerscheid | 2018-07-25 | Build site. |
Rmd | 6417d4e | Jason Willwerscheid | 2018-07-25 | wflow_publish(c(“analysis/index.Rmd”, “analysis/init_fn.Rmd”)) |
html | 3983695 | Jason Willwerscheid | 2018-07-25 | Build site. |
Rmd | 566c202 | Jason Willwerscheid | 2018-07-25 | wflow_publish(“analysis/index.Rmd”) |
html | dd91b19 | Jason Willwerscheid | 2018-07-20 | Build site. |
Rmd | 032edd6 | Jason Willwerscheid | 2018-07-20 | wflow_publish(c(“analysis/index.Rmd”, “analysis/flash_em.Rmd”, |
html | f995dbb | Jason Willwerscheid | 2018-07-20 | manual commits to remove licence |
html | c9f10b0 | Jason Willwerscheid | 2018-07-20 | Build site. |
Rmd | 4fc94bd | Jason Willwerscheid | 2018-07-20 | wflow_publish(c(“analysis/index.Rmd”, “analysis/flash_em.Rmd”)) |
html | 1de2af0 | Jason Willwerscheid | 2018-07-20 | Build site. |
Rmd | 9ac7bb0 | Jason Willwerscheid | 2018-07-20 | wflow_publish(“analysis/index.Rmd”) |
html | f98f61f | Jason Willwerscheid | 2018-07-20 | Build site. |
Rmd | 8e383cd | Jason Willwerscheid | 2018-07-20 | wflow_publish(c(“analysis/index.Rmd”)) |
html | 58b3ea4 | Jason Willwerscheid | 2018-07-19 | Build site. |
Rmd | eb264b1 | Jason Willwerscheid | 2018-07-19 | wflow_publish(“analysis/index.Rmd”) |
html | 36a6312 | Jason Willwerscheid | 2018-07-19 | Build site. |
Rmd | a9f8cc6 | Jason Willwerscheid | 2018-07-19 | wflow_publish(“analysis/index.Rmd”) |
html | c420802 | Jason Willwerscheid | 2018-07-17 | Build site. |
Rmd | 38942eb | Jason Willwerscheid | 2018-07-17 | wflow_publish(“analysis/index.Rmd”) |
html | e2e1286 | Jason Willwerscheid | 2018-07-16 | Build site. |
Rmd | 55f42ca | Jason Willwerscheid | 2018-07-16 | wflow_publish(“analysis/index.Rmd”) |
html | de8f823 | Jason Willwerscheid | 2018-07-16 | Build site. |
Rmd | f5379fa | Jason Willwerscheid | 2018-07-16 | wflow_publish(“analysis/index.Rmd”) |
html | 7db12a1 | Jason Willwerscheid | 2018-07-16 | Build site. |
html | 873da40 | Jason Willwerscheid | 2018-07-16 | Build site. |
html | 58e136a | Jason Willwerscheid | 2018-07-15 | Build site. |
Rmd | 82c7cdd | Jason Willwerscheid | 2018-07-15 | wflow_publish(c(“analysis/index.Rmd”, “analysis/about.Rmd”)) |
html | 76eacb7 | Jason Willwerscheid | 2018-07-15 | Build site. |
Rmd | f6928cc | Jason Willwerscheid | 2018-07-15 | wflow_publish(c(“analysis/index.Rmd”, “analysis/about.Rmd”)) |
Rmd | 56d7ded | Jason Willwerscheid | 2018-07-14 | Start workflowr project. |
New features included in the flashier implementation of EBMF.
Investigation 21. Benchmarking flashier.
Investigation 27. Using low-rank approximations to the data as input to flashier.
Investigation 28. Performing backfitting updates in parallel.
Note 3 and Investigation 18 explore stochastic approaches to fitting FLASH objects to very large datasets.
Note 3. An idea for how to fit FLASH models when \(n\) is manageable and \(p\) is very large.
Investigation 18. I implement the idea described in Note 3 and I test it out on data from the GTEx project.
Investigation 22 was meant as an illustration of how to apply flashier to tensor data, but ended up highlighting some interesting issues that arise with missing data.
flashier analysis of the GTEx brain subtensor.
Investigation 4 and accompanying notes describe a way to compute the FLASH objective directly (rather than using the indirect method implemented in flashr).
Notes on computing the FLASH objective function. I derive an explicit expression for the KL divergence between prior and posterior.
Notes for an alternate algorithm for optimizing the FLASH objective, using the explicit expression derived in the previous notes.
Investigation 4. The alternate algorithm agrees with FLASH with respect to both the objective and fit obtained.
I’m no longer pursuing acceleration via SQUAREM/DAAREM. My reasons are detailed in these notes.
flashier implemented the ability to specify the order in which factors are updated. The order makes very little difference when factor loadings are nearly orthogonal (as they usually are).
We became particularly interested in fitting models with a known error covariance matrix when we were working with GTEx data. The approach is much less promising for the applications I’m currently interested in.
Investigations 14 and 16-17 were some very early experiments with count data. While the first made the usual FLASH assumption that errors are Gaussian, the second and third explicitly attempted to model the entries as count or binary data. So far, the latter approaches haven’t worked terribly well, primarily due to the difficulty of choosing a good point to expand the log likelihood around.
Investigation 14. An example of how to use nonnegative ASH priors to obtain a nonnegative matrix factorization.
Investigation 16. Instead of directly fitting FLASH, I fit count data via a Gaussian approximation to the Poisson log likelihood…
Investigation 17. … then I fit binary data via an approximation to the binomial log likelihood.
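To make the difficulty with the expansion point concrete, here is a minimal Python sketch of one standard way to build a Gaussian approximation to a Poisson log likelihood: a second-order Taylor expansion in the log rate. This is an assumption chosen for illustration and is not necessarily the approximation used in Investigation 16; the function name poisson_pseudo_data is hypothetical.

```python
import math

def poisson_pseudo_data(y, eta0):
    """Second-order Taylor expansion of the Poisson log likelihood
    l(eta) = y * eta - exp(eta) (up to a constant) around eta0.

    The resulting quadratic matches, up to a constant, the log likelihood
    of a Gaussian pseudo-observation z with variance s2.
    Hypothetical helper, for illustration only.
    """
    rate = math.exp(eta0)   # l''(eta0) = -rate
    grad = y - rate         # l'(eta0)
    s2 = 1.0 / rate         # Gaussian variance = -1 / l''(eta0)
    z = eta0 + grad * s2    # maximizer of the quadratic approximation
    return z, s2

# The same count yields quite different pseudo-data at different expansion points.
for eta0 in (math.log(5.0), 0.0, 3.0):
    z, s2 = poisson_pseudo_data(5, eta0)
    print(f"eta0 = {eta0:.2f}: z = {z:.3f}, s2 = {s2:.3f}")
```

Expanding around the log of the observed count is a natural choice, but it is undefined for zero counts, which is one way the choice of expansion point becomes awkward.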
An early set of notes identified key ways to reduce the memory footprint of flashr. The good ideas were implemented in flashier. Not all of the ideas were good.
The bug causing the problem described in Investigations 1-3 was fixed in version 0.1-13 of package ebnm.
Investigation 1. The FLASH objective function can behave very erratically.
Investigation 2. This problem only occurs when using ebnm_pn, not ebnm_ash.
Investigation 3. The objective can continue to get worse as loadings are repeatedly updated. Nonetheless, convergence takes place (from above!).
Since flashier uses a home-grown initialization function, Investigations 5a-b and 13 are no longer relevant.
Investigation 5a. An argument for changing the default init_fn to udv_si_svd when there is missing data and udv_svd otherwise. Based on an analysis of GTEx data.
Investigation 5b. More evidence supporting the recommendations in Investigation 5a.
Investigation 13. A counterargument. Results in Investigations 5a-b probably depend on the fact that \(n\) is small (\(n = 44\)). For large \(n\), setting init_fn to udv_si is best.
Investigations 6 and 7 dealt with warmstarts. ashr now uses mixsqp rather than Rmosek. ebnm now calls nlm rather than optim.
Investigation 6. Poor optim results can produce large decreases in the objective function. We should use warmstarts when ebnm_fn = ebnm_pn.
Investigation 7. The advantages of warmstarts are not nearly as compelling when ebnm_fn = ebnm_ash.
Investigations 8 and 9 were concerned with parallel backfitting updates, which are better covered by Investigation 28.
Investigation 8. Parallelizing the backfitting algorithm shows promise.
Investigation 9. An additional trick is needed to parallelize the backfitting updates performed in this MASH v FLASH GTEx analysis.
The changes tested here were implemented in version 0.6-2 of flashr.
tau is stored, as discussed here.
The changes tested here were implemented in version 2.2-29 of ashr.
my_etruncnorm and my_vtruncnorm functions in package ashr against their counterparts in package truncnorm.
Investigations 19a-b, 20, and 24-26 were early experiments on large single-cell RNA datasets. I’ve since created a repository dedicated to studying applications of EBMF to scRNA data.
Investigation 19a. An analysis of the smaller “droplet” dataset from Montoro et al.
Investigation 19b. I redo my analysis of the “droplet” dataset, but this time I follow the authors’ preprocessing steps. Results are, I think, of much lower quality.
Investigation 20. An analysis of the larger “pulse-seq” dataset from Montoro et al.
Investigation 24. I propose a new approach to factorizing count data that uses adaptive shrinkage to estimate the rate matrix.
Investigation 25. I compare three different data transformations, three approaches to handling the heteroskedasticity of the log1p transformation, and two approaches to dealing with row- and column-specific scaling.
Investigation 26. I compare FLASH fits of the “droplet” dataset in Montoro et al. using three different data transformations.