Processing math: 61%
  • Indirect method
  • Direct method

Last updated: 2018-07-20

workflowr checks: (Click a bullet for more information)
  • R Markdown file: up-to-date

    Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

  • Repository version: 5ff70b3

    Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

    Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
    
    Ignored files:
        Ignored:    .DS_Store
        Ignored:    .Rhistory
        Ignored:    .Rproj.user/
        Ignored:    docs/.DS_Store
        Ignored:    docs/figure/.DS_Store
    
    Unstaged changes:
        Deleted:    analysis/license.Rmd
    
    
    Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
Expand here to see past versions:
    File Version Author Date Message
    html 5487b70 Jason Willwerscheid 2018-07-20 Build site.
    Rmd 8911eb8 Jason Willwerscheid 2018-07-20 wflow_publish(c(“analysis/obj_notes.Rmd”,
    html de6736c Jason Willwerscheid 2018-07-18 Build site.
    Rmd 4d6f3da Jason Willwerscheid 2018-07-18 wflow_publish(“analysis/obj_notes.Rmd”)
    html f34a82b Jason Willwerscheid 2018-07-18 Build site.
    Rmd 6c2cefd Jason Willwerscheid 2018-07-18 wflow_publish(“analysis/obj_notes.Rmd”)
    html cc77169 Jason Willwerscheid 2018-07-17 Build site.
    Rmd 87f4d40 Jason Willwerscheid 2018-07-17 wflow_publish(“analysis/obj_notes.Rmd”)
    html e77fc4c Jason Willwerscheid 2018-07-17 Build site.
    Rmd eb42402 Jason Willwerscheid 2018-07-17 wflow_publish(“analysis/obj_notes.Rmd”)
    html 9678634 Jason Willwerscheid 2018-07-17 Build site.
    Rmd 3e874c6 Jason Willwerscheid 2018-07-17 wflow_publish(“analysis/obj_notes.Rmd”)
    html 02765a9 Jason Willwerscheid 2018-07-17 Build site.
    Rmd e811943 Jason Willwerscheid 2018-07-17 wflow_publish(“analysis/obj_notes.Rmd”)
    html 0621185 Jason Willwerscheid 2018-07-17 Build site.
    Rmd a0bccf9 Jason Willwerscheid 2018-07-17 wflow_publish(“analysis/obj_notes.Rmd”)


Indirect method

Recall the FLASH model: Y=LF+E

When updating loading lk, we are optimizing over glk and qlk. glkG is the prior on the elements of the kth column of the loadings matrix: l1k,,lnkiidglk qlk is an arbitrary distribution which enters into the problem via the variational approach. For convenience, I drop the subscripts in the following.

The part of the objective that depends on g and q is F(g,q):=Eq[12i(Ail2i2Bili)]+Eqlogg(l)q(l) with Ai=jτijEf2j and Bi=jτijRijEfj, (R is the matrix of residuals (excluding factor k) and Efj and Ef2j are the expected values of fjk and f2jk with respect to the distribution qfk fitted during the factor update.)

As Lemma 2 in the paper shows (see Appendix A.2), this expression is optimized by setting s2j=Aj and xj=Bjs2j, and then solving the EBNM problem, where the EBNM model is: x=θ+e, θ1,,θniidg, ejN(0,s2j)

Solving the EBNM problem gives ˆg=argmax and \hat{q} = p(\theta \mid x, \hat{g})

Finally, to update the overall objective, we need to compute E_q \log \frac{g(\mathbf{l})}{q(\mathbf{l})}. FLASH uses a clever trick, noticing that E_{\hat{q}} \log \frac{\hat{g}(\mathbf{l})}{\hat{q}(\mathbf{l})} = F(\hat{g}, \hat{q}) + \frac{1}{2} \sum_j \left[ \log 2\pi s_j^2 + (1/s_j^2) E_{\hat{q}} (x_j - \theta_j)^2 \right] (See Appendix A.4.)

Direct method

When using ebnm_pn, however, it seems possible to compute E_q \log \frac{g(\mathbf{l})}{q(\mathbf{l})} directly. Since the elements l_1, \ldots, l_n are i.i.d. from g (by the FLASH model) and the posterior distributions are mutually independent (by the EBNM model), E_q \log \frac{g(\mathbf{l})}{q(\mathbf{l})} = \sum_i E_{q_i} \log \frac{g(l_i)}{q(l_i)}

I drop the subscripts i. Write g \sim \pi_0 \delta_0 + (1 - \pi_0) N(0, 1/a) and q \sim (1 - \tilde{w}) \delta_0 + \tilde{w} N(\tilde{\mu}, \tilde{\sigma}^2) (I use different parametrizations to make the derivation cleaner and to follow the code more closely.)

Then \begin{aligned} E_q \log \frac{g(l)}{q(l)} &= (1 - \tilde{w}) \log \frac{\pi_0}{1 - \tilde{w}} + \int \tilde{w}\ \text{dnorm}(x; \tilde{\mu}, \tilde{\sigma}^2) \log \frac{(1 - \pi_0)\text{dnorm}(x; 0, 1/a)} {\tilde{w}\ \text{dnorm}(x; \tilde{\mu}, \tilde{\sigma}^2)}\ dx \\ &= (1 - \tilde{w}) \log \frac{\pi_0}{1 - \tilde{w}} + \tilde{w} \log \frac{1 - \pi_0}{\tilde{w}} \\ &\ + \int \tilde{w}\ \text{dnorm}(x; \tilde{\mu}, \tilde{\sigma}^2) \log \left( \sqrt{a \tilde{\sigma}^2} \exp \left( -\frac{ax^2}{2} + \frac{(x - \tilde{\mu})^2}{2 \tilde{\sigma}^2} \right) \right) \ dx \end{aligned}

The last integral is equal to \begin{aligned} \frac{\tilde{w}}{2} \log (a \tilde{\sigma}^2) &- \frac{\tilde{w} a}{2} E_{N(x; \tilde{\mu}, \tilde{\sigma}^2)} x^2 + \frac{\tilde{w}}{2 \tilde{\sigma}^2} E_{N(x; \tilde{\mu}, \tilde{\sigma}^2)} (x - \tilde{\mu})^2 \\ &= \frac{\tilde{w}}{2} \log (a \tilde{\sigma}^2) - \frac{\tilde{w} a}{2} (\tilde{\mu}^2 + \tilde{\sigma}^2) + \frac{\tilde{w}}{2} \end{aligned}

Thus E_q \log \frac{g(l)}{q(l)} = (1 - \tilde{w}) \log \frac{\pi_0}{1 - \tilde{w}} + \tilde{w} \log \frac{1 - \pi_0}{\tilde{w}} + \frac{\tilde{w}}{2} \left( \log (a \tilde{\sigma}^2) - a(\tilde{\mu}^2 + \tilde{\sigma}^2) + 1 \right)


This reproducible R Markdown analysis was created with workflowr 1.0.1