Last updated: 2018-05-03

workflowr checks:
  • R Markdown file: up-to-date

    Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

  • Environment: empty

    Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

  • Seed: set.seed(20180411)

    The command set.seed(20180411) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

  • Session information: recorded

    Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

  • Repository version: 196d0e3

    Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

    Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
    
    Ignored files:
        Ignored:    .DS_Store
        Ignored:    .Rhistory
        Ignored:    .Rproj.user/
        Ignored:    .sos/
        Ignored:    exams/
    
    Untracked files:
        Untracked:  analysis/pca_cell_cycle.Rmd
        Untracked:  analysis/ridge_mle.Rmd
        Untracked:  docs/figure/pca_cell_cycle.Rmd/
    
    Unstaged changes:
        Modified:   analysis/cell_cycle.Rmd
        Modified:   analysis/density_est_cell_cycle.Rmd
        Modified:   analysis/eb_vs_soft.Rmd
        Modified:   analysis/eight_schools.Rmd
        Modified:   analysis/glmnet_intro.Rmd
    
    
    Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
Past versions:
    File Version Author Date Message
    Rmd 196d0e3 stephens999 2018-05-03 wflow_publish("analysis/bayes_normal_means.Rmd")


Background

In a previous homework you implemented Empirical Bayes (EB) shrinkage for the normal means problem with a normal prior. That is, we have data \(X=(X_1,\dots,X_n)\): \[X_j | \theta_j, s_j \sim N(\theta_j, s_j^2)\] and assume \[\theta_j | \mu,\sigma \sim N(\mu,\sigma^2) \quad j=1,\dots,n.\]

The EB approach involved two steps:

  1. Estimate \(\mu, \sigma\) by maximizing the log-likelihood \(l(\mu,\sigma) = \log p(X | \mu,\sigma)\).
  2. Compute the posterior distribution \(p(\theta_j | X, \hat\mu,\hat\sigma)\).

The EB approach can be criticized for ignoring uncertainty in the estimates of \(\mu\) and \(\sigma\). Here we will use MCMC to do a fully Bayesian analysis that takes account of this uncertainty.

Fully Bayes approach

To make this easier we will first re-parameterize to use \(\eta = \log(\sigma)\), so \(\eta\) can take any value on the real line.

We will use a uniform prior on \((\mu,\eta)\), \(p(\mu,\eta) \propto 1\) in the range \(\mu \in [-a,a]\) and \(\eta \in [-b,b]\). You can use \(a=10^6\) and \(b=10\). (Because \(\eta\) is on the log scale, \(b=10\) covers a wide range of possible standard deviations). Thus the posterior distribution on \(\mu,\eta\) is given by \[p(\mu,\eta | X) \propto p(X | \mu, \eta) I(|\mu|<a) I(|\eta|<b)\]

where \(I\) denotes an indicator function.

  1. Modify your log-likelihood computation code from your previous homework to compute the log-likelihood for \((\mu,\eta)\) given data \(X\) (and standard deviations \(s\)).
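     A sketch of step 1 (the function and argument names here are illustrative choices, not prescribed by the assignment): integrating out \(\theta_j\) gives, marginally, \(X_j | \mu,\eta \sim N(\mu, s_j^2 + e^{2\eta})\), so the log-likelihood is a sum of normal log-densities.

    ```r
    # Log-likelihood for (mu, eta), where eta = log(sigma).
    # Integrating out theta_j gives X_j | mu, eta ~ N(mu, s_j^2 + exp(2*eta)),
    # so the log-likelihood is a sum of normal log-densities on that scale.
    loglik <- function(mu, eta, x, s) {
      sum(dnorm(x, mean = mu, sd = sqrt(s^2 + exp(2 * eta)), log = TRUE))
    }
    ```

    Note the `log = TRUE` argument to `dnorm`, which computes the log-density directly and avoids underflow for large \(n\).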

  2. Use this to implement an MH algorithm to sample from \(\pi(\mu,\eta) \propto p(X | \mu,\eta) I(|\mu|<a) I(|\eta|<b)\). Note: in computing the MH acceptance probability you need to compute a ratio \(L_1/L_2\). For numerical stability you should always compute this ratio as \(\exp(l_1 - l_2)\), where \(l_i = \log(L_i)\), rather than computing \(L_1\) and \(L_2\) directly and then taking their ratio. (If both \(L_1\) and \(L_2\) are very small, they may be 0 to machine precision, which causes problems if you try to compute \(L_1/L_2\) directly.)
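     One possible random-walk Metropolis sketch (all names, the proposal, and the tuning values here are illustrative choices, not prescribed by the assignment); the acceptance step works entirely on the log scale, as discussed above:

    ```r
    # Random-walk MH targeting pi(mu, eta) on the box |mu| < a, |eta| < b.
    mh_sample <- function(x, s, niter = 10000, mu0 = 0, eta0 = 0,
                          step = 0.5, a = 1e6, b = 10) {
      logpi <- function(mu, eta) {
        if (abs(mu) >= a || abs(eta) >= b) return(-Inf)  # outside flat prior support
        sum(dnorm(x, mean = mu, sd = sqrt(s^2 + exp(2 * eta)), log = TRUE))
      }
      mu <- eta <- lp <- numeric(niter)
      mu[1] <- mu0; eta[1] <- eta0; lp[1] <- logpi(mu0, eta0)
      for (t in 2:niter) {
        mu_prop  <- mu[t - 1] + rnorm(1, sd = step)
        eta_prop <- eta[t - 1] + rnorm(1, sd = step)
        lp_prop  <- logpi(mu_prop, eta_prop)
        # Accept with probability min(1, exp(l_prop - l_curr)):
        # the ratio is formed on the log scale, never as L1/L2.
        if (log(runif(1)) < lp_prop - lp[t - 1]) {
          mu[t] <- mu_prop; eta[t] <- eta_prop; lp[t] <- lp_prop
        } else {
          mu[t] <- mu[t - 1]; eta[t] <- eta[t - 1]; lp[t] <- lp[t - 1]
        }
      }
      data.frame(mu = mu, eta = eta, logpi = lp)
    }
    ```

    Returning the `logpi` trace along with the samples makes the trace plots asked for in part 3 easy, e.g. `plot(mh_sample(x, s)$logpi, type = "l")`.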

  3. Apply your MH algorithm to simulated data where you know the answer. Run your MH algorithm multiple times (at least 3) from different initializations. For each run, plot how the value of \(\log \pi(\mu^t,\eta^t)\) changes with iteration \(t\). You should see that it starts from a low value (assuming you initialized to something that is not consistent with the data) and then gradually increases until it settles down to a “steady state” behavior. Use these plots to help decide how many iterations to run your algorithm to get reliable results (i.e., so results from different runs look similar) and how many iterations to discard as “burn-in”. Compare your posterior distributions of \(\mu\) and \(\eta\) with the true values you simulated (the distributions should cover the true values unless you did something wrong or were unlucky!)

  4. Repeat part 3 for the “8 schools data” here (omitting the comparisons with the true values, which of course you do not know here).

  5. Note that the posterior distribution of \(\theta_j\) is given by: \[p(\theta_j | X) = \int p(\theta_j | X, \mu, \eta)\,p(\mu,\eta | X)\,d\mu\,d\eta,\] which is the expectation of \(p(\theta_j | X, \mu, \eta)\) over the posterior \(p(\mu,\eta | X)\). Computing posterior distributions like this is sometimes referred to as “integrating out uncertainty in” \(\mu,\eta\). (It is useful to compare this with the EB approach of just plugging in the maximum likelihood estimates and computing \(p(\theta_j | X, \hat{\mu},\hat{\eta})\). Notice that the two will produce similar results if the posterior distribution \(p(\mu,\eta | X)\) is very concentrated around the MLE.)

Given \(T\) samples \(\mu^1,\eta^1,\dots,\mu^T, \eta^T\) from the posterior distribution \(p(\mu,\eta | X)\) you can approximate this expectation by \[p(\theta_j | X) \approx (1/T)\sum_t p(\theta_j | X, \mu^t, \eta^t).\] So you can approximate the posterior mean by \[E(\theta_j | X) \approx (1/T)\sum_t E(\theta_j | X, \mu^t, \eta^t).\]

Using the same idea, give an expression to approximate the posterior second moment \(E(\theta^2_j | X)\), and so approximate the posterior variance (and hence the posterior standard deviation).
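Concretely, by standard normal–normal conjugacy \(\theta_j | X, \mu, \sigma \sim N(m_j, v_j)\) with \(v_j = 1/(1/s_j^2 + 1/\sigma^2)\) and \(m_j = v_j(X_j/s_j^2 + \mu/\sigma^2)\), so the posterior moments can be approximated by averaging these conditional moments over the MCMC draws. A sketch (the helper name is my own, not part of the assignment):

```r
# Approximate posterior mean and sd of each theta_j by averaging the
# conditional posterior moments over MCMC draws of (mu, eta).
posterior_summary <- function(mu_draws, eta_draws, x, s) {
  nT <- length(mu_draws)
  m1 <- m2 <- numeric(length(x))
  for (t in seq_len(nT)) {
    sigma2 <- exp(2 * eta_draws[t])
    v <- 1 / (1 / s^2 + 1 / sigma2)            # conditional posterior variance
    m <- v * (x / s^2 + mu_draws[t] / sigma2)  # conditional posterior mean
    m1 <- m1 + m / nT            # E(theta_j | X): average of m over draws
    m2 <- m2 + (m^2 + v) / nT    # E(theta_j^2 | X): average of m^2 + v
  }
  data.frame(mean = m1, sd = sqrt(m2 - m1^2))
}
```

Averaging the conditional moments \(m\) and \(m^2 + v\), rather than drawing a \(\theta_j\) for each \((\mu^t,\eta^t)\), uses the known conditional distribution and so reduces Monte Carlo noise.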

  6. Use the results from parts 4 and 5 to compute an approximate posterior mean and posterior standard deviation of \(\theta_j\) for each school in the 8 schools data. Compare and contrast your results with the EB results and also with the discussion in the initial blog post here

Session information

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.6

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] workflowr_1.0.1   Rcpp_0.12.16      digest_0.6.15    
 [4] rprojroot_1.3-2   R.methodsS3_1.7.1 backports_1.1.2  
 [7] git2r_0.21.0      magrittr_1.5      evaluate_0.10.1  
[10] stringi_1.1.7     whisker_0.3-2     R.oo_1.22.0      
[13] R.utils_2.6.0     rmarkdown_1.9     tools_3.3.2      
[16] stringr_1.3.0     yaml_2.1.18       htmltools_0.3.6  
[19] knitr_1.20       

This reproducible R Markdown analysis was created with workflowr 1.0.1