Last updated: 2026-01-12


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/summarize_interpret_posterior.Rmd) and HTML (docs/summarize_interpret_posterior.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	68483e0	Peter Carbonetto	2026-01-12	Minor updates to summarize_interpret_posterior vignette.
html	a221240	Peter Carbonetto	2026-01-09	Push a bunch of updates to the webpages.
Rmd	25e1cf5	Peter Carbonetto	2026-01-08	Adding pdf versions of three other vignettes.
Rmd	eababf2	MatthiasEckhart	2020-09-17	Fixed minor mistake (Bayesian CI explanation).
html	5f62ee6	Matthew Stephens	2019-03-31	Build site.
Rmd	0cd28bd	Matthew Stephens	2019-03-31	workflowr::wflow_publish(all = TRUE)
html	34bcc51	John Blischak	2017-03-06	Build site.
Rmd	5fbc8b5	John Blischak	2017-03-06	Update workflowr project with wflow_update (version 0.4.0).
html	8e61683	Marcus Davy	2017-03-03	rendered html using wflow_build(all=TRUE)
html	5d0fa13	Marcus Davy	2017-03-02	wflow_build() rendered html files
Rmd	d674141	Marcus Davy	2017-02-26	typos, refs
html	dc8a1bb	stephens999	2017-01-28	add html, figs
Rmd	c3e49d5	stephens999	2017-01-28	Files commited by wflow_commit.

See here for a PDF version of this vignette.

Overview

This vignette illustrates how to summarize and interpret a posterior distribution that has been computed analytically.

You should be familiar with simple analytic calculations of the posterior distribution of a parameter, such as for a binomial proportion.

Summarizing and interpreting a posterior

Suppose we have a parameter, \(q\), whose posterior distribution we have computed to be \(\mathrm{Beta}(31, 71)\) (as we obtained here for example). What does this mean? What statements can we make about \(q\)? How do we obtain interval estimates and point estimates for \(q\)?

Remember that the posterior distribution represents our uncertainty (or certainty) in \(q\), after combining the information in the data (the likelihood) with what we knew before collecting data (the prior).

To get some intuition, we could plot the posterior distribution so we can see what it looks like:

q <- seq(0,1,length.out = 100)
plot(q,dbeta(q,31,71),main = "posterior for q",ylab = "density",type = "l")

Based on this plot, we can see that \(q\) is highly likely to be less than 0.4, because most of the mass of the posterior distribution lies below 0.4. In Bayesian inference, we quantify statements like this (that a particular event is “highly likely”) by computing the “posterior probability” of the event, which is the probability of the event under the posterior distribution.

For example, in this case we can compute the (posterior) probability that \(q<0.4\), or \(\Pr(q < 0.4 \mid D)\). Since we know the posterior distribution is a \(\mathrm{Beta}(31,71)\) distribution, this probability is easy to compute using the pbeta function:

pbeta(0.4,31,71)
# [1] 0.9792202

So we would say, “The posterior probability that \(q\) is less than 0.4 is 0.98.”
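As a small sketch of the same idea, the code below computes the complementary event, \(\Pr(q > 0.4 \mid D)\), and cross-checks the pbeta result by numerically integrating the posterior density. The variable names (a, b, p_below, p_above, p_numeric) are chosen here for illustration only.

```r
# Posterior is Beta(31, 71), as in the text.
a <- 31
b <- 71

# Lower tail: Pr(q < 0.4 | D), the same quantity computed in the text.
p_below <- pbeta(0.4, a, b)

# Upper tail: Pr(q > 0.4 | D), requested directly with lower.tail = FALSE.
p_above <- pbeta(0.4, a, b, lower.tail = FALSE)

# Sanity check: integrate the posterior density numerically over [0, 0.4].
p_numeric <- integrate(dbeta, lower = 0, upper = 0.4,
                       shape1 = a, shape2 = b)$value

print(c(below = p_below, above = p_above, numeric = p_numeric))
```

The two tail probabilities should sum to 1, and the numerical integral should agree with pbeta up to the tolerance of integrate.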

Interval estimates

We can extend this idea to assess the certainty (or confidence) that \(q\) lies in any interval. For example, from the plot it looks like \(q\) will very likely lie in the interval \([0.2, 0.4]\) because most of the posterior mass lies between these two numbers. To quantify how likely, we compute the (posterior) probability that \(q\) lies in this interval, \(\Pr(q \in [0.2, 0.4] \mid D)\). Again, this can be done using the pbeta function:

pbeta(0.4,31,71) - pbeta(0.2,31,71)
# [1] 0.9721229

Thus, based on our prior and the data, we would be highly confident (probability of 97%) that \(q\) lies between 0.2 and 0.4. That is, \([0.2,0.4]\) is a 97% Bayesian confidence interval for \(q\). (Bayesian confidence intervals are often called “credible intervals”, and also often abbreviated to CI.)
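To see where the remaining 3% of the posterior probability goes, one can split the range of \(q\) into three pieces: below 0.2, inside \([0.2, 0.4]\), and above 0.4. The following is a minimal sketch; the variable names are illustrative assumptions.

```r
# Posterior is Beta(31, 71), as in the text.
a <- 31
b <- 71

# Posterior probability of each of the three regions.
p_left   <- pbeta(0.2, a, b)                       # Pr(q < 0.2 | D)
p_middle <- pbeta(0.4, a, b) - pbeta(0.2, a, b)    # Pr(0.2 <= q <= 0.4 | D)
p_right  <- pbeta(0.4, a, b, lower.tail = FALSE)   # Pr(q > 0.4 | D)

# The three probabilities partition the posterior, so they sum to 1.
print(c(left = p_left, middle = p_middle, right = p_right))
print(p_left + p_middle + p_right)
```

Note that the two tails are not equal here: more of the leftover probability sits above 0.4 than below 0.2, so \([0.2, 0.4]\) is not an equal-tailed interval.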

In practice, it is more common to compute Bayesian confidence intervals the other way around: specify the level of confidence we want to achieve, and find an interval that achieves that level of confidence. This can be done by computing the quantiles of the posterior distribution. For example, the 0.05 and 0.95 quantiles of the posterior would define a 90% Bayesian confidence interval. In our example, these quantiles of the Beta distribution can be computed using the “qbeta” function:

qbeta(0.05,31,71)
# [1] 0.2315858
qbeta(0.95,31,71)
# [1] 0.38065

So \([0.23, 0.38]\) is a 90% Bayesian confidence interval for \(q\). (It is 90% because there is a 5% posterior probability that \(q\) is below 0.23 and a 5% posterior probability that \(q\) is above 0.38.)
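The quantile recipe can be wrapped in a small helper so that any confidence level can be requested. This is only a sketch: beta_ci is a hypothetical name introduced here, not a function from any package, and it returns the equal-tailed interval (the same probability in each tail).

```r
# Hypothetical helper: equal-tailed credible interval for a Beta(a, b) posterior.
beta_ci <- function(level, a, b) {
  alpha <- 1 - level                       # total probability left outside
  qbeta(c(alpha / 2, 1 - alpha / 2), a, b) # lower and upper quantiles
}

# 90% interval for the Beta(31, 71) posterior; reproduces the qbeta calls above.
ci90 <- beta_ci(0.90, 31, 71)
print(ci90)

# Check: the posterior probability inside the interval equals the level.
coverage <- diff(pbeta(ci90, 31, 71))
print(coverage)
```

Because qbeta is vectorized over its first argument, both endpoints come from a single call.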

Point estimates

In some cases, we might be happy to give our “best guess” for \(q\), rather than worrying about the uncertainty. That is, we might be interested in giving a “point estimate” for \(q\). Essentially, this boils down to summarizing the posterior distribution by a single number.

When \(q\) is a continuous-valued variable, as it is here, the most common Bayesian point estimate is the mean (or expectation) of the posterior distribution, which is called the “posterior mean”. The mean of the \(\mathrm{Beta}(31, 71)\) distribution is \(31/(31+71) \approx 0.304\). So we would say, “The posterior mean for \(q\) is approximately 0.30.”
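The posterior mean can be computed in R from the Beta parameters, and checked against its definition as the expectation \(\int_0^1 q \, p(q \mid D) \, dq\). A minimal sketch, with illustrative variable names:

```r
# Posterior is Beta(31, 71), as in the text.
a <- 31
b <- 71

# Closed form for the mean of a Beta(a, b) distribution: a / (a + b).
post_mean <- a / (a + b)

# Cross-check: integrate q times the posterior density over [0, 1].
post_mean_numeric <- integrate(function(q) q * dbeta(q, a, b),
                               lower = 0, upper = 1)$value

print(c(closed_form = post_mean, numeric = post_mean_numeric))
```

The two values should agree up to the numerical tolerance of integrate.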

An alternative to the mean is the median. The median of the \(\mathrm{Beta}(31, 71)\) distribution can be found using qbeta:

qbeta(0.5, 31,71)
# [1] 0.3026356

So we would say, “The posterior median for \(q\) is approximately 0.30.”

The mode of the posterior (“posterior mode”) is another possible summary, although this perhaps makes more sense in settings where \(q\) is a discrete variable rather than a continuous variable.
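For completeness, here is a sketch of how the posterior mode could be computed. It relies on a standard fact not stated above: for a \(\mathrm{Beta}(a, b)\) distribution with \(a > 1\) and \(b > 1\), the mode is \((a-1)/(a+b-2)\). The code also locates the maximum of the density numerically with optimize as a check; the variable names are illustrative.

```r
# Posterior is Beta(31, 71), as in the text.
a <- 31
b <- 71

# Closed-form mode of a Beta(a, b) density (valid when a > 1 and b > 1).
post_mode <- (a - 1) / (a + b - 2)

# Cross-check: numerically maximize the posterior density over [0, 1].
post_mode_numeric <- optimize(function(q) dbeta(q, a, b),
                              interval = c(0, 1), maximum = TRUE)$maximum

print(c(closed_form = post_mode, numeric = post_mode_numeric))
```

For this posterior the mean, median and mode are all close to 0.3, which is what one expects from a distribution that is nearly symmetric.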

Summary

The most common summaries of a posterior distribution are interval estimates and point estimates.
Interval estimates can be obtained by computing quantiles of the posterior distribution. Bayesian confidence intervals are often called “credible intervals”.
Point estimates are typically obtained by computing the mean or median (or mode) of the posterior distribution. These are called the “posterior mean”, “posterior median” and “posterior mode”.

Exercise

Suppose you are interested in a parameter \(\theta\), and you obtain a posterior distribution for \(\theta\) that is normal with mean 0.2 and standard deviation 0.4. Find:

A 90% credible interval for \(\theta\).
A 95% credible interval for \(\theta\).
A point estimate for \(\theta\).
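As a hint, the quantile function for a normal distribution is qnorm, which plays the same role as qbeta did above. The sketch below uses a different, made-up posterior (normal with mean 1 and standard deviation 2, chosen purely for illustration) so that the exercise itself is left to the reader.

```r
# Hypothetical posterior for illustration only: Normal(mean = 1, sd = 2).
m <- 1
s <- 2

# Equal-tailed 90% credible interval from the 0.05 and 0.95 quantiles.
ci90 <- qnorm(c(0.05, 0.95), mean = m, sd = s)
print(ci90)

# Check: the posterior probability inside the interval is 0.90.
coverage <- diff(pnorm(ci90, mean = m, sd = s))
print(coverage)
```

Since a normal distribution is symmetric, the interval is centered on the posterior mean, which is also the posterior median and mode.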

sessionInfo()
# R version 4.3.3 (2024-02-29)
# Platform: aarch64-apple-darwin20 (64-bit)
# Running under: macOS 15.7.1
# 
# Matrix products: default
# BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
# LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
# 
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# 
# time zone: America/Chicago
# tzcode source: internal
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# loaded via a namespace (and not attached):
#  [1] vctrs_0.6.5       cli_3.6.5         knitr_1.50        rlang_1.1.6      
#  [5] xfun_0.52         stringi_1.8.7     promises_1.3.3    jsonlite_2.0.0   
#  [9] workflowr_1.7.1   glue_1.8.0        rprojroot_2.0.4   git2r_0.33.0     
# [13] htmltools_0.5.8.1 httpuv_1.6.14     sass_0.4.10       rmarkdown_2.29   
# [17] evaluate_1.0.4    jquerylib_0.1.4   tibble_3.3.0      fastmap_1.2.0    
# [21] yaml_2.3.10       lifecycle_1.0.4   whisker_0.4.1     stringr_1.5.1    
# [25] compiler_4.3.3    fs_1.6.6          Rcpp_1.1.0        pkgconfig_2.0.3  
# [29] later_1.4.2       digest_0.6.37     R6_2.6.1          pillar_1.11.0    
# [33] magrittr_2.0.3    bslib_0.9.0       tools_4.3.3       cachem_1.1.0

Summarizing and interpreting the posterior (analytic)

Matthew Stephens

January 12, 2026