**Last updated:** 2017-03-04

**Code version:** 5d0fa13

The purpose of this vignette is to introduce the Beta distribution. You should already be familiar with basic concepts related to distributions: for example, you may have come across the normal distribution and the uniform distribution, and understand what it means to talk about their mean, variance and density.

If you want more details, see the Wikipedia article on the Beta distribution.

The Beta distribution is a distribution on the interval \([0,1]\). Probably you have come across the \(U[0,1]\) distribution before: the uniform distribution on \([0,1]\). You can think of the Beta distribution as a generalization of this that allows for some simple non-uniform distributions for values between 0 and 1.

The Beta distribution has two parameters, which we will call \(a\) and \(b\). These two parameters determine the shape of the Beta distribution (just as the mean and variance determine the shape of the normal distribution).

Following the usual convention, we will write \(X \sim Be(a,b)\) as shorthand for “\(X\) has a Beta distribution with parameters \(a\) and \(b\)”.
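
To get a concrete feel for what \(X \sim Be(a,b)\) means, it can help to draw some random values of \(X\). The sketch below uses R's built-in `rbeta` function; the parameter values \(a=2\), \(b=5\), the sample size and the seed are arbitrary choices made only for illustration.

```r
set.seed(1)              # arbitrary seed, only so the draws are reproducible
a <- 2                   # illustrative parameter values (not from the text)
b <- 5
x <- rbeta(1000, a, b)   # 1000 independent draws from Be(a, b)

range(x)                 # smallest and largest draw; both lie in [0, 1]
hist(x, breaks = 30, freq = FALSE, main = "1000 draws from Be(2,5)")
```

Every draw falls between 0 and 1, and the histogram gives a rough picture of the shape of the distribution for these parameter values.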

If \(X \sim Be(a,b)\) then the density of \(X\) is: \[f_X(x) = \frac{1}{B(a,b)} x^{a-1}(1-x)^{b-1} \qquad (x \in [0,1]).\]

For those who are interested, \(B(a,b)\) is known as the “beta function” and is given by the integral \[B(a,b) = \int_0^1 x^{a-1} (1-x)^{b-1} \,dx.\] This is where the Beta distribution gets its name: its density involves the beta function. However, for this introduction you do not need to worry about what \(B(a,b)\) is: think of it as a constant (in that it does not depend on \(x\)) that is included so that the density integrates to 1, as all densities must.
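
The role of \(B(a,b)\) as a normalizing constant can be checked numerically. The following is a minimal sketch, assuming the illustrative values \(a=3\), \(b=7\); it uses base R's `beta()` function for \(B(a,b)\) and `integrate()` for the integrals. The helper names `kernel` and `f` are made up for this example.

```r
a <- 3   # illustrative parameter values (not from the text)
b <- 7

# The part of the density that depends on x, without the constant 1/B(a,b)
kernel <- function(x) x^(a - 1) * (1 - x)^(b - 1)

# B(a,b) two ways: R's built-in beta() and numerical integration of the kernel
B_builtin <- beta(a, b)
B_numeric <- integrate(kernel, lower = 0, upper = 1)$value
all.equal(B_builtin, B_numeric, tolerance = 1e-6)

# Dividing the kernel by B(a,b) gives a function that integrates to 1
f <- function(x) kernel(x) / B_builtin
total <- integrate(f, lower = 0, upper = 1)$value
total
```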

Because the Beta distribution is widely used, R has the built-in function `dbeta` to compute this density. We will use this to look at some examples of the Beta distribution below.
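
As a quick check that `dbeta` computes the density formula given above, the sketch below evaluates both at a few points. The values of \(a\), \(b\) and the evaluation points are arbitrary; `shape1` and `shape2` are the argument names that `dbeta` uses for \(a\) and \(b\).

```r
a <- 3                          # illustrative parameter values
b <- 7
x <- c(0.1, 0.25, 0.5, 0.9)     # arbitrary points in [0, 1]

# Density written out from the formula, using beta() for B(a,b)
by_hand <- x^(a - 1) * (1 - x)^(b - 1) / beta(a, b)

# The same density from the built-in function
built_in <- dbeta(x, shape1 = a, shape2 = b)

all.equal(by_hand, built_in)
```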

First we will look at some examples for \(a=b\), with both \(\geq 1\):

```
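# Grid of 100 evenly spaced points on [0,1] at which to evaluate the densities
p <- seq(0, 1, length.out = 100)
# Plot the most peaked density first so the y-axis is tall enough for all curves
plot(p, dbeta(p, 100, 100), ylab = "density", type = "l", col = 4)
lines(p, dbeta(p, 10, 10), col = 3)
lines(p, dbeta(p, 2, 2), col = 2)
lines(p, dbeta(p, 1, 1), col = 1)
legend(0.7, 8, c("Be(100,100)", "Be(10,10)", "Be(2,2)", "Be(1,1)"), lty = 1, col = c(4, 3, 2, 1))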
```

Now non-equal values of \(a\) and \(b\) with both \(\geq 1\):

```
# Same grid as before; the first two curves have a/(a+b) = 0.9, the last two 0.3
p <- seq(0, 1, length.out = 100)
plot(p, dbeta(p, 900, 100), ylab = "density", type = "l", col = 4)
lines(p, dbeta(p, 90, 10), col = 3)
lines(p, dbeta(p, 30, 70), col = 2)
lines(p, dbeta(p, 3, 7), col = 1)
legend(0.2, 30, c("Be(900,100)", "Be(90,10)", "Be(30,70)", "Be(3,7)"), lty = 1, col = c(4, 3, 2, 1))
```

From these examples you should note the following:

- The distribution is roughly centered on \(a/(a+b)\). Actually, it turns out that the mean is exactly \(a/(a+b)\). Thus the mean of the distribution is determined by the *relative* values of \(a\) and \(b\).
- The larger the values of \(a\) and \(b\), the smaller the variance of the distribution about the mean.
- For moderately large values of \(a\) and \(b\) the distribution looks visually “kind of normal”, although unlike the normal distribution the Beta distribution is restricted to \([0,1]\).

- The special case \(a=b=1\) is the uniform distribution.
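
These observations can be checked numerically. The sketch below computes the mean and variance of \(Be(a,b)\) by integrating against `dbeta`, and compares them with \(a/(a+b)\) and with the standard closed-form variance \(ab/((a+b)^2(a+b+1))\). That variance formula is a well-known result that is not derived in this vignette, and the helper name `beta_moments` is made up for this example.

```r
# Mean and variance of Be(a, b), computed by numerical integration
beta_moments <- function(a, b) {
  m <- integrate(function(x) x * dbeta(x, a, b), 0, 1)$value
  v <- integrate(function(x) (x - m)^2 * dbeta(x, a, b), 0, 1)$value
  c(mean = m, variance = v)
}

small <- beta_moments(3, 7)     # a/(a+b) = 0.3
large <- beta_moments(30, 70)   # same ratio, parameters ten times larger

# Closed-form values for comparison (variance formula is a standard result)
closed_form <- function(a, b) {
  c(mean = a / (a + b), variance = a * b / ((a + b)^2 * (a + b + 1)))
}

rbind(small, closed_form(3, 7))
rbind(large, closed_form(30, 70))

# The special case a = b = 1 matches the uniform density on [0,1]
p <- seq(0, 1, length.out = 11)
all.equal(dbeta(p, 1, 1), dunif(p))
```

Both parameter pairs give the same mean, 0.3, while the variance is smaller for the larger pair, as the plots suggest.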

The parameters \(a\) and \(b\) can also be less than 1, but in this case the distribution takes on a different kind of shape. Specifically, if \(a<1\) then the density rises without bound as \(x\) approaches 0, giving a peak at 0, and if \(b<1\) then the same happens at 1 (so if both are \(<1\) then the distribution is U-shaped). Here are some examples:

```
# dbeta() returns Inf at p = 0 when a < 1 and at p = 1 when b < 1;
# plot() chooses the y-axis range from the finite values only
p <- seq(0, 1, length.out = 100)
plot(p, dbeta(p, 0.1, 0.1), ylab = "density", type = "l", col = 4)
lines(p, dbeta(p, 0.5, 0.5), col = 3)
lines(p, dbeta(p, 0.1, 0.5), col = 2)
lines(p, dbeta(p, 0.5, 2), col = 1)
legend(0.5, 2, c("Be(0.1,0.1)", "Be(0.5,0.5)", "Be(0.1,0.5)", "Be(0.5,2)"), lty = 1, col = c(4, 3, 2, 1))
```
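
To see the behavior near the endpoints more directly, one can evaluate the density at points approaching 0. The sketch below does this for \(Be(0.5,2)\), and also checks the mirror-image relationship between \(Be(a,b)\) and \(Be(b,a)\). The evaluation points are arbitrary.

```r
x <- c(0.1, 0.01, 0.001, 0)   # points moving towards 0

# With a = 0.5 < 1 the density increases as x approaches 0,
# and dbeta() reports Inf at exactly x = 0
d <- dbeta(x, 0.5, 2)
d

# Swapping a and b reflects the density about 0.5,
# so Be(2, 0.5) has its peak at 1 instead
d_mirror <- dbeta(1 - x, 2, 0.5)
all.equal(d, d_mirror)
```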

- Sketch what you think the Be(5,5), Be(0.5,5) and Be(500,200) distributions would look like. Check your sketches against the truth computed using `dbeta`.

`sessionInfo()`

```
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)
locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tidyr_0.4.1 dplyr_0.5.0 ggplot2_2.1.0 knitr_1.15.1
[5] MASS_7.3-45 expm_0.999-0 Matrix_1.2-6 viridis_0.3.4
[9] workflowr_0.3.0 rmarkdown_1.3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.5 git2r_0.18.0 plyr_1.8.4 tools_3.3.0
[5] digest_0.6.9 evaluate_0.10 tibble_1.1 gtable_0.2.0
[9] lattice_0.20-33 shiny_0.13.2 DBI_0.4-1 yaml_2.1.14
[13] gridExtra_2.2.1 stringr_1.2.0 gtools_3.5.0 rprojroot_1.2
[17] grid_3.3.0 R6_2.1.2 reshape2_1.4.1 magrittr_1.5
[21] backports_1.0.5 scales_0.4.0 htmltools_0.3.5 assertthat_0.1
[25] mime_0.5 colorspace_1.2-6 xtable_1.8-2 httpuv_1.3.3
[29] labeling_0.3 stringi_1.1.2 munsell_0.4.3
```
