  • Pre-requisites
  • Background
  • Generalized Log-Likelihood Ratios
  • Wilks’s Theorem
  • Example: Poisson Distribution
  • Session information

Last updated: 2017-03-06

Code version: c7339fc

Pre-requisites

This document assumes familiarity with the concepts of likelihoods, likelihood ratios, and hypothesis testing.

Background

When performing a statistical hypothesis test, such as comparing two models, the hypotheses are called simple hypotheses if they completely specify the probability distributions. For example, suppose we observe $X_1, \ldots, X_n$ from a normal distribution with known variance and we want to test whether the true mean is equal to $\mu_0$ or $\mu_1$. One hypothesis $H_0$ might be that the distribution has mean $\mu_0$, and $H_1$ might be that the mean is $\mu_1$. Since these hypotheses completely specify the distribution of the $X_i$, they are called simple hypotheses.

Now suppose $H_0$ is again that the true mean, $\mu$, is equal to $\mu_0$, but $H_1$ is that $\mu > \mu_0$. In this case, $H_0$ is still simple, but $H_1$ does not completely specify a single probability distribution. It specifies a set of distributions, and is therefore an example of a composite hypothesis. In practice, it is rare for both hypotheses to be simple.

As seen in the fiveMinuteStats on likelihood ratios, given the observed data $X_1, \ldots, X_n$, we can measure the relative plausibility of $H_1$ to $H_0$ by the log-likelihood ratio:

$$\log\left(\frac{f(X_1, \ldots, X_n \mid H_1)}{f(X_1, \ldots, X_n \mid H_0)}\right)$$

The log-likelihood ratio could help us choose which model ($H_0$ or $H_1$) is a more likely explanation for the data. One common question is this: what constitutes a large likelihood ratio? Wilks's Theorem helps us answer this question, but first we will define the notion of a generalized log-likelihood ratio.
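To make this concrete, here is a minimal sketch in R of the log-likelihood ratio for the two simple hypotheses above (normal data with known variance). The data and the values of $\mu_0$ and $\mu_1$ are made up purely for illustration:

set.seed(1)
x   <- rnorm(100, mean = 0.3, sd = 1)  # observed data (simulated here)
mu0 <- 0    # mean under H0
mu1 <- 0.5  # mean under H1
# log f(x | H1) - log f(x | H0), using the normal density with known sd = 1
llr <- sum(dnorm(x, mean = mu1, sd = 1, log = TRUE)) -
       sum(dnorm(x, mean = mu0, sd = 1, log = TRUE))
llr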

Generalized Log-Likelihood Ratios

Let’s assume we are dealing with distributions parameterized by $\theta$. To generalize the case of simple hypotheses, let’s assume that $H_0$ specifies that $\theta$ lives in some set $\Theta_0$ and $H_1$ specifies that $\theta \in \Theta_1$. Let $\Omega = \Theta_0 \cup \Theta_1$. A somewhat natural extension of the likelihood ratio test statistic we discussed above is the generalized log-likelihood ratio:

$$\Lambda = \log\left(\frac{\max_{\theta \in \Theta_1} f(X_1, \ldots, X_n \mid \theta)}{\max_{\theta \in \Theta_0} f(X_1, \ldots, X_n \mid \theta)}\right)$$

For technical reasons, it is preferable to use the following related quantity:

$$\Lambda_n = 2\log\left(\frac{\max_{\theta \in \Omega} f(X_1, \ldots, X_n \mid \theta)}{\max_{\theta \in \Theta_0} f(X_1, \ldots, X_n \mid \theta)}\right)$$

Just like before, larger values of $\Lambda_n$ provide stronger evidence against $H_0$.
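As a minimal numerical sketch of how $\Lambda_n$ can be computed, the snippet below maximizes the log-likelihood over $\Omega$ (here, all real means for normal data with known variance) and over the single point $\Theta_0 = \{\mu_0\}$. The data, search interval, and names are illustrative assumptions:

set.seed(1)
x   <- rnorm(100, mean = 0.2, sd = 1)
mu0 <- 0
loglik <- function(mu) sum(dnorm(x, mean = mu, sd = 1, log = TRUE))
# numerator: maximize the log-likelihood over Omega (searched numerically)
max.omega <- optimize(loglik, interval = c(-10, 10), maximum = TRUE)$objective
# denominator: Theta0 is a single point, so the max is just loglik(mu0)
lambda.n <- 2 * (max.omega - loglik(mu0))
lambda.n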

Wilks’s Theorem

Suppose that the dimension of $\Omega$ is $v$ and the dimension of $\Theta_0$ is $r$. Under regularity conditions and assuming $H_0$ is true, the distribution of $\Lambda_n$ tends to a chi-squared distribution with degrees of freedom equal to $v - r$ as the sample size tends to infinity.

With this theorem in hand (and for $n$ large), we can compare the value of our log-likelihood ratio to the expected values from a $\chi^2_{v-r}$ distribution.
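In R, this comparison amounts to a call to pchisq or qchisq. The values below are hypothetical, chosen only to illustrate the calls:

lambda.n  <- 4.2  # hypothetical observed statistic
v.minus.r <- 1    # degrees of freedom v - r
qchisq(0.95, df = v.minus.r)                          # critical value for a level-0.05 test
pchisq(lambda.n, df = v.minus.r, lower.tail = FALSE)  # asymptotic p-value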

Example: Poisson Distribution

Assume we observe data $X_1, \ldots, X_n$ from a Poisson distribution with parameter $\lambda$, and consider the hypotheses $H_0: \lambda = \lambda_0$ and $H_1: \lambda \neq \lambda_0$. The likelihood is:

$$L(\lambda \mid X_1, \ldots, X_n) = \frac{\lambda^{\sum_i X_i} e^{-n\lambda}}{\prod_{i=1}^n X_i!}$$

Note that $\Theta_1$ in this case is the set of all $\lambda \neq \lambda_0$. In the numerator of the expression for $\Lambda_n$, we seek $\max_{\theta \in \Omega} f(X_1, \ldots, X_n \mid \theta)$. This maximum is attained at the maximum likelihood estimate of $\lambda$, which we derived in this note; the MLE is simply the sample average $\bar{X}$. The likelihood ratio is therefore:

$$\frac{L(\lambda = \bar{X} \mid X_1, \ldots, X_n)}{L(\lambda = \lambda_0 \mid X_1, \ldots, X_n)} = \frac{\bar{X}^{\sum_i X_i} e^{-n\bar{X}}}{\prod_{i=1}^n X_i!} \cdot \frac{\prod_{i=1}^n X_i!}{\lambda_0^{\sum_i X_i} e^{-n\lambda_0}} = \left(\frac{\bar{X}}{\lambda_0}\right)^{\sum_i X_i} e^{n(\lambda_0 - \bar{X})}$$

which means that $\Lambda_n$ is

$$\Lambda_n = 2\log\left(\left(\frac{\bar{X}}{\lambda_0}\right)^{\sum_i X_i} e^{n(\lambda_0 - \bar{X})}\right) = 2n\left(\bar{X}\log\left(\frac{\bar{X}}{\lambda_0}\right) + \lambda_0 - \bar{X}\right)$$

In this example we have that $v$, the dimension of $\Omega$, is 1 (any positive real number) and $r$, the dimension of $\Theta_0$, is 0 (it's just a single point). Hence, the degrees of freedom of the asymptotic $\chi^2$ distribution is $v - r = 1$. Therefore, Wilks's Theorem tells us that $\Lambda_n$ tends to a $\chi^2_1$ distribution as $n$ tends to infinity.
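Packaged as a function, the closed-form statistic looks like the sketch below; poisson.llr is a name chosen here for illustration, and the data and $\lambda_0$ are made up:

poisson.llr <- function(x, lambda0) {
  # Lambda_n = 2n * (xbar * log(xbar / lambda0) + lambda0 - xbar)
  n    <- length(x)
  xbar <- mean(x)
  2 * n * (xbar * log(xbar / lambda0) + lambda0 - xbar)
}
# e.g., an asymptotic p-value for H0: lambda = 0.4 on simulated data
set.seed(1)
x <- rpois(500, lambda = 0.5)
pchisq(poisson.llr(x, lambda0 = 0.4), df = 1, lower.tail = FALSE)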

Below we simulate computing $\Lambda_n$ over 5000 experiments. In each experiment, we observe 500 random variables distributed as Poisson(0.4). We then plot the histogram of the $\Lambda_n$ values and overlay the $\chi^2_1$ density with a solid line.

num.iterations         <- 5000
lambda.truth           <- 0.4
num.samples.per.iter   <- 500
samples                <- numeric(num.iterations)
for(iter in seq_len(num.iterations)) {
  # simulate data under H0 and compute Lambda_n via the closed form above
  data            <- rpois(num.samples.per.iter, lambda.truth)
  samples[iter]   <- 2*num.samples.per.iter*(mean(data)*log(mean(data)/lambda.truth) + lambda.truth - mean(data))
}
# histogram of the simulated statistics with the chi-squared(1) density overlaid
hist(samples, freq=FALSE, main='Histogram of LLR', xlab='sampled values of LLR')
curve(dchisq(x, 1), 0, 20, lwd=2, xlab = "", ylab = "", add = TRUE)
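Beyond eyeballing the histogram, one way to check the approximation is to compare empirical quantiles of the simulated statistics to the corresponding $\chi^2_1$ quantiles; this short check reuses the samples vector from the simulation above:

probs <- c(0.5, 0.9, 0.95, 0.99)
round(rbind(empirical   = quantile(samples, probs),
            chi.squared = qchisq(probs, df = 1)), 3)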

Session information

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] knitr_1.15.1       MASS_7.3-45        expm_0.999-0      
[4] Matrix_1.2-8       workflowr_0.4.0    rmarkdown_1.3.9004

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.9     lattice_0.20-34 gtools_3.5.0    digest_0.6.12  
 [5] rprojroot_1.2   mime_0.5        R6_2.2.0        grid_3.3.2     
 [9] xtable_1.8-2    backports_1.0.5 git2r_0.18.0    magrittr_1.5   
[13] evaluate_0.10   stringi_1.1.2   tools_3.3.2     stringr_1.2.0  
[17] shiny_1.0.0     httpuv_1.3.3    yaml_2.1.14     htmltools_0.3.5
