Last updated: 2021-10-17

Let \(y\) be a binary variable with \(p(y=1) = p\), where \(p = g(\beta)\) and \(g(x) = \frac{1}{1+\exp(-x)}\). The function \(g\) is usually called the logistic function or the sigmoid function.

The log likelihood is

\[\begin{equation} \begin{split} l(\beta) &= \log p(y|\beta) \\&= y\log g(\beta)+(1-y)\log (1-g(\beta)) \\&= (-\beta-\log(1+\exp(-\beta))) +y\beta \\&= (y-1)\beta - \log(1+\exp(-\beta)) \end{split} \end{equation}\]

It’s sometimes more convenient to write

\[l(\beta) = \log g((2y-1)\beta).\]
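As a quick sanity check, the sketch below compares the definition of the log likelihood, the expanded form \((y-1)\beta - \log(1+\exp(-\beta))\), and the compact form \(\log g((2y-1)\beta)\) on a small grid. The function names `loglik_def`, `loglik_expand` and `loglik_compact` and the grid of \(\beta\) values are my own choices for illustration.

```r
# Logistic (sigmoid) function, as defined in the text.
g <- function(x) 1 / (1 + exp(-x))

# Three algebraically equivalent forms of the Bernoulli log likelihood.
loglik_def     <- function(y, beta) y * log(g(beta)) + (1 - y) * log(1 - g(beta))
loglik_expand  <- function(y, beta) (y - 1) * beta - log(1 + exp(-beta))
loglik_compact <- function(y, beta) log(g((2 * y - 1) * beta))

# Evaluate on a small grid of beta values for both outcomes y = 0 and y = 1.
grid <- expand.grid(y = c(0, 1), beta = seq(-5, 5, by = 0.5))
max_gap <- with(grid, max(
  abs(loglik_def(y, beta) - loglik_expand(y, beta)),
  abs(loglik_def(y, beta) - loglik_compact(y, beta))
))
print(max_gap)  # expected to be at floating-point rounding level
```

If the three forms agree, `max_gap` should be tiny compared with the values of the log likelihood themselves.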

The sum inside the \(\log\) function is awkward to work with, for example in variational inference for logistic regression. One way to bypass the log-sum is to lower bound the function \(-\log(1+\exp(-\beta))\).

Jaakkola and Jordan (2000) suggest the following lower bound.

Notice that \[-\log(1+\exp(-\beta)) = \frac{\beta}{2}-\log(\exp(\beta/2)+\exp(-\beta/2)),\]

where \(f(\beta) = -\log(\exp(\beta/2)+\exp(-\beta/2))\) is a convex function in \(\beta^2\). So we can bound \(f(\beta)\) globally with a first order Taylor expansion in \(\beta^2\), leading to

\[f(\beta)\geq -\frac{\eta}{2}-\log(1+\exp(-\eta))-\frac{1}{4\eta}\tanh\left(\frac{\eta}{2}\right)(\beta^2-\eta^2).\]

(I find that the derivative \(\frac{\partial f(\beta)}{\partial\beta^2}\), evaluated at \(\beta = \eta\), should be \(-\frac{1}{4\eta}\tanh(\frac{\eta}{2})\). Let’s check numerically. The check agrees: the slope is \(-\frac{1}{4\eta}\tanh(\frac{\eta}{2})\).)
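The slope can also be sketched by hand. Write \(t = \beta^2\) and \(s = \sqrt{t} > 0\) (the symbols \(t\) and \(s\) are introduced here only for this calculation). Since \(\frac{df}{ds} = -\frac{1}{2}\tanh(\frac{s}{2})\) and \(\frac{ds}{dt} = \frac{1}{2s}\), the chain rule gives

\[\frac{\partial f}{\partial t} = -\frac{1}{2}\tanh\left(\frac{s}{2}\right)\cdot\frac{1}{2s} = -\frac{1}{4s}\tanh\left(\frac{s}{2}\right),\]

which at \(s = \eta\) is the coefficient of \((\beta^2-\eta^2)\) in the bound. Differentiating once more,

\[\frac{\partial^2 f}{\partial t^2} = \frac{\sinh(s) - s}{16\, s^3\cosh^2(s/2)} \geq 0 \quad \text{for } s > 0,\]

which is consistent with \(f\) being convex in \(\beta^2\).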

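g = function(x){ 1/(1+exp(-x)) }                 # logistic function
f0 = function(x){ -log(exp(x/2)+exp(-x/2)) }     # f(beta) from the text
# lower bound on f0 with variational parameter eta
f0_lb = function(x,eta){
  -eta/2 + log(g(eta))-1/4/eta*tanh(eta/2)*(x^2-eta^2)
}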
[1] -0.6931472
[1] -0.6977324
[1] -0.6977324

This lower bound is exact if \(\beta^2 = \eta^2\). The lower bound is also linear in \(\beta^2\), that is, a quadratic function of \(\beta\), which is a very useful property for Gaussian approximation.
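A minimal sketch of how one might check both claims numerically: the bound should never exceed \(f(\beta)\), and the gap should vanish at \(\beta = \pm\eta\). The grid of \(\beta\) values and the values in `etas` are arbitrary choices of mine, not taken from the paper.

```r
g <- function(x) 1 / (1 + exp(-x))
f0 <- function(x) -log(exp(x / 2) + exp(-x / 2))
f0_lb <- function(x, eta) {
  -eta / 2 + log(g(eta)) - tanh(eta / 2) / (4 * eta) * (x^2 - eta^2)
}

beta <- seq(-6, 6, length.out = 241)
etas <- c(0.5, 1, 2, 4)  # illustrative values of eta (assumption)

# Smallest value of f0(beta) - bound over the grid, one entry per eta.
min_gap <- sapply(etas, function(eta) min(f0(beta) - f0_lb(beta, eta)))

# Gap evaluated exactly at beta = eta and beta = -eta (one column per eta).
gap_at_eta <- sapply(etas, function(eta) {
  pts <- c(eta, -eta)
  f0(pts) - f0_lb(pts, eta)
})

print(min_gap)     # expected to be non-negative up to rounding
print(gap_at_eta)  # expected to be zero up to rounding
```

Because the bound depends on \(\beta\) only through \(\beta^2\), the gap closes at both \(\beta = \eta\) and \(\beta = -\eta\).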

The unknown parameter \(\eta\) is optimized within the algorithm. It turns out that the optimal value satisfies \(\eta^2 = E(\beta^2)\), the second moment of \(\beta\) under the variational approximation.
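To illustrate, the sketch below maximizes the expected bound over \(\eta\) for an assumed Gaussian approximation \(q(\beta) = N(m, s^2)\) and compares the maximizer with \(\sqrt{E(\beta^2)}\). The values of `m` and `s`, the helper `lambda`, and the search interval are illustrative assumptions. Only the \(f(\beta)\) part of the bound is used, since the \(\beta/2\) term does not depend on \(\eta\).

```r
f0 <- function(x) -log(exp(x / 2) + exp(-x / 2))
lambda <- function(eta) tanh(eta / 2) / (4 * eta)  # slope magnitude in beta^2

# Assumed Gaussian approximation q(beta) = N(m, s^2); values are illustrative.
m <- 1.3
s <- 0.7
second_moment <- m^2 + s^2  # E_q(beta^2)

# Expectation under q of the bound f0(eta) - lambda(eta) * (beta^2 - eta^2).
expected_bound <- function(eta) f0(eta) - lambda(eta) * (second_moment - eta^2)

# One-dimensional search for the eta > 0 that makes the bound tightest.
opt <- optimize(expected_bound, interval = c(1e-3, 10), maximum = TRUE)
eta_hat <- opt$maximum

print(c(eta_hat = eta_hat, sqrt_second_moment = sqrt(second_moment)))
```

The two printed numbers should agree up to the tolerance of `optimize`. Analytically, the derivative of the expected bound with respect to \(\eta\) is \(-\lambda'(\eta)\,(E(\beta^2)-\eta^2)\), which vanishes at \(\eta^2 = E(\beta^2)\).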

Other useful references:

  1. Spike and slab variational Bayes for high dimensional logistic regression


A natural follow-up question is whether the same idea works for \(f(x,y) = -\log(e^{\frac{x+y}{2}}+e^{-\frac{x+y}{2}})\).

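library(plotly)
f = function(x,y){
  -log(exp((x+y)/2)+exp(-(x+y)/2))
}
x = y = seq(0,10,length.out = 30)
z = outer(x,y,f)
# surface of f plotted against the squared coordinates
fig = plot_ly(x=x^2,y=y^2,z=z)
fig <- fig %>% add_surface()
fig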

The plot suggests that \(f\) is a convex function in \((x^2,y^2)\). A first-order Taylor expansion in the variables \((x^2,y^2)\) around \((a^2,b^2)\) yields \[f(x,y)\geq f(a,b)-\frac{1}{4a}\tanh\left(\frac{a+b}{2}\right)(x^2-a^2) - \frac{1}{4b}\tanh\left(\frac{a+b}{2}\right)(y^2-b^2).\]
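A rough numerical check of this two-variable bound on the same range as the plot, \(x, y \in [0, 10]\). The helper name `f_lb` and the expansion point \(a = 2\), \(b = 3\) are illustrative choices of mine. This only tests the inequality on a grid; it is not a proof.

```r
f <- function(x, y) -log(exp((x + y) / 2) + exp(-(x + y) / 2))

# First-order expansion of f in (x^2, y^2) around (a^2, b^2), with a, b > 0.
f_lb <- function(x, y, a, b) {
  slope <- tanh((a + b) / 2) / 4
  f(a, b) - slope / a * (x^2 - a^2) - slope / b * (y^2 - b^2)
}

x <- y <- seq(0, 10, length.out = 101)
a <- 2
b <- 3  # illustrative expansion point (assumption)

# Matrix of f(x, y) - bound over the whole grid.
gap <- outer(x, y, function(x, y) f(x, y) - f_lb(x, y, a, b))
gap_at_ab <- f(a, b) - f_lb(a, b, a, b)

print(min(gap))   # expected to be non-negative up to rounding
print(gap_at_ab)  # expected to be zero at the expansion point
```

If the bound holds on this range, the smallest entry of `gap` should be non-negative up to rounding, with the minimum attained near \((x, y) = (a, b)\).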

