All econ problems are optimization problems:
Minimize costs
Maximize the present value of expected welfare
Some are harder than others:
Individual utility maximization: easy
Decentralized electricity market with nodal pricing and market power: hard
One-input profit maximization: easy
N-input profit maximization with learning and forecasts: hard
How do we solve these?
Consider a simple generic problem:
Ax = b
Invert A
x = A^{-1}b
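For instance, a quick Julia sketch (the 2×2 system here is a made-up example, not from the lecture): in practice we solve Ax = b with the backslash operator, which factorizes A rather than explicitly forming A^{-1}

A = [2.0 1.0; 1.0 3.0]   # hypothetical coefficient matrix
b = [3.0, 5.0]           # hypothetical right-hand side
x = A \ b                # solves Ax = b via factorization, no explicit inverse
println(x)               # ≈ [0.8, 1.4]; inv(A)*b gives the same answer but is slower and less accurate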
With non-linear rootfinding problems we want to solve:
f(x) = 0, \;\; f: R^n \rightarrow R^n
What's a common rootfinding problem?
Can we reframe a common economic problem as rootfinding?
Yes!
Fixed point problems are rootfinding problems:
g(x) = x \Rightarrow f(x) \equiv g(x) - x = 0
Economic problems virtually always involve constraints
These constraints may or may not be binding
e.g. a firm profit max problem: a control vector x may be constrained to be x \in [a,b]
This results in a complementarity problem:
x_i > a_i \Rightarrow f_i(x) \geq 0 \,\,\, \forall i = 1,...,n
x_i < b_i \Rightarrow f_i(x) \leq 0 \,\,\, \forall i = 1,...,n
What do these equations mean?
If the constraints on x_i do not bind, a_i < x_i < b_i, then the first-order condition is precisely zero
Suppose the upper bound binds, x_i = b_i
Then f_i(x) \geq 0 since x_i > a_i, but we can't guarantee that f_i(x) = 0: the objective may still be increasing in x_i at the bound
What does the intermediate value theorem tell us?
If a continuous real-valued function on a given interval takes on two values a and b,
it achieves all values in the set [a,b] somewhere in its domain
How can this motivate an algorithm to find the root of a function?
If we have a continuous one-variable function that is positive at some value and negative at another, a root must fall in between those values
We know a root exists by IVT, what's an efficient way to find it?
Continually bisect the interval!
Write out the code to do it
function bisection(f, lower_bound, upper_bound)
    tolerance = 1e-3                             # tolerance for solution
    guess = 0.5*(upper_bound + lower_bound)      # initial guess, bisect the interval
    difference = (upper_bound - lower_bound)/2   # initialize bound difference
    while difference > tolerance                 # loop until convergence
        println("Intermediate guess of $guess.")
        difference = difference/2
        if sign(f(lower_bound)) == sign(f(guess))  # if the guess has the same sign as the lower bound
            lower_bound = guess                    # solution is in the upper half of the interval
            guess = guess + difference
        else                                       # else the solution is in the lower half of the interval
            upper_bound = guess
            guess = guess - difference
        end
    end
    println("The root of f(x) is at $guess.")
end;
f(x) = x^3
## f (generic function with 1 method)
bisection(f, -4, 1)
## Intermediate guess of -1.5.
## Intermediate guess of -0.25.
## Intermediate guess of 0.375.
## Intermediate guess of 0.0625.
## Intermediate guess of -0.09375.
## Intermediate guess of -0.015625.
## Intermediate guess of 0.0234375.
## Intermediate guess of 0.00390625.
## Intermediate guess of -0.005859375.
## Intermediate guess of -0.0009765625.
## Intermediate guess of 0.00146484375.
## Intermediate guess of 0.000244140625.
## The root of f(x) is at -0.0003662109375.
g(x) = 3x^3 + 2x -4
## g (generic function with 1 method)
bisection(g, -6, 4)
## Intermediate guess of -1.0.
## Intermediate guess of 1.5.
## Intermediate guess of 0.25.
## Intermediate guess of 0.875.
## Intermediate guess of 1.1875.
## Intermediate guess of 1.03125.
## Intermediate guess of 0.953125.
## Intermediate guess of 0.9140625.
## Intermediate guess of 0.89453125.
## Intermediate guess of 0.904296875.
## Intermediate guess of 0.8994140625.
## Intermediate guess of 0.90185546875.
## Intermediate guess of 0.900634765625.
## The root of f(x) is at 0.9012451171875.
h(x) = cos(x)
## h (generic function with 1 method)
bisection(h, -pi, pi)
## Intermediate guess of 0.0.
## Intermediate guess of -1.5707963267948966.
## Intermediate guess of -2.356194490192345.
## Intermediate guess of -1.9634954084936207.
## Intermediate guess of -1.7671458676442586.
## Intermediate guess of -1.6689710972195777.
## Intermediate guess of -1.6198837120072371.
## Intermediate guess of -1.595340019401067.
## Intermediate guess of -1.5830681730979819.
## Intermediate guess of -1.5769322499464393.
## Intermediate guess of -1.573864288370668.
## Intermediate guess of -1.5723303075827824.
## The root of f(x) is at -1.5715633171888395.
The bisection method works by continually bisecting the interval and only keeping the half interval with the zero until "convergence"
The bisection method is incredibly robust: if a function f satisfies the conditions of the IVT, bisection is guaranteed to converge in a known, finite number of iterations
A root can be calculated to arbitrary precision \epsilon
in a maximum of \log([b-a]/\epsilon)/\log(2) iterations
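As a quick check against the g(x) example above: with [b-a] = 10 and \epsilon = 10^{-3}, the bound is \log(10/10^{-3})/\log(2) \approx 13.3, consistent with the roughly 13 bisections printed before convergence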
Robustness comes with drawbacks: bisection converges slowly (at a linear rate, as we will see below) and only works for one-dimensional problems
Fixed points can be computed using function iteration: iterate x^{(k+1)} = g(x^{(k)}) until successive guesses stop changing
Since we can recast fixed points as rootfinding problems we can use function iteration to find roots too
Code up a function iteration problem
Function iteration is pretty simple to implement
function function_iteration(f, guess)
    tolerance = 1e-3     # tolerance for solution
    x_old = guess        # initialize old x value
    x = 1e-10            # initialize current x
    difference = 1e10    # initialize bound difference
    @time while abs(difference) > tolerance  # loop until convergence
        println("Intermediate guess of $x.")
        x = f(x_old)             # new x = f(old x)
        difference = x - x_old
        x_old = x
    end
    println("The fixed point of f(x) is at $x.")
end;
f(x) = x^(-0.5)
## f (generic function with 1 method)
function_iteration(f, 2.)
## Intermediate guess of 1.0e-10.
## Intermediate guess of 0.7071067811865476.
## Intermediate guess of 1.189207115002721.
## Intermediate guess of 0.9170040432046712.
## Intermediate guess of 1.0442737824274138.
## Intermediate guess of 0.9785720620877002.
## Intermediate guess of 1.0108892860517005.
## Intermediate guess of 0.9945994234836332.
## Intermediate guess of 1.0027112750502025.
## Intermediate guess of 0.9986471128909702.
## Intermediate guess of 1.0006771306930664.
## Intermediate guess of 0.9996616064962437.
## 0.002017 seconds (924 allocations: 233.763 KiB)
## The fixed point of f(x) is at 1.0001692397053021.
Function iteration can be quick, but is not always guaranteed to converge
Newton's method and variants are the workhorses of solving n-dimensional non-linear problems
What's the idea?
Take a hard non-linear problem and replace it with a sequence of linear problems
Under certain conditions the sequence of solutions will converge to the true solution
Graphically, Newton's method proceeds as follows:
Start with an initial guess of the root at x^{(0)}
Approximate the non-linear function with its first-order Taylor expansion about x^{(0)}
This is just the tangent line at x^{(0)}; solve for the root of this linear approximation and call it x^{(1)}
Repeat starting at x^{(1)}
This can be applied to a function with an arbitrary number of dimensions
Begin with some initial guess of the root vector x^{(0)}
Our new guess x^{(k+1)}, given some arbitrary point in the algorithm x^{(k)}, is obtained by approximating f(x) using a first-order Taylor expansion about x^{(k)} and solving for x:
f(x) \approx f(x^{(k)}) + f'(x^{(k)})(x - x^{(k)}) = 0 \Rightarrow x^{(k+1)} = x^{(k)} - [f'(x^{(k)})]^{-1} f(x^{(k)})
Code up Newton's method
function newtons_method(f, f_prime, guess)
    diff = Inf   # Initialize problem
    tol = 1e-5
    x_old = guess
    x = 1e10
    while abs(diff) > tol
        x = x_old - f(x_old)/f_prime(x_old)  # root of the linear approximation
        diff = x - x_old
        x_old = x
    end
    println("The root of f(x) is at $x.")
end;
f(x) = x^3;f_prime(x) = 3x^2;newtons_method(f, f_prime, 1.)
## The root of f(x) is at 1.231347218094855e-6.
f(x) = sin(x);f_prime(x) = cos(x);newtons_method(f, f_prime, pi/4)
## The root of f(x) is at 5.941936124988917e-19.
Newton's method has nice properties regarding convergence and speed:
If f(x) is continuously differentiable, the initial guess is "sufficiently close" to the root, and f'(x) is invertible near the root, then Newton's method converges to the root
What is "sufficiently close"?
We need f'(x) to be invertible so the algorithm above is well defined
If f'(x) is ill-conditioned we can run into problems with rounding error
We usually don't want to deal with analytic derivatives unless we have access to autodifferentiation
Why? Analytic derivatives are costly to derive and code by hand, and easy to get wrong
The secant method sidesteps this by approximating the derivative with a finite difference built from the two most recent iterates
Using our current root guess x^{(k)} and our previous root guess x^{(k-1)}:
f'(x^{(k)}) \approx \frac{f(x^{(k)}) - f(x^{(k-1)})}{x^{(k)} - x^{(k-1)}}
Our new iteration rule then becomes
x^{(k+1)} = x^{(k)} - \frac{x^{(k)} - x^{(k-1)}}{f(x^{(k)}) - f(x^{(k-1)})} f(x^{(k)})
Now we require two initial guesses so that we have an initial approximation of the derivative
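A minimal Julia sketch of this update (my own illustration in the style of the earlier functions; the name secant_method and the example below are not from the lecture):

function secant_method(f, guess_1, guess_2)
    tolerance = 1e-5             # tolerance for solution
    x_old, x = guess_1, guess_2  # two initial guesses
    while abs(x - x_old) > tolerance
        slope = (f(x) - f(x_old))/(x - x_old)  # finite difference approximation of f'(x)
        x_old, x = x, x - f(x)/slope           # secant update
    end
    println("The root of f(x) is at $x.")
end;

secant_method(x -> sin(x), 0.5, 1.0)   # converges to the root at 0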
Broyden's method is the most widely used rootfinding method for n-dimensional problems
It is a generalization of the secant method where we maintain a sequence of guesses of the Jacobian at the root
We must initially provide a guess of the root, x^{(0)}, but also a guess of the Jacobian, A^{(0)}
Root guess update is the same as before, but with our guess of the Jacobian substituted in for the actual Jacobian or the finite difference approximation:
x^{(k+1)} = x^{(k)} - [A^{(k)}]^{-1} f(x^{(k)})
We still need to update A^{(k)}: the update is made by taking the smallest change, in terms of the Frobenius matrix norm, that satisfies what is called the secant condition (underdetermined if n > 1):
f(x^{(k+1)}) - f(x^{(k)}) = A^{(k+1)} (x^{(k+1)} - x^{(k)})
The differences in successive root guesses, and in the function values at those guesses, should align with our estimate of the Jacobian at that point:
A^{(k+1)} = A^{(k)} + \frac{\left[f(x^{(k+1)}) - f(x^{(k)}) - A^{(k)}(x^{(k+1)} - x^{(k)})\right](x^{(k+1)} - x^{(k)})^T}{(x^{(k+1)} - x^{(k)})^T (x^{(k+1)} - x^{(k)})}
Why update the Jacobian and then invert it when we can just update an inverted Jacobian B = A^{-1}?
B^{(k+1)} = B^{(k)} + \frac{\left[d^{(k)} - u^{(k)}\right](d^{(k)})^T B^{(k)}}{(d^{(k)})^T u^{(k)}}
where d^{(k)} = x^{(k+1)} - x^{(k)} and u^{(k)} = B^{(k)}\left[f(x^{(k+1)}) - f(x^{(k)})\right]
Broyden converges under relatively weak conditions: f must be continuously differentiable, the initial guesses x^{(0)} and A^{(0)} must be sufficiently close to the root and to the Jacobian at the root, and f' must be invertible near the root
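A rough Julia sketch of Broyden's method using the inverse-Jacobian update above (my own illustration; the example system and the choice of initial inverse Jacobian are assumptions, not from the lecture):

using LinearAlgebra   # for norm() and inv()

function broyden(f, x, B; tolerance = 1e-8, maxiter = 100)
    # x: initial root guess, B: initial guess of the *inverse* Jacobian
    fx = f(x)
    for _ in 1:maxiter
        norm(fx) < tolerance && break
        d = -B*fx                            # step: x_new = x - B*f(x)
        x = x + d
        fx_new = f(x)
        u = B*(fx_new - fx)                  # u = B*[f(x_new) - f(x)]
        B = B + ((d - u)*(d'*B))/(d'*u)      # inverse-Jacobian update from above
        fx = fx_new
    end
    return x
end;

f_sys(x) = [x[1]^2 + x[2]^2 - 1, x[1] - x[2]]   # hypothetical system with a root near (0.707, 0.707)
x0 = [1.0, 0.5]
B0 = inv([2*x0[1] 2*x0[2]; 1.0 -1.0])           # inverse of the analytic Jacobian at x0
println(broyden(f_sys, x0, B0))                 # ≈ [0.7071, 0.7071]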
Rootfinding algorithms will converge at different speeds in terms of the number of operations
A sequence of iterates x^{(k)} is said to converge to x^* at a rate of order p if there is a constant C such that
||x^{(k+1)} - x^*|| \leq C\,||x^{(k)} - x^*||^p
for sufficiently large k
If C < 1 and p = 1, the rate of convergence is linear
If 1 < p < 2, convergence is superlinear, and if p = 2 convergence is quadratic. The higher the order of the convergence rate, the faster it converges
How fast do the methods we've seen converge?
Bisection: linear rate with C = 0.5 (kind of obvious once you see it)
Function iteration: linear rate with C = ||f'(x^*)||
Secant and Broyden: superlinear rate with p \approx 1.62
Newton: p = 2
Convergence rates only account for the number of iterations of the method
The steps taken in a given iteration of each solution method may vary in computational cost because of differences in the number of arithmetic operations
Although an algorithm may take more iterations to solve,
each iteration may be solved faster and the overall algorithm takes less time
Ex:
Bisection and function iteration are usually slow
Broyden's method can be faster than Newton's method if derivatives are costly to compute
Consider an example where f(x) = x - \sqrt{x} = 0
What does convergence look like across our main approaches in terms of the L_1 norm if all guesses start at x^{(0)} = 0.5?
Let x be a vector of economic actions bounded by a \leq x \leq b, and let f_i(x) be the marginal arbitrage profit from increasing action x_i
There are disequilibrium profit opportunities if x_i < b_i and f_i(x) > 0 (unexploited profits), or if x_i > a_i and f_i(x) < 0 (avoidable losses)
We obtain a no-arbitrage equilibrium if and only if x solves the complementarity problem CP(f,a,b)
We can write out the problem as finding a vector x \in [a,b] that solves
x_i > a_i \Rightarrow f_i(x) \geq 0 \,\,\, \forall i = 1,...,n
x_i < b_i \Rightarrow f_i(x) \leq 0 \,\,\, \forall i = 1,...,n
At an interior solution, the function must be precisely zero
Corner solution at the upper bound b_i for x_i \rightarrow f_i(x) \geq 0: the objective may still be increasing in direction i
The opposite is true if we are at the lower bound
Economic problems are complementarity problems where we are
finding a root of a function (e.g. marginal profit) subject to some constraint (e.g. price floors)
The Karush-Kuhn-Tucker theorem shows that x solves the constrained optimization problem if and only if it solves the complementarity problem
Example: single commodity competitive spatial price equilibrium model
If there is no trade \rightarrow the equilibrium condition is E_i(p_i) = 0 in all regions i of the world, where E_i is excess demand in region i: a simple rootfinding problem
Trade between regions has marginal transportation cost c_{ij} between regions i and j
Marginal arbitrage profit from shipping a unit of the good from i to j is p_j - p_i - c_{ij}
If it's positive: there is an incentive to ship more goods from region i to region j
If it's negative: there is an incentive to decrease shipments
We are at an equilibrium only if all the arbitrage opportunities are gone: for all region pairs i and j
0 \leq x_{ij} \leq b_{ij}
x_{ij} > 0 \Rightarrow p_j - p_i - c_{ij} \geq 0
x_{ij} < b_{ij} \Rightarrow p_j - p_i - c_{ij} \leq 0
How do we formulate this as a complementarity problem?
Market clearing in each region i requires that net imports = excess demand
\sum_k \left[x_{ki} - x_{ik}\right] = E_i(p_i)
This implies that we can solve for the price in region i,
p_i = E_i^{-1}\left(\sum_k \left[x_{ki} - x_{ik}\right]\right)
Finally, if we define marginal arbitrage profit from shipping another unit from i to j as
f_{ij}(x) = E_j^{-1}\left(\sum_k \left[x_{kj} - x_{jk}\right]\right) - E_i^{-1}\left(\sum_k \left[x_{ki} - x_{ik}\right]\right) - c_{ij}
then x is an equilibrium vector of trade flows if and only if x solves CP(f,0,b), where x, f, and b are n^2 \times 1 vectors
Even this complex trade-equilibrium model can be reduced to a simple complementarity problem
Decentralized models can generally be solved by reducing them to complementarity problems
Given four plots of marginal arbitrage profit f_i(x), what is the equilibrium choice of x in each case?
A complementarity problem CP(f,a,b) can easily be re-framed as a rootfinding problem:
\hat{f}(x) = \min(\max(f(x), a - x), b - x) = 0
We can then solve the problem with conventional rootfinding techniques
or approximate this with a smooth function to make it easier on a rootfinder
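A tiny Julia illustration of the transformation (the bounds and the marginal profit function below are made up, not from the lecture):

# element-wise min-max transform of CP(f, a, b) into a rootfinding problem
fhat(f, x, a, b) = min.(max.(f(x), a .- x), b .- x)

f_mp(x) = 1 .- x                             # hypothetical marginal profit, always positive on [0, 0.5]
println(fhat(f_mp, [0.25], [0.0], [0.5]))    # [0.25] ≠ 0: the interior point is not a solution
println(fhat(f_mp, [0.5], [0.0], [0.5]))     # [0.0]: the upper bound solves the complementarity problem

Any of the rootfinders above can then be applied to \hat{f}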
The workhorse solver for complementarity problems is the PATH solver; a Julia wrapper is available (PATHSolver.jl)
How we solve maximization problems has many similarities to rootfinding and complementarity problems
FOCs of an unconstrained maximization problem are just a rootfinding problem
FOCs of a constrained maximization problem are just a complementarity problem
I'll tend to frame problems as minimization problems because that is the convention in the optimization literature
We make two distinctions:
Local vs global: are we finding an optimum in a local region, or globally?
Derivative-using vs derivative-free: Do we want to use higher-order information?
I'll focus on local solvers; I won't cover common global solvers (e.g. simulated annealing or genetic algorithms)
Similar to bisection, golden search looks for a solution of a one-dimensional problem over smaller and smaller brackets
Suppose we have a continuous one-dimensional function, f(x),
and we want to find a local minimum in some interval [a,b]
Select points x_1, x_2 \in [a,b] where x_2 > x_1
If f(x_1) < f(x_2), replace [a,b] with [a,x_2]; else replace [a,b] with [x_1,b]
Repeat until a convergence criterion is met
We replace the endpoint of the interval next to the evaluated point with the highest value \rightarrow this keeps the lower evaluated point in the interval \rightarrow it guarantees that a local minimum still exists
How do we pick x_1 and x_2?
Achievable goals for the selection process: the width of the new bracket should not depend on which endpoint is replaced, and the surviving interior point should be reusable so that only one new function evaluation is needed per iteration
There's one choice of points that satisfies both
Golden search rule for point selection:
x_i = a + \alpha_i (b - a), \quad \alpha_1 = \frac{3 - \sqrt{5}}{2}, \quad \alpha_2 = \frac{\sqrt{5} - 1}{2}
The value of \alpha_2 \approx 0.618 is the inverse of the golden ratio, which is where the algorithm gets its name
Write out a golden search algorithm
function golden_search(f, lower_bound, upper_bound)
    alpha_1 = (3 - sqrt(5))/2   # GS parameter 1
    alpha_2 = (sqrt(5) - 1)/2   # GS parameter 2
    tolerance = 1e-2            # tolerance for convergence
    difference = 1e10
    while difference > tolerance
        x_1 = lower_bound + alpha_1*(upper_bound - lower_bound)  # new x_1
        x_2 = lower_bound + alpha_2*(upper_bound - lower_bound)  # new x_2
        if f(x_1) < f(x_2)  # reset bounds
            upper_bound = x_2
        else
            lower_bound = x_1
        end
        difference = x_2 - x_1
    end
    println("Minimum is at x = $((lower_bound+upper_bound)/2).")
end;
f(x) = 2x^2 + 9x;golden_search(f, -10, 10)
## Minimum is at x = -2.2483173872886444.
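As a sanity check, the analytic minimizer of 2x^2 + 9x is x = -9/4 = -2.25, which matches the answer above up to the 1e-2 tolerance used in the code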
Golden search is nice and simple but only works in one dimension
There are several derivative-free methods for minimization that work in multiple dimensions; the most commonly used one is Nelder-Mead (NM)
NM works by first constructing a simplex: we evaluate the function at n+1 points in an n-dimensional problem
It then manipulates the highest-value point, similar to golden search
There are six operations used to update the simplex: ordering the vertices, reflection, expansion, outside contraction, inside contraction, and shrinking
Nelder-Mead is a pain to code efficiently (i.e. don't spend the time doing it yourself) but is in the Optim.jl package
Nelder-Mead is commonly used but slow and unreliable, with no real useful convergence properties; avoid using it if you can
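If you do reach for it, a minimal usage sketch with Optim.jl (the objective below is a made-up example, not from the lecture):

using Optim

f_nm(x) = (x[1] - 1.0)^2 + 2.0*(x[2] + 2.0)^2          # simple two-dimensional objective
result = optimize(f_nm, [0.0, 0.0], NelderMead())      # Nelder-Mead starting from the guess (0, 0)
println(Optim.minimizer(result))                       # ≈ [1.0, -2.0]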
We typically want to find a global extremum, here a minimum,
of our objective function f
x^* is a global minimizer if f(x^*) \leq f(x) for all x over the domain of the function
Problem: most algorithms are local minimizers that find a point x^*
such that f(x^*) \leq f(x) for all x \in N, where N is a neighborhood of x^*
Typically analytical problems are set up to have a unique minimum so any local solver can generally find the global optimum
Lots of problems have properties that don't satisfy the typical sufficiency conditions for a unique minimum (e.g. strict convexity of the objective)
How do we find a local minimum?
Do we need to evaluate every single point?
Optimization algorithms typically have the following setup: start from an initial guess and generate a sequence of iterates that stops once we have converged to a solution with sufficient accuracy (or can make no further progress)
If the function is smooth, we can take advantage of that information
about the function's shape to figure out which direction to move in next
If f is twice continuously differentiable, we can use the gradient \nabla f and Hessian \nabla^2 f to figure out if x^* is a local minimizer
Taylor's Theorem tells us that if f is twice differentiable, then there exists a t \in (0,1) such that
f(x^* + p) = f(x^*) + \nabla f(x^*)^T p + \frac{1}{2!}\, p^T\, \nabla^2 f(x^* + tp)\, p
This is an exact equality
From here we can prove the usual necessary and sufficient conditions for a local optimum
All modern algorithms have that general set up but may go about it in different ways
Most modern optimization algorithms fall into one of two classes: line search and trust region
The relationship between these two approaches has a lot of similarities
to the relationship between the constrained problem and the dual Lagrange problem
General idea of line search: at each iterate x_k, pick a search direction p_k, then decide how far to move along it
How do we figure out how far to move?
"Approximately" solve this problem to figure out the step length \alpha: \min_{\alpha > 0} f(x_k + \alpha p_k)
We are finding the distance to move, \alpha in direction p_k that minimizes our objective f
Typically we do not perform the full minimization since it is costly
We only try a limited number of step lengths \alpha before picking the best one and moving on to our next iterate x_{k+1}
We still haven't answered: what direction p_k do we decide to move in?
What's an obvious choice for p_k?
The direction that yields the steepest descent
-\nabla \, f_k is the direction that makes f decrease most rapidly, k indicates we are evaluating f at iteration k
We can verify this is the direction of steepest descent by referring to Taylor's theorem
For any direction p and step length \alpha, we have that f(x_k + \alpha p) = f(x_k) + \alpha\,p^T\,\nabla\,f_k + \frac{1}{2!}\,\alpha^2p^T\,\nabla^2\,f(x_k+tp)\,p
The rate of change in f along p at x_k (\alpha = 0) is p^T \, \nabla\,f_k
The unit vector of quickest descent solves \min_p p^T\,\nabla\,f_k \,\,\,\,\, \text{subject to: }||p|| = 1
Re-express the objective as \min_{\theta,p} ||p||\,||\nabla\,f_k||cos\,\theta, where \theta is the angle between p and \nabla\,f_k
The minimum is attained when cos\,\theta = -1 and p = -\frac{\nabla\,f_k}{||\nabla\,f_k||}, so the direction of steepest descent is simply -\nabla\,f_k
The steepest descent method searches along this direction at every iteration k
It may select the step length \alpha_k in several different ways
A benefit of the algorithm is that we only require the gradient of the function, and no Hessian
However it can be very slow
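A bare-bones Julia sketch of steepest descent with a simple backtracking (Armijo) step length rule (my own illustration; the objective, gradient, and constants are assumptions, not from the lecture):

using LinearAlgebra   # for norm() and dot()

function steepest_descent(f, grad, x; tolerance = 1e-6, maxiter = 10_000)
    for _ in 1:maxiter
        g = grad(x)
        norm(g) < tolerance && break
        p = -g                        # steepest descent direction
        α = 1.0
        # halve α until the Armijo sufficient-decrease condition holds
        while f(x + α*p) > f(x) + 1e-4*α*dot(g, p) && α > 1e-10
            α /= 2
        end
        x = x + α*p
    end
    return x
end;

f_sd(x) = (x[1] - 1.0)^2 + 2.0*(x[2] + 2.0)^2      # hypothetical quadratic objective
grad_sd(x) = [2.0*(x[1] - 1.0), 4.0*(x[2] + 2.0)]  # its gradient
println(steepest_descent(f_sd, grad_sd, [0.0, 0.0]))   # ≈ [1.0, -2.0]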
We can always use search directions other than the steepest descent
Any descent direction, i.e. one that is within 90^\circ of -\nabla\,f_k (so that p^T\,\nabla f_k < 0),
is guaranteed to produce a decrease in f as long as the step size is sufficiently small
We can actually verify this with Taylor's theorem
f(x_k + \epsilon p_k) = f(x_k) + \epsilon\,p_k^T\,\nabla\,f_k + O(\epsilon^2)
If p_k is in a descending direction, \theta_k will be of an angle such that cos\,\theta_k < 0
This gives us
p_k^T\,\nabla f_k = ||p_k||\,||\nabla\,f_k||cos\,\theta_k < 0
Therefore f(x_k + \epsilon p_k) < f(x_k) for positive but sufficiently small \epsilon
Is -\nabla\,f_k always the best search direction?
The most important search direction is not steepest descent but Newton's direction
Newton's direction comes out of the second-order Taylor series approximation to f(x_k + p):
f(x_k + p) \approx f_k + p^T\,\nabla\,f_k + \frac{1}{2!}\,p^T\,\nabla^2 f_k\,p
Define this approximation as m_k(p)
We find the Newton direction by selecting the vector p that minimizes m_k(p)
This ends up being p^N_k = -[\nabla^2 f_k]^{-1}\,\nabla f_k
This approximation to the function we are trying to solve has error of O(||p||^3),
so if p is small, the quadratic approximation is very accurate
Drawback: requires explicit computation of the Hessian, \nabla^2 \, f(x)
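A small Julia sketch of a pure Newton iteration for minimization using this direction (my own illustration with an analytic gradient and Hessian for a made-up objective; no line search or safeguards):

using LinearAlgebra   # for norm() and the backslash solve

function newton_min(grad, hess, x; tolerance = 1e-8, maxiter = 100)
    for _ in 1:maxiter
        g = grad(x)
        norm(g) < tolerance && break
        p = -(hess(x) \ g)   # Newton direction: -[∇²f]⁻¹ ∇f
        x = x + p            # take the full Newton step
    end
    return x
end;

# hypothetical objective f(x) = (x_1 - 1)^4 + x_2^2 with minimizer (1, 0)
grad_f(x) = [4.0*(x[1] - 1.0)^3, 2.0*x[2]]
hess_f(x) = [12.0*(x[1] - 1.0)^2 0.0; 0.0 2.0]
println(newton_min(grad_f, hess_f, [3.0, 2.0]))   # ≈ [1.0, 0.0]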
Trust region methods construct an approximating model, m_k
whose behavior near the current iterate x_k is close to that of the actual function f
We then search for a minimizer of m_k
Issue: m_k may not represent f well when far away from the current iterate x_k
Solution: Restrict the search for a minimizer to be within some region of x_k, called a trust region
Trust region problems can be formulated as \min_p m_k(x_k + p) where x_k + p must lie inside the trust region, i.e. ||p|| \leq \Delta_k for a trust region radius \Delta_k
Typically the approximating model m_k is a quadratic function (i.e. a second-order Taylor approximation) m_k(x_k + p) = f_k + p^T\,\nabla\,f_k + \frac{1}{2!}\,p^T\,B_k\,p where B_k is the Hessian or an approximation to the Hessian
What's the fundamental difference between line search and trust region?
Line search first picks a direction then searches along that direction for the optimal step length
Trust region first defines our step length via the trust region radius, then searches for the optimal direction
There is a special case of the trust region where if we set B_k, the approximate Hessian, to zero, the solution to the problem is p_k = -\frac{\Delta_k \,\nabla \,f_k}{||\nabla\,f_k||}
This is just the steepest descent solution for the line search problem
The scaling of a problem matters for optimization performance
A problem is poorly scaled if changes to x in a certain direction
produce much bigger changes in f than changes to x in another direction
Ex: f(x) = 10^9 x_1^2 + x_2^2 is poorly scaled
This happens when variables enter the objective at very different magnitudes, e.g. one measured in single dollars and another in billions of dollars
How do we solve this issue?
Rescale the problem: put them in units that are
generally within an order of magnitude of 1
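For the example above, a hypothetical rescaling z_1 = 10^{4.5} x_1 turns the objective into z_1^2 + x_2^2, which is well scaled: both variables now move the objective at comparable rates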
How do we solve constrained optimization problems?
Typically as a variant of unconstrained optimization techniques
We will discuss three types of constrained optimization algorithms
These are the algorithms used in workhorse commercial solvers like KNITRO and MATLAB's fmincon
Suppose we wish to minimize some function subject to equality constraints (easily generalizes to inequality) \min_x f(x) \,\,\, \text{subject to:} \,\, c_i(x) = 0
How does an algorithm know to not violate the constraint?
One way is to introduce a penalty function into our objective and remove the constraint: Q(x;\mu) = f(x) + \frac{\mu}{2} \sum_i c_i^2(x)
where \mu is the penalty parameter
The second term increases the value of the objective when the constraint is violated: bigger \mu \rightarrow bigger penalty from violating the constraint
The penalty terms are smooth \rightarrow use unconstrained optimization techniques
to solve the problem by searching for iterates of x_k
Also generally iterate on sequences of \mu_k \rightarrow \infty as k \rightarrow \infty, to require satisfying the constraints as we close in
There are also augmented Lagrangian methods that take the quadratic penalty method and add in explicit estimates of Lagrange multipliers to help force binding constraints to bind precisely
Example: \min x_1 + x_2 \,\,\,\,\,\text{ subject to: } \,\,\, x_1^2 + x_2^2 - 2 = 0
Solution is pretty easy to show to be (-1, -1)
The penalty method function Q(x_1, x_2; \mu) is Q(x_1, x_2; \mu) = x_1 + x_2 + \frac{\mu}{2} (x_1^2 + x_2^2 - 2)^2
Let's ramp up \mu and see what happens to how the function looks
\mu = 1, solution is around (-1.1, -1.1)
\mu = 10, solution is very close to (-1, -1), can easily see trough, and rapid value increase outside x_1^2 + x_2^2 = 2
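A rough Julia sketch of this ramp-up using Optim.jl (the solver choice, \mu sequence, and starting point are my own illustration, not from the lecture):

using Optim

# penalty objective for: min x_1 + x_2 subject to x_1^2 + x_2^2 - 2 = 0
Q(x, μ) = x[1] + x[2] + (μ/2)*(x[1]^2 + x[2]^2 - 2)^2

x_guess = [-1.5, -1.5]                 # initial guess
for μ in (1.0, 10.0, 100.0)            # ramp up the penalty parameter
    result = optimize(x -> Q(x, μ), x_guess, NelderMead())
    global x_guess = Optim.minimizer(result)   # warm start the next, more heavily penalized problem
    println("μ = $μ: minimizer ≈ $x_guess")
end

The minimizers should approach the constrained solution (-1, -1) as \mu grows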
Active set methods encapsulate sequential quadratic programming (SQP) methods
Main idea:
The Lagrangian is L(x,\lambda) = f(x) - \lambda^T\,c(x)
Denote A(x)^T as the Jacobian of the constraints, A(x)^T = [\nabla\,c_1(x),...,\nabla\,c_m(x)]
The first-order conditions F(x,\lambda) can be written as, \begin{gather} \nabla\,f(x) - A(x)^T\,\lambda = 0 \notag\\ c(x) = 0 \notag \end{gather}
Any solution to the equality constrained problem, where A(x^*) has full rank also satisfies the first-order necessary conditions
Active set methods use Newton's method to find the solution (x^*, \lambda^*) of F(x,\lambda)
Issue: if we have many constraints, keeping track of all of them can be expensive
Main idea: recognize that if an inequality constraint is not binding, or active, then it has no influence on the solution
\rightarrow in the iteration procedure we can effectively ignore it
Active set methods find ways to reduce the complexity of the optimization routine
by selectively ignoring constraints that are neither active (i.e. they have non-positive Lagrange multipliers) nor close to being active
Interior point methods are also called barrier methods
These are typically used for inequality constrained problems
The name interior point comes from the algorithm traversing the domain along the interior of the inequality constraints
Issue: how do we ensure we are on the interior of the feasible set?
Main idea: impose a barrier to stop the solver from letting a constraint bind
Consider the following constrained optimization problem \begin{gather} \min_{x} f(x) \notag\\ \text{subject to: } c_E(x) = 0, c_I(x) \geq 0 \end{gather}
Reformulate this problem as \begin{gather} \min_{x,s} f(x) \notag\\ \text{subject to: } c_E(x) = 0, c_I(x) - s = 0, s \geq 0 \end{gather}
where s is a vector of slack variables for the constraints
Final step: introduce a barrier function to eliminate the inequality constraint, \begin{gather} \min_{x,s} f(x) - \mu \sum_{i=1}^m log(s_i) \notag\\ \text{subject to: } c_E(x) = 0, c_I(x) - s = 0 \end{gather}
where \mu is a positive barrier parameter
The barrier function prevents the components of s from approaching
zero by imposing a logarithmic barrier \rightarrow it maintains slack in the constraints
Interior point methods solve a sequence of barrier problems until \{\mu_k\} converges to zero
The solution to the barrier problem converges to that of the original problem
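As a minimal illustration (my own example, not from the lecture): for the one-dimensional problem \min_x x subject to x \geq 1, the barrier problem is \min_x x - \mu \log(x - 1). Its first-order condition 1 - \mu/(x - 1) = 0 gives x(\mu) = 1 + \mu, which stays strictly inside the constraint for \mu > 0 and converges to the true solution x^* = 1 as \mu \rightarrow 0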
Plug in your guess, let the solver go, and you're done right?
These algorithms are not guaranteed to always find even a local solution; you need to test and make sure you are converging correctly
Exit flags tell you why the solver stopped: exit flags of 0 or -10X are generally good, anything else is bad
-10X can indicate bad scaling, ill-conditioning, etc
Optimization is approximately 53% art
Not all algorithms are suited for every problem \rightarrow it is useful to check how different algorithms perform
Interior-point is usually the default in constrained optimization solvers (low memory usage, fast), but try other algorithms and see if the solution generally remains the same
Two main tolerances in optimization:
ftol is the tolerance for the change in the function value (absolute and relative)
xtol is the tolerance for the change in the input values (absolute and relative)
What is a suitable tolerance?
It depends
Explore sensitivity to the tolerance; typically pick a conservative (small) number like 1e-6
There may be a substantial tradeoff between the accuracy of your solution and speed
Common bad practice is to pick a larger tolerance (e.g. 1e-3) so the problem "works" (e.g. so your big MLE converges)
The issue is that 1e-3 might be pretty big for your problem
if you haven't checked that your solution is not sensitive to the tolerance
Initial guesses matter
Good ones can improve performance
Bad ones can give you terrible performance, or wrong answers if your problem isn't perfect