
Lecture 6

Projection methods

Ivan Rudik

AEM 7130

1 / 87

Roadmap

  1. What is projection
  2. How we approximate functions
2 / 87

Our basic dynamic model

An arbitrary infinite horizon problem can be represented using a Bellman equation V(S_t) = \max_{q_t} \left[ u(q_t) + \beta\,V(S_{t+1}(q_t)) \right]

We can start from any arbitrary initial state vector and simulate the optimal policy paths in deterministic or stochastic settings

We know that this problem has a fixed point (solution) V^* from previous lectures, but how do we approximate V^*?

One way to do this is via projection

Or perturbation, or discretization, or some analytic approaches for specific models

We'll focus on value functions here, but we can approximate policy functions as well

3 / 87

Projection methods

Main idea: build some function \hat{V} indexed by coefficients that approximately solves the Bellman

What do I mean by approximately?

The coefficients of \hat{V} are selected to minimize some residual function
that tells us how far away our solution is from solving the Bellman

Rearrange the Bellman equation and define a new functional H that maps the problem into a more general framework H(V) = V(S_t) - \max_{q_t} \left[ u(q_t) + \beta\,V(S_{t+1}(q_t)) \right]

We can find some function V that solves H(V) = 0

4 / 87

Projection methods

How do we do this?

We solve this by specifying some linear combination of basis functions \Psi_i(\mathbf{S}) V^j(\mathbf{S}|\theta) = \sum_{i=0}^j \theta_i \Psi_i(\mathbf{S}) with coefficients \theta_0,...,\theta_j

5 / 87

Projection methods

We then define a residual R(\mathbf{S}|\theta) = H(V^j(\mathbf{S}|\theta)) and select the coefficient values to minimize the residual, given some measure of distance

This step of selecting the coefficients is called projecting H against our basis

We still have some choices to make:

What basis do we select?

How do we project (select the coefficients)?

6 / 87

Projection methods

At first this may be a pretty confusing concept

Simple example to get intuition?

Ordinary least squares linear regression

We can think of the problem as searching for some unknown conditional expectation E[Y|X], given outcome variable Y and regressors X

7 / 87

Projection methods

We don't know the true functional form, but we can approximate it using the first two monomials in X, 1 and X: E[Y|X] \approx \theta_0 + \theta_1 X

These are the first two elements of the monomial basis

One residual function is then: R(Y,X|\theta_0,\theta_1) = |Y-\theta_0-\theta_1 X|

In econometrics we often minimize the square of this residual, which is just OLS

So OLS is within the class of projection methods

In OLS we use observed data, but in theory we use the operator H(V)
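
As a concrete illustration (my own sketch, not the lecture's code), OLS is exactly a projection onto the basis {1, X}: build the basis matrix and pick the coefficients that minimize the squared residual; the simulated data below are hypothetical

using Random
Random.seed!(1234)                        # hypothetical simulated data
X = rand(100)
Y = 2.0 .+ 3.0 .* X .+ 0.1 .* randn(100)
B = hcat(ones(100), X)                    # basis matrix: the first two monomials, 1 and X
theta = B \ Y                             # coefficients minimizing the sum of squared residuals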

8 / 87

Projection classes are defined by metrics

Projection methods are separated into several broad classes by the type of residual we're trying to shrink to zero

We need to select some metric function \rho that determines how we project

\rho tells us how close our residual function is to zero over the domain of our state space

9 / 87

Example residuals given different projections

Example: The figure shows two different residuals on some capital domain of [0,\bar{k}]

The residual based on the coefficient vector \theta_1 is large for small values of capital but near-zero everywhere else

The residual based on \theta_2 has medium values just about everywhere

Which is closer to zero over the interval? It will depend on our selection of \rho

10 / 87

We want to use weighted residuals

We move from the plain residual to \rho because we want to set a weighted residual equal to zero (more general)

Suppose we have some weight functions \phi_i:\Omega \rightarrow \mathbb{R} that map from our state space to the real line

The one-dimensional metric is defined as \begin{equation*} \rho(R(\cdot|\theta),0) = \begin{cases} 0 & \text{if } \int_\Omega\phi_i(\mathbf{S})R(\cdot|\theta)d\mathbf{S} = 0, i=1,...,j+1\\ 1 &\text{otherwise} \end{cases} \end{equation*}

Where we want to solve for \theta = \text{argmin} \, \rho(R(\cdot|\theta),0)

11 / 87

We want to use weighted residuals

We can then change our problem to simply solving a system of integrals ensuring the metric is zero \int_\Omega\phi_i(\mathbf{S})R(\cdot|\theta)d\mathbf{S} = 0, i=1,...,j+1.

We can solve this using standard rootfinding techniques

Big remaining question: how do we choose our j+1 weight functions?

First let's begin with a simple example before moving into the most commonly used weight functions

12 / 87

Least squares projection

Suppose we selected the weight function to be \phi_i(\mathbf{S}) = \frac{\partial R(\mathbf{S}|\theta)}{\partial \theta_{i-1}}

Then we would be performing least squares! Why?

13 / 87

Least squares projection

Recall the objective of least squares is \min_{\theta} \int R^2(\cdot|\theta)d\mathbf{S}

The FOC for a minimum is \int \frac{\partial R(\mathbf{S}|\theta)}{\partial \theta_{i-1}}R(\cdot|\theta)d\mathbf{S} = 0, \,\, i = 1,...,j+1

So the first order condition sets the weighted average residual to zero
where the weights are determined by the partial derivatives of the residual

The partial derivative weight function yields a metric function that solves the least squares problem!

14 / 87

Collocation

The most commonly used weight function gives us a methodology called collocation

Here our weight function is \phi_i(\mathbf{S}) = \delta(\mathbf{S}-\mathbf{S}_i) where \delta is the Dirac delta function and \mathbf{S}_i are the j+1 collocation points or collocation nodes selected by the researcher

The Dirac delta function is zero at all \mathbf{S} except at \mathbf{S} = \mathbf{S}_i, and it integrates to one, so \int_\Omega\delta(\mathbf{S}-\mathbf{S}_i)R(\mathbf{S}|\theta)d\mathbf{S} = R(\mathbf{S}_i|\theta)

What does this weight function mean?

15 / 87

Collocation

Before we even select our coefficients, this means that the residual only matters at a finite set of points \mathbf{S}_i

So the solution to our problem must set the residual to zero at these collocation points

Since we have a finite set of points we do not need to solve difficult integrals but only a system of equations R(\mathbf{S}_i|\theta) = 0, i=1,...,j+1

16 / 87

Rough idea of how we proceed

Collocation methods aim to approximate the value function V(S_t) with some linear combination of known, and usually orthogonal (\int f(x)g(x)w(x)dx=0), basis functions

One example of a possible class of basis functions is the monomial basis: 1, x, x^2, x^3,...

But how do we implement collocation?

17 / 87

Rough idea of how we proceed

Recall we solve for coefficients \theta by setting the residual to be zero at all of our collocation points

But the residual is a function of V, and thus will be a function of \theta, a loop seemingly impossible to escape

We can solve this problem by iterating: continually setting the residuals equal to zero, recovering new values of \theta, and repeating

18 / 87

Rough idea of how we proceed

In any given iteration, we:

  1. Begin with a vector of coefficients on the basis functions (e.g. a random guess)

  2. Use a linear combination of the basis functions as an approximation to the value function on the right hand side of the Bellman

  3. Solve the Bellman with the approximated value function in it, at our set of collocation points

  4. Recover a set of maximized continuation values at these points in our state space conditional on the previous value function approximant we used

  5. Use these new maximized values to obtain updated coefficients by solving the system of linear equations, and repeat the process until we have "converged" (a minimal sketch of this loop follows below)
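
To make these steps concrete, here is a minimal Julia sketch of the loop (my own illustration, not the lecture's code), assuming a stylized cake-eating problem V(s) = \max_{0 < q < s} \log(q) + \beta V(s-q), a low-degree polynomial basis, and a brute-force maximization over a grid of feasible q values; it uses the basis-matrix machinery developed later in the lecture:

using LinearAlgebra

function solve_collocation(; beta = 0.95, n = 6, lb = 0.1, ub = 10.0, maxit = 500)
    grid = range(lb, ub, length = n)              # collocation points (hypothetical state space)
    basis(s) = [s^j for j = 0:n-1]                # low-degree polynomial basis (illustrative choice)
    Psi = vcat([basis(s)' for s in grid]...)      # n×n basis matrix at the collocation points
    Vhat(s, c) = dot(basis(s), c)                 # value function approximant
    c = zeros(n)                                  # step 1: initial coefficient guess
    for it = 1:maxit
        # steps 2-4: maximize the Bellman RHS at each collocation point, brute-forcing the
        # choice q over a fine grid that keeps the next state s - q (roughly) inside [lb, ub]
        y = [maximum(log(q) + beta*Vhat(s - q, c)
                     for q in range(1e-3, max(s - lb, 1e-3), length = 200)) for s in grid]
        c_new = Psi \ y                           # step 5: refit the coefficients from Psi*c = y
        norm(c_new - c) < 1e-8 && return c_new    # stop once the coefficients stop changing
        c = c_new
    end
    return c
end;

Calling solve_collocation() returns the fitted coefficients, which can be combined with the same basis to evaluate the approximate value function anywhere on [lb, ub].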

19 / 87

Still have some questions

By the contraction mapping theorem this is guaranteed to converge from any arbitrary initial set of coefficients! (conditional on satisfying the assumptions and there being no numerical error...)

This still leaves several questions unanswered

  • Where in the state space do we place the collocation points?

  • What basis functions do we use?

Do your answers to the previous two questions really matter?

Yes they are crucial

20 / 87

Still have some questions

Schemes exist to generate high quality approximations, and to obtain the approximation at low computational cost

Once we have recovered an adequate approximation to the value function, we have effectively solved the entire problem!

We know how the policymaker's expected discounted stream of future payoffs changes as we move through the state space

We can solve the Bellman at some initial state and know that doing so will yield the optimal policies

Therefore we can simulate anything we want and recover optimal policy functions
given many different sets of initial conditions or realizations of random variables

21 / 87

Interpolation

What points in our state space do we select to be collocation points?

We often have continuous states in economics (capital, temperature, oil reserves, technology shocks, etc.), so we must have some way to reduce the infinity of points in our state space into something more manageable

We do so by selecting a specific finite number of points in our state space and using them to construct a collocation grid that spans the domain of our problem

22 / 87

Interpolation

Using our knowledge of how the value function behaves at the limited set of points on our grid, we can interpolate our value function approximant at all points off the grid points, but within the domain of our grid

Note: The value function approximant is not very good outside the grid's domain since that would mean extrapolating beyond whatever information we have gained from analyzing our value function on the grid

23 / 87

Basis functions

Let V be the value function we wish to approximate with some \hat{V}

\hat{V} is constructed as a linear combination of n linearly independent (and typically orthogonal) basis functions \hat{V}(x)= \sum_{j=1}^n c_j \psi_j(x)

Each \psi_j(x) is a basis function, and the coefficients c_j determine how they are combined
at some point \bar{x} to yield our approximation \hat{V}(\bar{x}) to V(\bar{x})

24 / 87

Basis functions

The number of basis functions we select, n, is the degree of interpolation

In order to recover n coefficients, we need at least n equations that must be satisfied at a solution to the problem

If we have precisely n equations, we are just solving a simple system of linear equations: we have a perfectly identified system and are solving a collocation problem

This is what happens when we select our number of grid points in the state space
to be equal to the number of coefficients (which induces a Dirac delta weighting function)

25 / 87

Basis functions

Solve a system of equations, linear in c_j that equates the function approximant at the grid points to the recovered values \Psi \mathbf{c} = \mathbf{y} where \Psi is the matrix of basis functions, c is a vector of coefficients, and y is a vector of the recovered values

We can recover c by left dividing by \Psi which yields \mathbf{c} = \Psi^{-1}\mathbf{y}

If we have more equations, or grid points, than coefficients, then we can just use OLS to solve for the coefficients by minimizing the sum of squared errors

\mathbf{c} = (\Psi'\Psi)^{-1}\Psi'\mathbf{y}
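
As a quick check (my own illustration), the backslash operator in Julia returns exactly this least-squares solution when the system is over-identified:

grid = range(0, 2pi, length = 12)               # 12 grid points but only 4 coefficients
Psi = hcat([grid.^degree for degree = 0:3]...)  # 12×4 monomial basis matrix
y = sin.(grid)                                  # recovered values at the grid points
c_ls = Psi \ y                                  # least-squares coefficients
c_normal = (Psi'*Psi) \ (Psi'*y)                # the normal equations give the same answer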

26 / 87

(Pseudo-)spectral methods

Spectral methods apply all of our basis functions to the entire domain of our grid: they are global

When using spectral methods we virtually always use polynomials

Why?

The Stone-Weierstrass Theorem, which states (in one dimension):

Suppose f is a continuous real-valued function defined on the interval [a,b].
For every \epsilon > 0, \,\, \exists a polynomial p(x) such that for all x \in [a,b] we have |f(x) - p(x)| \leq \epsilon, i.e. ||f - p||_{\sup} \leq \epsilon

What does the SW theorem say in words?

27 / 87

(Pseudo-)spectral methods

Any continuous function f(x) can be approximated arbitrarily well by some polynomial p(x)

This means the function can even have kinks

Unfortunately we do not have infinite computational power so solving kinked problems with spectral methods is not advised

Note that the SW theorem does not say what kind of polynomial can approximate f arbitrarily well, just that some polynomial exists

28 / 87

Basis choice

What would be your first choice of basis?

Logical choice: the monomial basis: 1, x, x^2,...

It is simple, and SW tells us that we can uniformly approximate any continuous function on a closed interval using them

Practice

Code up a function project_monomial(f, n, lb, ub) that takes in some function f, degree of approximation n, lower bound lb and upper bound ub, and constructs a monomial approximation on an evenly spaced grid via collocation

We will be plotting stuff; see http://docs.juliaplots.org/latest/generated/gr/ for example code using the GR backend

29 / 87

The monomial basis

Let's approximate sin(x)

using Plots
gr();
f(x) = sin.(x);
Plots.plot(f, 0, 2pi, line = 4, grid = false, legend = false, size = (500, 250))

30 / 87

Approximating sin(x)

function project_monomial(f, n, lb, ub)
    # solves Ψc = y → c = Ψ\y
    # Ψ = matrix of monomial basis functions evaluated on the grid
    coll_points = range(lb, ub, length = n)                     # collocation points
    y_values = f(coll_points)                                   # function values on the grid
    basis_functions = [coll_points.^degree for degree = 0:n-1]  # vector of basis functions
    basis_matrix = hcat(basis_functions...)                     # basis matrix
    coefficients = basis_matrix\y_values                        # c = Ψ\y
    return coefficients
end;
coefficients_4 = project_monomial(f, 4, 0, 2pi);
coefficients_5 = project_monomial(f, 5, 0, 2pi);
coefficients_10 = project_monomial(f, 10, 0, 2pi)
## 10-element Array{Float64,1}:
## 0.0
## 0.9990725797458863
## 0.004015857153649684
## -0.1738437387373486
## 0.007075663351639969
## 0.004040763230876247
## 0.0016747985983553285
## -0.0006194667844101428
## 6.485272688203222e-5
## -2.293696012495368e-6
31 / 87

Approximating sin(x)

Now we need to construct a function f_approx(coefficients, plot_points) that takes in the coefficients vector, and an arbitrary vector of points to evaluate the approximant at (for plotting)

function f_approx(coefficients, points)
    n = length(coefficients) - 1
    basis_functions = [coefficients[degree + 1] * points.^degree for degree = 0:n] # evaluate basis functions
    basis_matrix = hcat(basis_functions...)        # transform into matrix
    function_values = sum(basis_matrix, dims = 2)  # sum up into function value
    return function_values
end;
plot_points = 0:.01:2pi;
f_values_4 = f_approx(coefficients_4, plot_points);
f_values_5 = f_approx(coefficients_5, plot_points);
f_values_10 = f_approx(coefficients_10, plot_points)
## 629×1 Array{Float64,2}:
## 0.0
## 0.009990953610597868
## 0.019981668333012258
## 0.029971103713554246
## 0.03995822109623334
## 0.04994198367415802
## 0.05992135654204973
## 0.06989530674983996
## 0.07986280335732032
## 0.0898228174898161
## ⋮
## -0.08303623493742407
## -0.0730710101597829
## -0.06309900382682798
## -0.05312124622683001
## -0.04313876967565733
## -0.03315260846174084
## -0.023163798791941304
## -0.01317337873884128
## -0.0031823881890886696
32 / 87

Plot

(Figures: plots of the degree 4, 5, and 10 monomial approximations of sin(x) computed above)

33-36 / 87

Cool!

We just wrote some code that exploits Stone-Weierstrass and allows us to (potentially) approximate any continuous function arbitrarily well as n goes to infinity

To approximate any function we'd need to feed in some basis function g(x, n) as opposed to hard-coding it like I did in the previous slides
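
A minimal sketch of that generalization (my own; the name project_basis and its signature are hypothetical), mirroring project_monomial but taking the basis g(x, degree) as an argument:

function project_basis(f, g, n, coll_points)
    basis_matrix = hcat([g(coll_points, degree) for degree = 0:n-1]...) # Ψ for the supplied basis
    return basis_matrix \ f(coll_points)                                # c = Ψ\y
end;

For example, project_basis(f, (x, d) -> x.^d, 10, range(0, 2pi, length = 10)) reproduces the degree 10 monomial approximation of sin(x) from before.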

Turns out we never use the monomial basis though

37 / 87

Cool!

Turns out we never use the monomial basis though

Why?

Try approximating Runge's function: f(x) = 1/(1 + 25x^2)

38 / 87

Runge's function

runge(x) = 1 ./ (1 .+ 25x.^2);
coefficients_5 = project_monomial(runge, 5, -1, 1);
coefficients_10 = project_monomial(runge, 10, -1, 1);
plot_points = -1:.01:1;
runge_values_5 = f_approx(coefficients_5, plot_points);
runge_values_10 = f_approx(coefficients_10, plot_points);
39 / 87

Runge's function

(Figures: plots of the degree 5 and 10 monomial approximations of Runge's function computed above)

40-42 / 87

Maybe we can just crank up n?

(Figures: monomial approximations of Runge's function with larger n; increasing n does not fix the poor fit)

43-44 / 87

Monomials are not good

What's the deal?

The matrix of monomials, \Psi, is often ill-conditioned, especially as the degree of the monomials increases

The first 6 monomials can induce a condition number of 10^{10}, a substantial loss of precision

They can also vary dramatically in size, which leads to scaling/truncation errors

45 / 87

Monomials are not good

using LinearAlgebra # for the cond() condition number function
runge(x) = 1 ./ (1 .+ 25x.^2);
coefficients_10 = project_monomial(runge, 10, -1, 1);
points = rand(10);
n = length(coefficients_10) - 1;
basis_functions = [coefficients_10[degree + 1] * points.^degree for degree = 0:n];
basis_matrix = hcat(basis_functions...);
println("Condition number: $(cond(basis_matrix))")
## Condition number: 1.7447981077501208e20

Ideally we want an orthogonal basis: when we add another element of the basis, it has sufficiently different behavior than the elements before it so it can capture features of the unknown function that the previous elements couldn't

46 / 87

The Chebyshev basis

Most frequently used is the Chebyshev basis

They have nice approximation properties:

  1. They are easy to compute

  2. They are orthogonal

  3. They are bounded between [-1,1]

Chebyshev polynomials are often selected because they minimize the oscillations that occur when approximating functions like Runge's function

The Chebyshev polynomial closely approximates the minimax polynomial: the polynomial, given degree d, that minimizes the maximum approximation error to the true function

47 / 87

The Chebyshev basis

Chebyshev polynomials are defined by a recurrence relation, \begin{gather} T_0(x) = 1 \\ T_1(x) = x \\ T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x) \end{gather} and are defined on the domain [-1,1]

In practice this is easy to expand to any interval [a,b]
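
A minimal sketch of that change of variables (my own helper functions, not from the lecture): map x \in [a,b] into z \in [-1,1] before evaluating the polynomials, and map back when needed

scale_to_cheb(x, a, b) = 2 .* (x .- a) ./ (b - a) .- 1     # [a,b] -> [-1,1]
scale_from_cheb(z, a, b) = (z .+ 1) .* (b - a) ./ 2 .+ a   # [-1,1] -> [a,b]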

Chebyshev polynomials look similar to monomials, but they have better properties that are visually apparent

48 / 87

The Chebyshev basis

Chebyshev polynomials are defined by a recurrence relation, \begin{gather} T_0(x) = 1 \\ T_1(x) = x \\ T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x) \end{gather}

Write two functions: cheb_polys(x, n) and monomials(x, n) with a vector of points x and degree of approximation n, that return the values of the degree n basis function at x

Write a plotting function plot_function(basis_function, x, n) that takes an arbitrary basis function basis_function and plots all basis functions up to degree n

If you can't get the recurrence relation to work, you can use an alternative: T_n(x) = cos(n \, arccos(x))

49 / 87

The two basis functions

# Chebyshev polynomial function
function cheb_polys(x, n)
    if n == 0
        return one.(x) # T_0(x) = 1 (one.(x) avoids the 0/0 NaN that x./x produces at x = 0)
    elseif n == 1
        return x # T_1(x) = x
    else
        cheb_recursion(x, n) =
            2x.*cheb_polys.(x, n - 1) .- cheb_polys.(x, n - 2)
        return cheb_recursion(x, n) # T_n(x) = 2xT_{n-1}(x) - T_{n-2}(x)
    end
end;
# Monomial function
monomials(x, n) = x.^n;
50 / 87

The plotting function

function plot_function(basis_function, x, n)
    for i = 1:n-1
        f_data = basis_function(x, i)
        if i == 1
            plot(x, f_data, linewidth = 4.0, xlabel = "x", ylabel = "Basis functions", label = "",
                 tickfontsize = 14, guidefontsize = 14, grid = false);
        else
            plot!(x, f_data, linewidth = 4.0, label = "");
        end
    end
    f_data = basis_function(x, n)
    plot!(x, f_data, linewidth = 4.0, label = "")
end;
x = -1:.01:1;
51 / 87

Monomials up to degree 5

plot_function(monomials, x, 5)

52 / 87

Chebyshev polynomials up to degree 5

plot_function(cheb_polys, x, 5)

53 / 87

Monomials up to degree 10

plot_function(monomials, x, 10)

54 / 87

Chebyshev polynomials up to degree 10

plot_function(cheb_polys, x, 10)

55 / 87

Chebyshev polynomials rule

Chebyshev polynomials span the space

Monomials clump together

Chebyshev polynomials are nice for approximation because they are orthogonal with respect to the weight function \phi(x) = 1/\sqrt{1-x^2} and they form a basis for the polynomials

They span the polynomial vector space

This means that you can form any polynomial of degree equal to or less than that of the Chebyshev polynomial you are using

It also guarantees that \Psi has full rank and is invertible
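
A quick numerical check (my own, reusing cheb_polys from above): evaluate the polynomials at the zeros of T_{10} and form the cross-product matrix; it comes out diagonal up to rounding, which is the Gauss-Chebyshev quadrature version of orthogonality under the weight \phi(x) = 1/\sqrt{1-x^2}

nodes = cos.(pi .* (2 .* (1:10) .- 1) ./ 20)               # the 10 zeros of T_10 on [-1,1]
T = hcat([cheb_polys(nodes, degree) for degree = 0:9]...)
round.(T'*T, digits = 10)                                  # off-diagonal entries are (numerically) zero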

56 / 87

Chebyshev zeros and alternative rep

Also note that the Chebyshev polynomial of order n has n zeros given by x_k = cos\left(\frac{2k-1}{2n}\pi \right), \,\,\, k=1,...,n which tend to cluster quadratically towards the edges of the domain

You can think about this as projecting sequentially finer uniform grids from a semicircle onto the x-axis

57 / 87

Two important theorems

There are two important theorems to know about Chebyshev polynomials

Chebyshev interpolation theorem: If f(x) \in C[a,b], if \{\psi_i(x), i=0,...\} is a system of polynomials (where \psi_i(x) is of exact degree i) orthogonal with respect to \phi(x) on [a,b] and if p_j = \sum_{i=0}^j \theta_i \psi_i(x) interpolates f(x) in the zeros of \psi_{j+1}(x), then: \lim_{j\rightarrow\infty} \left(|| f-p_j||_2 \right)^2 = \lim_{j\rightarrow\infty}\int_a^b \phi(x) \left(f(x) - p_j \right)^2 dx = 0

What does this say?

58 / 87

Two important theorems

If our approximating set of basis functions interpolates f exactly at the roots of the n^{th} order polynomial, then as n goes to infinity the approximation error becomes arbitrarily small, converging at a quadratic rate

This holds for any type of polynomial, but if they are Chebyshev then convergence is uniform

Unfortunately we can't store an infinite number of polynomials in our computer, so we would like to know how big our error is after truncating our sequence of polynomials

59 / 87

Two important theorems

Chebyshev truncation theorem: The error in approximating f is bounded by the sum of all the neglected coefficients

Since Chebyshev polynomials satisfy Stone-Weierstrass, an infinite sequence of them can perfectly approximate any continuous function

Since Chebyshev polynomials are bounded between [-1,1], the sum of the omitted terms is bounded by the sum of the magnitudes of their coefficients

So the error in the approximation is as well!

60 / 87

Two important theorems

We often also have that Chebyshev approximations converge geometrically, which gives us the following convenient property: |f(x) - f^j(x|\theta)| \sim O(\theta_j) The truncation error from stopping at polynomial j is of the same order as the magnitude of the coefficient \theta_j on the last polynomial

Thus in many situations we can simply check the size of the coefficient on the last polynomial to gauge how accurate our approximation is
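
A sketch of that check (my own, reusing cheb_polys and runge from above and the zeros formula from the previous slides):

n = 15
nodes = cos.(pi .* (2 .* (1:n) .- 1) ./ (2n))                # the n Chebyshev zeros on [-1,1]
basis_matrix = hcat([cheb_polys(nodes, degree) for degree = 0:n-1]...)
c = basis_matrix \ runge(nodes)                              # interpolate Runge's function at the zeros
println("Magnitude of the last coefficient: $(abs(c[end]))") # rough gauge of the truncation error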

61 / 87

Boyd's moral principle

Chebyshev polynomials are the most widely used basis

This is not purely theoretical but also from practical experience

John Boyd summarizes decades of experience with function approximation with his moral principle:

  • When in doubt, use Chebyshev polynomials unless the solution is spatially periodic, in which case an ordinary Fourier series is better
  • Unless you are sure another set of basis functions is better, use Chebyshev polynomials
  • Unless you are really, really sure another set of basis functions is better use Chebyshev polynomials
62 / 87

Grid point selection

We construct the basis function approximant by evaluating the function on a predefined grid in the domain of V

If we have precisely n nodes, x_i, we then have \sum_{j=1}^n c_j \psi_j(x_i) = V(x_i) \,\, \forall i=1,2,...,n \tag{interpolation conditions}

63 / 87

Grid point selection

We can write this problem more compactly as \Psi c = y \tag{interpolation equation} where

  • y is the column vector of V(x_i)
  • c is the column vector of coefficients c_j
  • \Psi is an n\times n matrix of the n basis functions evaluated at the n points

If we recover a set of values at our interpolation nodes, V^*(x_i), we can then simply invert \Psi and right multiply it by y to recover our coefficients

How do we select our set of nodes x_i?

64 / 87

Chebyshev strikes again

A good selection of points is the set of Chebyshev nodes

These are simply the roots of the Chebyshev polynomials above on the domain [-1,1]

They are given by x_k = cos\left(\frac{2k-1}{2n}\pi\right),\,\, k = 1,...,n for some Chebyshev polynomial of degree n

Mathematically, these also help reduce error in our approximation

We can gain intuition by looking at a graph of where Chebyshev nodes are located: plot them yourself!

65 / 87

Chebyshev node function

cheb_nodes(n) = cos.(pi * (2*(1:n) .- 1)./(2n))
## cheb_nodes (generic function with 1 method)
66 / 87
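
Taking up the "plot them yourself" suggestion, here is a minimal sketch using Plots.jl (my package choice, not necessarily the one behind the figures on the next slides) that scatters the nodes along [-1,1].

# scatter the Chebyshev nodes on [-1, 1] to see how they cluster near the endpoints
using Plots
nodes = cheb_nodes(11)
scatter(nodes, zeros(length(nodes)), label = "Chebyshev nodes", xlims = (-1, 1), markersize = 5)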

Chebyshev node locations

67 / 87

Chebyshev node locations

68 / 87

Chebyshev node locations

69 / 87

Chebyshev node locations

The nodes are heavily concentrated near the end points. Why is that?

70 / 87

Chebyshev node locations

The nodes are heavily concentrated near the end points. Why is that?

Imagine areas of our approximant near the center of our domain but not at a node

70 / 87

Chebyshev node locations

The nodes are heavily concentrated near the end points. Why is that?

Imagine areas of our approximant near the center of our domain but not at a node

These areas benefit from having multiple nodes on both the left and right

70 / 87

Chebyshev node locations

The nodes are heavily concentrated near the end points. Why is that?

Imagine areas of our approximant near the center of our domain but not at a node

These areas benefit from having multiple nodes on both the left and right

This provides more information for these off-node areas and allows them to be better approximated because we know what's happening nearby in several different directions

70 / 87

Chebyshev node locations

The nodes are heavily concentrated near the end points. Why is that?

Imagine areas of our approximant near the center of our domain but not at a node

These areas benefit from having multiple nodes on both the left and right

This provides more information for these off-node areas and allows them to be better approximated because we know what's happening nearby in several different directions

If we moved to an area closer to the edge of the domain, there may be only one node to the left or right of it providing information on what the value of our approximant should be

70 / 87

Chebyshev node locations

The nodes are heavily concentrated near the end points. Why is that?

Imagine areas of our approximant near the center of our domain but not at a node

These areas benefit from having multiple nodes on both the left and right

This provides more information for these off-node areas and allows them to be better approximated because we know what's happening nearby in several different directions

If we moved to an area closer to the edge of the domain, there may be only one node to the left or right of it providing information on what the value of our approximant should be

Therefore, it's best to put more nodes in these areas to shore up this informational deficit and get good approximation quality near the edges of our domain

70 / 87

Discrete states

How do we handle a discrete state S_d when trying to approximate V?

71 / 87

Discrete states

How do we handle a discrete state S_d when trying to approximate V?

Just like you might expect, we effectively have a different function approximant over the continuous states for each value of S_d

71 / 87
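
As a rough sketch of what "a different approximant for each value of S_d" can look like in code (all names and the stand-in values below are mine, purely for illustration):

# one coefficient vector over the continuous state for each value of the discrete state s_d
basis_funcs = [x -> x^(j - 1) for j in 1:3]     # placeholder basis for illustration
nodes = [-1.0, 0.0, 1.0]
fake_values(s_d) = exp.(nodes) .+ s_d           # stand-in for V(x_i; s_d) at the nodes

Φ = [basis_funcs[j](nodes[i]) for i in 1:3, j in 1:3]
coeffs = Dict(s_d => Φ \ fake_values(s_d) for s_d in 1:2)

# evaluating the approximant: pick the coefficients for s_d, then use the continuous basis
V_hat(x, s_d) = sum(coeffs[s_d][j] * basis_funcs[j](x) for j in 1:3)
V_hat(0.5, 2)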

Multi-dimensional approximation

Thus far we have displayed the Chebyshev basis in only one dimension

72 / 87

Multi-dimensional approximation

Thus far we have displayed the Chebyshev basis in only one dimension

We approximate functions of some arbitrary dimension N by taking the tensor product of the vectors of one-dimensional Chebyshev polynomials

72 / 87

Multi-dimensional approximation

Thus far we have displayed the Chebyshev basis in only one dimension

We approximate functions of some arbitrary dimension N by taking the tensor product of the vectors of one-dimensional Chebyshev polynomials

Construct a vector of polynomials [\phi_{1,1}, \, \phi_{1,2}, \, \phi_{1,3}] for dimension 1

72 / 87

Multi-dimensional approximation

Thus far we have displayed the Chebyshev basis in only one dimension

We approximate functions of some arbitrary dimension N by taking the tensor product of the vectors of one-dimensional Chebyshev polynomials

Construct a vector of polynomials [\phi_{1,1}, \, \phi_{1,2}, \, \phi_{1,3}] for dimension 1

Construct a vector of polynomials [\phi_{2,1}, \, \phi_{2,2}, \, \phi_{2,3}] for dimension 2

72 / 87

Multi-dimensional approximation

Thus far we have displayed the Chebyshev basis in only one dimension

We approximate functions of some arbitrary dimension N by taking the tensor product of the vectors of one-dimensional Chebyshev polynomials

Construct a vector of polynomials [\phi_{1,1}, \, \phi_{1,2}, \, \phi_{1,3}] for dimension 1

Construct a vector of polynomials [\phi_{2,1}, \, \phi_{2,2}, \, \phi_{2,3}] for dimension 2

The tensor product is just the product of every possible polynomial pair, which results in: [\phi_{1,1}\phi_{2,1} ,\, \phi_{1,1}\phi_{2,2}, \, \phi_{1,1}\phi_{2,3}, \, \phi_{1,2}\phi_{2,1}, \, \phi_{1,2}\phi_{2,2}, \, \phi_{1,2}\phi_{2,3}, \, \phi_{1,3}\phi_{2,1}, \, \phi_{1,3}\phi_{2,2}, \, \phi_{1,3}\phi_{2,3}]

72 / 87

Multi-dimensional approximation

Thus far we have displayed the Chebyshev basis in only one dimension

We approximate functions of some arbitrary dimension N by taking the tensor product of the vectors of one-dimensional Chebyshev polynomials

Construct a vector of polynomials [\phi_{1,1}, \, \phi_{1,2}, \, \phi_{1,3}] for dimension 1

Construct a vector of polynomials [\phi_{2,1}, \, \phi_{2,2}, \, \phi_{2,3}] for dimension 2

The tensor product is just the product of every possible polynomial pair, which results in: [\phi_{1,1}\phi_{2,1} ,\, \phi_{1,1}\phi_{2,2}, \, \phi_{1,1}\phi_{2,3}, \, \phi_{1,2}\phi_{2,1}, \, \phi_{1,2}\phi_{2,2}, \, \phi_{1,2}\phi_{2,3}, \, \phi_{1,3}\phi_{2,1}, \, \phi_{1,3}\phi_{2,2}, \, \phi_{1,3}\phi_{2,3}]

We can then solve for the 9 coefficients on these two dimensional polynomials

72 / 87
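
Here is a small sketch of the tensor-product construction in two dimensions, using the closed form T_j(x) = \cos(j \arccos(x)) for the Chebyshev polynomials on [-1,1] (the function names are mine):

# two-dimensional Chebyshev tensor-product basis at a point (x1, x2)
cheb_poly(j, x) = cos(j * acos(x))   # degree-j Chebyshev polynomial on [-1, 1]

# all 9 products of the degree-0,1,2 polynomials in each dimension
tensor_basis(x1, x2) = [cheb_poly(j1, x1) * cheb_poly(j2, x2) for j1 in 0:2, j2 in 0:2]

tensor_basis(0.3, -0.7)   # 3×3 array containing the 9 two-dimensional basis functions
# stacking vec(tensor_basis(x1, x2)) for each point of a 3×3 tensor grid gives a 9×9 Φ,
# and solving Φc = y recovers the 9 coefficients just as in one dimension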

Multi-dimensional approximation

The computational complexity here grows exponentially: \text{total # points} = (\text{points per state})^{\text{# states}}

73 / 87

Multi-dimensional approximation

The computational complexity here grows exponentially: \text{total # points} = (\text{points per state})^{\text{# states}}

Exponential complexity is costly; this is often called the curse of dimensionality

73 / 87

Multi-dimensional approximation

The computational complexity here grows exponentially: \text{total # points} = (\text{points per state})^{\text{# states}}

Exponential complexity is costly; this is often called the curse of dimensionality

We will cover smart ways to deal with this later

73 / 87
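
A quick back-of-the-envelope illustration of the formula (the numbers are arbitrary):

# tensor grid size blows up exponentially in the number of states
points_per_state = 10
[points_per_state^d for d in (1, 2, 4, 8)]   # 10, 100, 10_000, and 100_000_000 grid points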

Finite element methods

An alternative to pseudo-spectral methods is the class of finite element methods

74 / 87

Finite element methods

An alternative to pseudo-spectral methods is the class of finite element methods

Finite element methods use basis functions that are non-zero only over subintervals of the domain

74 / 87

Finite element methods

An alternative to pseudo-spectral methods is the class of finite element methods

Finite element methods use basis functions that are non-zero only over subintervals of the domain

For example, we can use splines (piecewise polynomials) over segments of our domain, spliced together at prespecified breakpoints called knots

74 / 87

Finite element methods

The higher the order of the polynomial we use, the higher the order of derivatives whose continuity we can preserve at the knots

75 / 87

Finite element methods

The higher the order of the polynomial we use, the higher the order of derivatives whose continuity we can preserve at the knots

For example, a linear spline yields an approximant that is continuous, but its first derivatives are discontinuous step functions unless the underlying value function happened to be precisely linear

75 / 87

Finite element methods

The higher the order of the polynomial we use, the higher the order of derivatives whose continuity we can preserve at the knots

For example, a linear spline yields an approximant that is continuous, but its first derivatives are discontinuous step functions unless the underlying value function happened to be precisely linear

If we have a quadratic spline, we can also preserve the first derivative's continuity at the knots, but the second derivative will be a discontinuous step function

75 / 87

Finite element methods

As we increase the order of the spline polynomial, we have increasing numbers of coefficients we need to determine

76 / 87

Finite element methods

As we increase the order of the spline polynomial, we have increasing numbers of coefficients we need to determine

To determine these additional coefficients using the same number of points, we require additional conditions that must be satisfied

76 / 87

Finite element methods

As we increase the order of the spline polynomial, we have increasing numbers of coefficients we need to determine

To determine these additional coefficients using the same number of points, we require additional conditions that must be satisfied

These are what ensure continuity of higher order derivatives at the knots as the degree of the spline grows

76 / 87

Finite element methods

With linear splines, each segment of our value function approximant is defined by a linear function

77 / 87

Finite element methods

With linear splines, each segment of our value function approximant is defined by a linear function

For each of these linear segments, we need to solve for 1 slope coefficient and 1 intercept term

77 / 87

Finite element methods

With linear splines, each segment of our value function approximant is defined by a linear function

For each of these linear segments, we need to solve for 1 slope coefficient and 1 intercept term

Each end of a linear segment must equal the function value at the knots

77 / 87

Finite element methods

With linear splines, each segment of our value function approximant is defined by a linear function

For each of these linear segments, we need to solve for 1 slope coefficient and 1 intercept term

Each end of a linear segment must equal the function value at the knots

We have two conditions and two unknowns for each segment: this is a simple set of linear equations that we can solve

77 / 87

Finite element methods

With linear splines, each segment of our value function approximant is defined by a linear function

For each of these linear segments, we need to solve for 1 slope coefficient and 1 intercept term

Each end of a linear segment must equal the function value at the knots

We have two conditions and two unknowns for each segment: this is a simple set of linear equations that we can solve

In numerical models we typically don't use linear splines because we often care about the quality of approximation of higher order derivatives; cubic splines are more common

77 / 87

Cubic splines

Suppose we wish to approximate using a cubic spline on N+1 knots

78 / 87

Cubic splines

Suppose we wish to approximate using a cubic spline on N+1 knots

We need N cubic polynomials, which entails 4N coefficients to determine

78 / 87

Cubic splines

Suppose we wish to approximate using a cubic spline on N+1 knots

We need N cubic polynomials, which entails 4N coefficients to determine

We can obtain 3(N-1) equations by ensuring that the approximant, its first derivative, and its second derivative are all continuous at the interior knots [3 conditions at each of the (N+1)-2 = N-1 interior knots]

78 / 87

Cubic splines

Suppose we wish to approximate using a cubic spline on N+1 knots

We need N cubic polynomials, which entails 4N coefficients to determine

We can obtain 3(N-1) equations by ensuring that the approximant, its first derivative, and its second derivative are all continuous at the interior knots [3 conditions at each of the (N+1)-2 = N-1 interior knots]

This means that the value of the left cubic polynomial equals the value of the right cubic polynomial at each interior knot

78 / 87

Cubic splines

Ensuring that the approximant equals the function's value at all of the N+1 knots adds another N+1 equations

79 / 87

Cubic splines

Ensuring that the approximant equals the function's value at all of the N+1 knots adds another N+1 equations

We therefore have a total of 4N-2 equations for 4N coefficients

79 / 87

Cubic splines

Ensuring that the approximant equals the function's value at all of the N+1 knots adds another N+1 equations

We therefore have a total of 4N-2 equations for 4N coefficients

We need two more conditions to solve the problem

79 / 87

Cubic splines

Ensuring that the approximant equals the function's value at all of the N+1 knots adds another N+1 equations

We therefore have a total of 4N-2 equations for 4N coefficients

We need two more conditions to solve the problem

A common choice is to require that the approximant's first or second derivative matches that of the function at the two end points

79 / 87
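
Collecting the counting from the last two slides in one line:

\underbrace{3(N-1)}_{\text{continuity at interior knots}} + \underbrace{N+1}_{\text{interpolation at the knots}} + \underbrace{2}_{\text{end conditions}} = 4N = \text{number of coefficients}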

Splines can potentially handle lots of curvature

If the derivative is of interest for optimization, or to recover some variable of economic meaning, then we may need to have these derivatives preserved well at the knots

80 / 87

Splines can potentially handle lots of curvature

If the derivative is of interest for optimization, or to recover some variable of economic meaning, then we may need to have these derivatives preserved well at the knots

One large benefit of splines is that they can handle kinks or areas of high curvature

80 / 87

Splines can potentially handle lots of curvature

If the derivative is of interest for optimization, or to recover some variable of economic meaning, then we may need to have these derivatives preserved well at the knots

One large benefit of splines is that they can handle kinks or areas of high curvature

How?

80 / 87

Splines can potentially handle lots of curvature

If the derivative is of interest for optimization, or to recover some variable of economic meaning, then we may need to have these derivatives preserved well at the knots

One large benefit of splines is that they can handle kinks or areas of high curvature

How?

By having the modeler place many knots in a concentrated region

80 / 87

Splines can potentially handle lots of curvature

If the derivative is of interest for optimization, or to recover some variable of economic meaning, then we may need to have these derivatives preserved well at the knots

One large benefit of splines is that they can handle kinks or areas of high curvature

How?

By having the modeler place many knots in a concentrated region

If the knots are stacked on top of one another, this actually allows for a kink to be explicitly represented in the value function approximant, but the economist must know precisely where the kink is

80 / 87

Splines can potentially handle lots of curvature

If the derivative is of interest for optimization, or to recover some variable of economic meaning, then we may need to have these derivatives preserved well at the knots

One large benefit of splines is that they can handle kinks or areas of high curvature

How?

By having the modeler place many knots in a concentrated region

If the knots are stacked on top of one another, this actually allows for a kink to be explicitly represented in the value function approximant, but the economist must know precisely where the kink is

Useful spline packages out there: Dierckx, Interpolations, QuantEcon

80 / 87
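
As a rough usage sketch with one of these packages (Dierckx; the call pattern below is my best recollection of its interface, so check the package documentation):

# fit and evaluate a cubic spline with Dierckx (usage is a sketch, verify against the docs)
using Dierckx
knots = collect(range(0, 2pi, length = 20))
vals = sin.(knots)                       # stand-in for V at the knots
spl = Spline1D(knots, vals)              # cubic spline (k = 3) by default
evaluate(spl, 1.234)                     # value of the approximant
derivative(spl, 1.234)                   # first derivative, useful for first-order conditions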

Code it up!

Let's code up our own linear spline approximation function linear_spline_approx(f, knots), where f is the function we are approximating and knots is the vector of knots

Have it return a function spline_eval that takes evaluation_points as an argument, where evaluation_points are the points at which we want to evaluate the spline approximant

Hint: Linear splines are pretty easy: given two points (x_i, y_i) and (x_{i+1}, y_{i+1}), the spline in between these points is given by y(x) = y_i + \frac{y_{i+1} - y_i}{x_{i+1} - x_{i}}(x - x_i)

81 / 87

Spline approximator

function linear_spline_approx(f, knots)
    function spline_eval(evaluation_points)
        prev_knot = knots[1] # initialize previous knot
        if !(typeof(evaluation_points) <: Number) # vector of evaluation points
            y_eval = similar(evaluation_points, Float64)
            y_index = 1
            for knot in knots[2:end]
                # points falling in the half-open segment [prev_knot, knot)
                current_points = evaluation_points[prev_knot .<= evaluation_points .< knot]
                # linear interpolation between (prev_knot, f(prev_knot)) and (knot, f(knot))
                y_eval[y_index:y_index + length(current_points) - 1] =
                    f(prev_knot) .+ (f(knot) - f(prev_knot))/(knot - prev_knot)*(current_points .- prev_knot)
                prev_knot = knot
                y_index += length(current_points)
            end
            # note: points at or beyond the last knot are not covered by the half-open segments
        else # single evaluation point
            y_eval = f(prev_knot) # fallback if the point lies outside all segments
            for knot in knots[2:end]
                if prev_knot <= evaluation_points < knot
                    y_eval = f(prev_knot) + (f(knot) - f(prev_knot))/(knot - prev_knot)*(evaluation_points - prev_knot)
                end
                prev_knot = knot
            end
        end
        return y_eval
    end
    return spline_eval
end
## linear_spline_approx (generic function with 1 method)
82 / 87

Plot

f(x) = sin(x)
## f (generic function with 1 method)
knots_coarse = 0:pi/2:2pi;
spline_func_coarse = linear_spline_approx(f, knots_coarse);
knots_fine = 0:pi/4:2pi;
spline_func_fine = linear_spline_approx(f, knots_fine);
knots_superfine = 0:pi/12:2pi;
spline_func_superfine = linear_spline_approx(f, knots_superfine);
x_vals = 0:0.05:2pi;
y_vals_coarse = spline_func_coarse(x_vals);
y_vals_fine = spline_func_fine(x_vals);
y_vals_superfine = spline_func_superfine(x_vals);
83 / 87

Plot

84 / 87

Plot

85 / 87

Plot

86 / 87

Plot

87 / 87
