Lecture 1

Intro to computing

Ivan Rudik

AEM 7130

1 / 125

Software and stuff

Necessary things to download to follow along today and in the future:

For this lecture you will need the following Julia packages

using Plots, ForwardDiff, Distributions, BenchmarkTools
2 / 125

What this class is about

  1. Learning how to compute dynamic models through approximation and estimation
  2. Other useful computational techniques and estimation approaches
  3. Learning important details about computing models
3 / 125

What you need to succeed in this course

  1. ECON 6090 and ECON 6170
  2. Or potentially some other graduate economics classes (see me if you haven't yet)
  3. Previous coding experience or willingness to spend some time learning as you go
4 / 125

Course materials

  1. Everything we use in the course will be freely available and posted to the course GitHub (details next class on how to use Git)
  2. Books (free from the library or authors' websites):
    1. Judd (1998)
    2. Miranda and Fackler (2002)
    3. Nocedal and Wright (2006)
    4. Karp and Traeger (2013)
5 / 125

Things to do before next class

6 / 125

What we will cover in the class

  1. Basic computing and things you need to think about
  2. Coding, version control, reproducibility, workflow
  3. Optimization
  4. Numerical and empirical dynamic modeling
  5. Difficult expectations
  6. Machine learning
  7. Cloud computing
7 / 125

What you have to do

  • Come to class
  • 4 computational problem sets + code review
  • Final research project proposal
  • Final research project
  • One presentation of a paper from the literature
8 / 125

Important days / times

  • Office hours: Tuesday 1:30-3:00 in Warren 462
  • Final project proposal: March 13
  • Final project paper: April 30
9 / 125

Grading

  • Problem sets: 40% (10% each)
  • Final project proposal: 15%
  • Final project paper/presentation: 25%
  • Class participation: 10%
  • Computational paper presentation: 10%
10 / 125

Problem sets (10% each)

You must use Julia and write .jl scripts, no Jupyter

You can work in groups of up to 3

Problem sets will be where you implement the techniques we learn in class on your own, but we will be doing our fair share of coding in class

You will be randomly assigned to do a reproduction test / code check on someone else's homework

  • Does the code run?
  • Could it be coded better?
  • How could it be made more reproducible?

Rubric will come later

11 / 125

Problem sets (10% each)

Why am I making you do this?

If you want to publish in an AEA journal you need to have good practices;
other journals are following suit

Making your code reproducible and generic will make
you a better co-author to others and your future self

12 / 125

Computational paper presentations (10%)

Everyone will present a paper starting in 2-3 weeks

The paper can apply methods we've learned about (or will learn about), or can be a new method that we have not covered

You must consult with me at least 1 week prior to your scheduled presentation date to ensure the paper is appropriate for a presentation

13 / 125

Final project (25% paper, 15% proposal)

The final project will be the beginning of a computationally-driven research project

Proposals will be due about halfway through the class

The only requirement is that the project cannot be computationally trivial
(i.e. no 600 observation two-way FE papers)

Everyone will present their final projects in the last week of class

More details are in the syllabus, with more to come later

14 / 125

Slides

All slides will be available on the course GitHub page

The slides are made with R Markdown and can run Julia via the JuliaCall package, e.g.

# true coefficient
bbeta = π;
# random x data
x = randn(100,1)*5 .+ 3;
# OLS data generating process
y = bbeta.*x .+ randn(100,1)*10;
# OLS estimation
bbeta_hat = inv(x'x)x'y;
println("β-hat is $(round(bbeta_hat[1],digits=3)) and the true β is $(round(bbeta,digits=3)).")
## β-hat is 3.325 and the true β is 3.142.
15 / 125

Why do we need computational methods?

Everything you've done so far has likely been solvable analytically

Including OLS: \hat{\beta} = (X'X)^{-1}X'Y

Not all economic models have closed-form solutions, and others can't have closed-form solutions without losing important economic content

This is generally true for dynamic models

16 / 125

What can we compute?

We can use computation + theory to answer quantitative questions

Theory can't give us welfare in dollar terms

Theory can't tell us the value of economic primitives

17 / 125

What can we compute?

Theory often relies on strong assumptions like

  • log utility (loses the distinction between income and substitution effects)
  • no transactions costs (an important friction)
  • strictly concave objectives (many natural phenomena don't follow this)
  • static decisionmaking

It can be unclear what the costs of these assumptions are

18 / 125

Example 1

Suppose we have a constant elasticity demand function: q(p) = p^{-0.2}

In equilibrium, quantity demanded is q = 2

What price clears the market in equilibrium?

Just invert the demand function:

2 = p^{-0.2}

p = 2^{-5}

Your calculator can do the rest
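A quick numerical check of the closed-form answer (a minimal sketch; p_star is just an illustrative name):

# verify the closed-form solution p = 2^(-5)
p_star = 2.0^(-5)                    # 0.03125
println("q(p*) = $(p_star^(-0.2))")  # prints 2.0 (up to rounding), as required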

19 / 125

Example 2

Suppose the demand function is now: q(p) = 0.5p^{-0.2} + 0.5p^{-0.5}, a weighted average of two CE demand functions

What price clears the market if q=2?

First, does a solution exist?

Yes, why?

q(p) is continuous and monotonically decreasing in p

q(p) is greater than 2 at p=0.1 and less than 2 at p=0.2

so by the intermediate value theorem q(p)=2 somewhere in (0.1, 0.2)

20 / 125

Example 2

# We know the solution is between .1 and .2
x = collect(range(.1, stop=.2, length=10)) # generate evenly spaced grid of prices
q_d = ones(size(x)).*2 # generate equal length vector of q_d = 2
# Quantity demanded as a function of price
price(p) = p.^(-0.2)/2 .+ p.^(-0.5)/2
# Get the corresponding quantities demanded at these prices
y = price(x)

Now plot qd and q(p)

21 / 125

Example 2

(Figure: plot of quantity demanded q(p) and the constant q_d = 2 over the price grid on [0.1, 0.2])

22 / 125

Example 2

Notice: if we let t = p^{-0.1} then:

q(t) = 0.5t^2 + 0.5t^5

Can we solve for t now?

No! Closed-form solutions to fifth order polynomials are not guaranteed to exist!

So how do we solve the problem?

23 / 125

Newton's method

Iteratively do the following:

  1. Guess a solution to: q(p) - q = 0 \Leftrightarrow q(p) - 2 = 0
  2. Approximate the function with a local first-order (linear) expansion around the guess
  3. Solve this easier equation
  4. Solution is the new guess
  5. Stop if previous guess and new guess are sufficiently close

We will learn more about this and why it works in a later class

24 / 125

Newton code

# Define demand functions
demand(p) = p^(-0.2)/2 + p^(-0.5)/2 - 2 # excess demand: quantity demanded minus the target quantity of 2
demand_grad(p) = .1*p^(-1.2) + .25*p^(-1.5) # negative of the demand gradient, so the Newton step below is p += deltap
function find_root_newton(demand, demand_grad)
    p = .3 # initial guess
    deltap = 1e10 # initialize stepsize
    while abs(deltap) > 1e-4
        deltap = demand(p)/demand_grad(p) # Newton step: -f(p)/f'(p)
        p += deltap
        println("Intermediate guess of p = $(round(p,digits=3)).")
    end
    println("The solution is p = $(round(p,digits=3)).")
    return p
end;
25 / 125

Newton code

# Solve for price
find_root_newton(demand, demand_grad)
## Intermediate guess of p = 0.068.
## Intermediate guess of p = 0.115.
## Intermediate guess of p = 0.147.
## Intermediate guess of p = 0.154.
## Intermediate guess of p = 0.154.
## Intermediate guess of p = 0.154.
## The solution is p = 0.154.
## 0.15419764093200633
26 / 125

Example 3

Consider a two period ag commodity market model

Period 1: Farmer makes acreage decisions for planting
Period 2: Per-acre yield realizes, equilibrium crop price clears the market

The farmer's policy function is: a(E[p]) = \frac{1}{2} + \frac{1}{2}E[p]

After planting, yield \hat{y} realizes, producing a total quantity q = a\hat{y} of the crop

Demand is given by p(q) = 3 - 2q

Yield is given by \hat{y} \sim N(1, 0.1)

27 / 125

How much acreage gets planted?

p(\hat{y}) = 3 - 2a\hat{y}

a = \frac{1}{2} + \frac{1}{2}\left(3 - 2aE[\hat{y}]\right)

Rearrange and solve:

a = 1

Now suppose the government implements a price floor
on the crop of \bar{p} = 1 so we have that p(\hat{y}) = \max(1, 3 - 2a\hat{y})

How much acreage does the farmer plant?

28 / 125

How much acreage gets planted?

This is analytically intractable

The max operator is non-linear so we can't pass the expectation through

E[\max(1,3-2a\hat{y})] \neq \max(1,E[3-2a\hat{y}])

\rightarrow we need to solve this numerically

29 / 125

Function iteration

We can solve this using another technique called function iteration

# Function iteration method to find a root
function find_root_fi(mn, variance)
    y = randn(1000)*sqrt(variance) .+ mn # draws of the random variable
    a = 1. # initial guess
    differ = 100. # initialize error
    exp_price = 1. # initialize expected price
    while differ > 1e-4
        a_old = a # save old acreage
        p = max.(1, 3 .- 2 .*a.*y) # compute price at all distribution points
        exp_price = mean(p) # compute expected price
        a = 1/2 + 1/2*exp_price # get new acreage planted given new price
        differ = abs(a - a_old) # change in acreage planted
        println("Intermediate acreage guess: $(round(a,digits=3))")
    end
    return a, exp_price
end
30 / 125

Function iteration

acreage, expected_price = find_root_fi(1, 0.1)
## Intermediate acreage guess: 1.125
## Intermediate acreage guess: 1.086
## Intermediate acreage guess: 1.097
## Intermediate acreage guess: 1.094
## Intermediate acreage guess: 1.095
## Intermediate acreage guess: 1.094
## Intermediate acreage guess: 1.094
## (1.0944707255507848, 1.1889414511015697)
println("The optimal number of acres to plant is $(round(acreage, digits = 3)).")
## The optimal number of acres to plant is 1.094.
println("The expected price is $(round(expected_price, digits = 3)).")
## The expected price is 1.189.
31 / 125

Big O notation

How do we quantify speed and accuracy of computational algorithms?

i.e. what is the computational complexity of the problem?

General mathematical definition: Big O notation describes the limiting behavior of a function when the argument tends towards a particular value or infinity

Programming context: Describes the limiting behavior of algorithms in terms of run time/memory/accuracy as input size grows

You've seen this before in the expression of Taylor series' errors

32 / 125

Big O Notation

Written as: O(F(x))

Here is how to think about it:

O(x): linear

  • Time to solve increases linearly in input x
  • Accuracy changes linearly in input x

Examples?

Time to find a particular (e.g. maximum) value in an unsorted array

\rightarrow For each element, check whether it is the value we want
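A minimal Julia sketch of such a linear scan (the helper name find_max is made up for illustration):

# linear scan: touch each element once, so time grows linearly with the array length
function find_max(v)
    best = v[1]
    for val in v
        if val > best
            best = val
        end
    end
    return best
end

find_max(randn(1_000)) # one pass over 1,000 elements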

33 / 125

Big O Notation

\mathbf{O(c^n)}: exponential

  • Time to solve increases exponentially in input x
  • Accuracy changes exponentially in input x

Examples?

Time to solve a standard dynamic program, e.g. the traveling salesman problem

\rightarrow For each city i=1,...,n, solve a Bellman as a function of all other cities

34 / 125

Big O Notation

\mathbf{O(n!)}: factorial

  • Time to solve increases factorially in input x
  • Accuracy changes factorially in input x

Examples?

Solving traveling salesman by brute force

\rightarrow Obtain travel time for all possible combinations of intermediate cities

35 / 125

Big O Notation: Accuracy example

This is how you have probably seen Big O used before:

Taylor series for sin(x) around zero:

sin(x) \approx x - x^3/3! + x^5/5! + O(x^7)

What does O(x^7) mean here?

As we move away from 0 to some x, the upper bound of the growth rate in the error of our approximation to sin(x) is x^7

We are approximating about zero so x is small and x^n is decreasing in n

For small x, higher order polynomials mean the error will grow slower and we have a better local approximation

36 / 125

Taylor expansions

# fifth and third order Taylor approximations
sin_error_5(x) = sin(x) - (x - x^3/6 + x^5/120)
sin_error_3(x) = sin(x) - (x - x^3/6)
println("Error of fifth-order approximation at x = .001 is: $(sin_error_5(.001))")
println("Error of third-order approximation at x = .001 is: $(sin_error_3(.001))")
println("Error of fifth-order approximation at x = .01 is: $(sin_error_5(.01))")
println("Error of third-order approximation at x = .01 is: $(sin_error_3(.01))")
println("Error of fifth-order approximation at x = .1 is: $(sin_error_5(.1))")
println("Error of third-order approximation at x = .1 is: $(sin_error_3(.1))")
## Error of fifth-order approximation at x = .001 is: 0.0
## Error of third-order approximation at x = .001 is: 8.239936510889834e-18
## Error of fifth-order approximation at x = .01 is: -1.734723475976807e-18
## Error of third-order approximation at x = .01 is: 8.333316675601665e-13
## Error of fifth-order approximation at x = .1 is: -1.983851971587569e-11
## Error of third-order approximation at x = .1 is: 8.331349481138783e-8
37 / 125

Big O Notation: Speed examples

Here are a few examples for fundamental computational methods

38 / 125

Big O Notation: O(1)

O(1): algorithm executes in constant time

The size of the input does not affect execution speed

Example: accessing a specific location in an array

39 / 125

Big O Notation: O(x)

O(x): algorithm executes in linear time

Execution speed grows linearly in input size

Example: inserting an element into an arbitrary location in a 1 dimensional array

Bigger array \rightarrow need to shift around more elements in memory to accommodate the new element

40 / 125

Big O Notation: O(x^2)

\mathbf{O(x^2):} algorithm executes in quadratic time

More generally called polynomial time for x^n

Execution speed grows quadratically in input size

Example: bubble sort, step through a list, compare adjacent elements, swap if in the wrong order
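A minimal sketch of bubble sort (names are illustrative): the nested passes are what make the comparison count grow quadratically in the input size

# bubble sort: nested passes over the array, roughly n^2 comparisons
function bubble_sort!(v)
    n = length(v)
    for i in 1:n-1, j in 1:n-i
        if v[j] > v[j+1]
            v[j], v[j+1] = v[j+1], v[j] # swap adjacent elements that are out of order
        end
    end
    return v
end

bubble_sort!([3, 1, 2]) # returns [1, 2, 3]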

41 / 125

Computer arithmetic - storage

Question: which numbers can be represented by a computer?

Before the answer, how are numbers physically represented by a computer?

Binary: a base 2 number system

Each digit can only take on 0 or 1

Base 10: each digit can take on 0-9

42 / 125

Computer arithmetic - storage

Question: which numbers can be represented by a computer?

Answer: a subset of the rational numbers

Computers have finite memory and hard disk space, but there are infinitely many rational numbers

This imposes a strict limitation on the storage of numbers

Numbers are stored as: \pm mb^{\pm n}

m is the mantissa/significand, b is the base, n is the exponent

All three are integers

43 / 125

Computer arithmetic - storage

\pm mb^{\pm n}

The significand typically gives the significant digits

The exponent scales the number up or down in magnitude

44 / 125

Computer arithmetic - storage

The size of numbers a computer can represent is limited
by how much space is typically allocated for a real number

Space allocations are usually 64 bits: 1 for the sign, 11 for n, and 52 stored bits for m (53 effective bits of precision)

println(typeof(5.0))
## Float64
println(typeof(5))
## Int64
45 / 125

Computer arithmetic - storage

Int64 means it is an integer with 64 bits of storage

Float64 means it is a floating point number with 64 bits of storage

Floating point just means b^{\pm n} can move the decimal point around in the significand

Int64 and Float64 are different, this will be important later

46 / 125

The limits of computers

Limitations on storage suggest three facts

1. There exists a machine epsilon which denotes the smallest relative quantity representable by a computer

Machine epsilon is the smallest \epsilon such that a machine can always distinguish N+\epsilon > N > N-\epsilon

47 / 125

The limits of computers

println("Machine epsilon ϵ is $(eps(Float64))")
## Machine epsilon ϵ is 2.220446049250313e-16
println("Is 1 + ϵ/2 > 1? $(1 + eps(Float64)/2 > 1)")
## Is 1 + ϵ/2 > 1? false
println("Is 1 - ϵ/2 < 1? $(1 - eps(Float64)/2 < 1)")
## Is 1 - ϵ/2 < 1? true
48 / 125

The limits of computers

println("The smallest representable number larger than 1.0 is $(nextfloat(1.0))")
## The smallest representable number larger than 1.0 is 1.0000000000000002
println("The largest representable number smaller than 1.0 is $(prevfloat(1.0))")
## The largest representable number smaller than 1.0 is 0.9999999999999999
49 / 125

The limits of computers

Machine epsilon changes depending on the amount of storage allocated

println("32 bit machine epsilon is $(eps(Float32))")
## 32 bit machine epsilon is 1.1920929e-7
println("Is 1 + ϵ/2 > 1? $(Float32(1) + eps(Float32)/2 > 1)")
## Is 1 + ϵ/2 > 1? false
println("Is 1 - ϵ/2 < 1? $(Float32(1) - eps(Float32)/2 < 1)")
## Is 1 - ϵ/2 < 1? true

This means there's a tradeoff between precision and storage requirements;
this matters for low-memory systems like GPUs

50 / 125

The limits of computers

2. There is a smallest representable number

println("64 bit smallest float is $(floatmin(Float64))")
## 64 bit smallest float is 2.2250738585072014e-308
println("32 bit smallest float is $(floatmin(Float32))")
## 32 bit smallest float is 1.1754944e-38
println("16 bit smallest float is $(floatmin(Float16))")
## 16 bit smallest float is 6.104e-5
51 / 125

The limits of computers

3. There is a largest representable number

println("64 bit largest float is $(floatmax(Float64))")
## 64 bit largest float is 1.7976931348623157e308
println("32 bit largest float is $(floatmax(Float32))")
## 32 bit largest float is 3.4028235e38
println("16 bit largest float is $(floatmax(Float16))")
## 16 bit largest float is 6.55e4
52 / 125

The limits of computers

We can overcome these issues (if they ever arise) with arbitrary precision arithmetic

println("The largest 64 bit integer is $(typemax(Int64))")
## The largest 64 bit integer is 9223372036854775807
println("Add one to it and we get: $(typemax(Int64)+1)")
## Add one to it and we get: -9223372036854775808
println("It loops us around the number line: $(typemin(Int64))")
## It loops us around the number line: -9223372036854775808
println("With arbitrary precision we can make it much bigger: $(BigInt(typemax(Int64))*1000000)")
## With arbitrary precision we can make it much bigger: 9223372036854775807000000
53 / 125

The limits of computers

The scale of your problem matters

If a parameter or variable is > floatmax or < floatmin, you will have a very bad time

Scale numbers appropriately (e.g. millions of dollars, not millionths of cents)

54 / 125

Computer arithmetic

Error

We can only represent a finite number of numbers

This means we will have error in our computations

Error comes in two major forms:

  1. Rounding
  2. Truncation
55 / 125

Rounding

We will always need to round numbers to the nearest computer representable number, this introduces error

println("Half of π is: $(π/2)")
## Half of π is: 1.5707963267948966

The computer gave us a rational number, but \pi/2 should be irrational

56 / 125

Truncation

Lots of important numbers are defined by infinite sums, e.g. e^x = \sum_{n=0}^\infty \frac{x^n}{n!}

It turns out that computers can't add up infinitely many terms because there is finite space

\rightarrow we need to truncate the sum
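For intuition, a small sketch (not from the slides) of truncating this sum after a few terms:

# truncate the infinite sum for e^x after k terms
truncated_exp(x, k) = sum(x^n / factorial(n) for n in 0:k)

println("5-term approximation of e: $(truncated_exp(1.0, 4))")   # 2.70833...
println("Truncation error: $(exp(1.0) - truncated_exp(1.0, 4))") # about 0.0099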

57 / 125

Why does this matter?

Errors are small, who cares?

You should!

Because errors can propagate and grow as you keep applying an algorithm (e.g. function iteration)

58 / 125

Error example 1

Consider a simple quadratic: x^2-26x+1=0, one solution is: x = 13-\sqrt{168}

println("Arbitrary precision: 13 - √168 = $(BigFloat(13-sqrt(168)))")
## Arbitrary precision: 13 - √168 = 0.03851860318427924312345567159354686737060546875
println("64 bit: 13 - √168 = $(13-sqrt(168))")
## 64 bit: 13 - √168 = 0.03851860318427924
println("32 bit: 13 - √168 = $(convert(Float32,13-sqrt(168)))")
## 32 bit: 13 - √168 = 0.038518604
println("16 bit: 13 - √168 = $(convert(Float16,13-sqrt(168)))")
## 16 bit: 13 - √168 = 0.0385
59 / 125

Error example 2

Let's add and subtract some numbers and play around with the associative property of real numbers:

x = (10^{-20} + 1) - 1

y = 10^{-20} + (1 - 1)

Very clearly we should get x=y, but do we? Let's find out

60 / 125

Error example 2

x = (1e-20 + 1) - 1 # initialize x
y = 1e-20 + (1 - 1) # initialize y
x_equals_y = (x == y) # store boolean of whether x == y
if x_equals_y
    println("X equals Y!")
else
    println("X does not equal Y!")
    println("The difference is: $(x-y).")
end
## X does not equal Y!
## The difference is: -1.0e-20.
61 / 125

Error example 2

The two numbers were not equal, we got y > x

Why?

Adding numbers of greatly different magnitudes
does not always work like you would want

62 / 125

Error example 2

x = (1e-20 + 1) - 1 # initialize x
## 0.0
y = 1e-20 + (1 - 1) # initialize y
## 1.0e-20
println("x is $x")
## x is 0.0
println("y is $y")
## y is 1.0e-20

When we added 10^{-20} to 1, it got rounded away

63 / 125

Error example 3

Let's just subtract two numbers: 100000.2 - 100000.1

We know the answer is: 0.1

println("100000.2 - 100000.1 is: $(100000.2 - 100000.1)")
## 100000.2 - 100000.1 is: 0.09999999999126885
if (100000.2 - 100000.1) == 0.1
    println("and it is equal to 0.1")
else
    println("and it is not equal to 0.1")
end
## and it is not equal to 0.1
64 / 125

Error example 3

Why do we get this error?

Neither of the two numbers can be precisely represented by the machine!

100000.1 \approx 8589935450993459\times 2^{-33} = 100000.0999999999767169356346130

100000.2 \approx 8589936309986918\times 2^{-33} = 100000.1999999999534338712692261

So their difference won't necessarily be 0.1

There are tools for approximate equality

isapprox(100000.2 - 100000.1, 0.1)
## true
65 / 125

Rounding and truncation recap

This matters, particularly when you're trying to evaluate logical expressions of equality

66 / 125

Calculus operations

Differentiation

67 / 125

Differentiation

Derivatives are obviously important in economics for finding optimal allocations, etc

The formal definition of a derivative is:

\frac{d f(x)}{dx} = \lim_{h\rightarrow 0} \frac{f(x+h)-f(x)}{h}

But we can let t = 1/h and reframe this as an infinite limit

\frac{d f(x)}{dx} = \lim_{t\rightarrow \infty} \frac{f(x+1/t)-f(x)}{1/t}

which we know a computer can't handle because of finite space to store t

68 / 125

Computer differentiation

How do we perform derivatives on computers if we can't take the limit?

Finite difference methods

What does a finite difference approximation look like?

69 / 125

Forward difference

The forward difference looks exactly like the formal definition without the limit: \frac{d f(x)}{dx} \approx \frac{f(x+h)-f(x)}{h}

Works the same for partial derivatives: \frac{\partial g(x,y)}{\partial x} \approx \frac{g(x+h,y)-g(x,y)}{h}

Let's see how it works in practice by calculating derivatives of x^2 at x=2

70 / 125

Forward difference

deriv_x_squared(h,x) = ((x+h)^2 - x^2)/h # forward difference approximation of the derivative
println("The derivative with h=1e-8 is: $(deriv_x_squared(1e-8,2.))")
## The derivative with h=1e-8 is: 3.999999975690116
println("The derivative with h=1e-12 is: $(deriv_x_squared(1e-12,2.))")
## The derivative with h=1e-12 is: 4.000355602329364
println("The derivative with h=1e-30 is: $(deriv_x_squared(1e-30,2.))")
## The derivative with h=1e-30 is: 0.0
println("The derivative with h=1e-1 is: $(deriv_x_squared(1e-1,2.))")
## The derivative with h=1e-1 is: 4.100000000000001
71 / 125

Error, it's there

None of the values we chose for h were perfect,
but clearly some were better than others

Why?

We face two opposing forces:

  • We want h to be as small as possible so that
    we can approximate the limit as well as we possibly can, BUT

  • If h is small then f(x+h) is close to f(x),
    we can run into rounding issues like we saw for h=10^{-30}

We can select h in an optimal fashion: h = \max\{|x|,1\}\sqrt{\epsilon}
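A small sketch of this rule of thumb applied to the x^2 example from before (fwd_diff and h_opt are just illustrative names):

# forward difference with the rule-of-thumb step size h = max(|x|,1)*sqrt(eps)
fwd_diff(f, x, h) = (f(x + h) - f(x)) / h
x0 = 2.0
h_opt = max(abs(x0), 1.0) * sqrt(eps(Float64))
println("Forward difference of x^2 at x = 2 with h_opt: $(fwd_diff(z -> z^2, x0, h_opt))") # ≈ 4.0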

72 / 125

How much error is in a finite difference?

Can we measure the error growth rate in h (i.e. Big O notation)?

Perform a first-order Taylor expansion of f(x) around x:

f(x+h) = f(x) + f'(x)h + O(h^2)

Recall O(h^2) means the error in our approximation grows quadratically in h,
even though we only did a linear approximation

How can we use this to understand the error in our finite difference approximation?

73 / 125

How much error is in a finite difference?

Rearrange to obtain: f'(x) = \frac{f(x+h) - f(x)}{h} + O(h^2)/h

Forward differences have linearly growing errors

f'(x) = \frac{f(x+h) - f(x)}{h} + O(h) because O(h^2)/h = O(h)

If we halve h, we halve the error in our approximation
(ignoring rounding/truncation issues)

74 / 125

Improvements on the forward difference

How can we improve the accuracy of the forward difference?

First, why do we have error?

Because we are approximating the slope of the tangent line at x
by a secant line passing through (x, f(x)) and (x+h, f(x+h))

The secant line has the average slope of f(x) on [x,x+h]

We want the derivative at x, which is on the edge of [x,x+h], how about we center x?

75 / 125

Central differences

We can approximate f'(x) in a slightly different way: f'(x) \approx \frac{f(x+h)-f(x-h)}{2h}

This leaves x in the middle of the interval
over which we are averaging the slope of f(x)

Is this an improvement on forward differences?

76 / 125

How much error is in a central finite difference?

Let's do two second-order Taylor expansions:

  • f(x+h) = f(x) + f'(x)h + f''(x)h^2/2! + O(h^3)
  • f(x-h) = f(x) + f'(x)(-h) + f''(x) (-h)^2/2! + O(h^3)

Subtract the two expressions (note that O(h^3) - O(h^3) = O(h^3))
and then divide by 2h to get

f'(x) = \frac{f(x+h)-f(x-h)}{2h} + O(h^2)

77 / 125

How much error is in a central finite difference?

f'(x) = \frac{f(x+h)-f(x-h)}{2h} + O(h^2)

Error falls quadratically in h: if we halve h we reduce the error by 75%

Optimal selection of h for central differences is h = \max\{|x|,1\}\epsilon^{1/3}
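For comparison, a small sketch of the central difference on the same x^2 example (for a quadratic the central difference is exact up to rounding):

# central difference with the rule-of-thumb step size h = max(|x|,1)*eps^(1/3)
central_diff(f, x, h) = (f(x + h) - f(x - h)) / (2h)
x0 = 2.0
h_c = max(abs(x0), 1.0) * eps(Float64)^(1/3)
println("Central difference of x^2 at x = 2: $(central_diff(z -> z^2, x0, h_c))") # ≈ 4.0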

78 / 125

Why use anything but central differences?

Suppose we're computing a Jacobian of a multidimensional function,
why would we ever use forward differences instead of central differences?

For each central difference:

We need to compute g(x_1-h,x_2,...) and g(x_1+h,x_2,...) for each x_i

But for a forward difference we only need to compute g(x_1,x_2,...) once
and then g(x_1+h,x_2,...) for each x_i

Forward differences save on computing time
and operations at the expense of accuracy

For high dimensional functions it may be worth the tradeoff

79 / 125

Higher order finite differences

We can use these techniques to approximate higher order derivatives

For example, take two third order Taylor expansions

  • f(x+h) = f(x) + f'(x)h + f''(x)h^2/2! + f'''(x)h^3/3! + O(h^4)
  • f(x-h) = f(x) + f'(x)(-h) + f''(x)(-h)^2/2! + f'''(x)(-h)^3/3! + O(h^4)

Add the two expressions and then divide by h^2 to get

f''(x) = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} + O(h^2)

Second derivatives are important for calculating Hessians
and checking maxima or minima
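A quick sketch of this second-difference formula, again on the x^2 example, whose second derivative is exactly 2:

# central second difference: f''(x) ≈ (f(x+h) - 2f(x) + f(x-h)) / h^2
second_diff(f, x, h) = (f(x + h) - 2f(x) + f(x - h)) / h^2
println("Second derivative of x^2 at x = 2: $(second_diff(z -> z^2, 2.0, 1e-4))") # ≈ 2.0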

80 / 125

Differentiation without error?

Finite differences put us in between two opposing forces on the size of h

Can we improve upon finite differences?

Analytic derivatives

One way is to code up the actual derivative

deriv_x_squared(x) = 2x
println("The derivative is: $(deriv_x_squared(2.))")
## The derivative is: 4.0

Exact solution!

81 / 125

Automatic differentiation

Coding up analytic derivatives by hand for
complex problems is not always great because

It can take A LOT of programmer time, more than it is worth

Humans are susceptible to error in coding or calculating the derivative mathematically

82 / 125

Autodiff: let the computer do it

Think about this: your code is always made up of simple arithmetic operations

  • add, subtract, divide, multiply
  • trig functions
  • exponentials/logs
  • etc

The closed-form derivatives of these operations are not hard; it turns out your computer can do them and yield exact solutions

83 / 125

Autodiff: let the computer do it

How?

There are methods that basically apply a giant chain rule to your whole program, and break down the derivative into the (easy) component parts that another package knows how to handle

# the function, it needs to be written in a particular way because the autodiff package is dumb
# *(x,y) is the julia function way to do x*y,
# autodiff needs to be in terms of julia functions to work correctly ¯\_(ツ)_/¯
ff(x) = *(x[1],x[1]) # x^2
x = [2.]; # location to evaluate: ff(x) = 2^2
84 / 125

Autodiff: let the computer do it

using ForwardDiff
g(f,x) = ForwardDiff.gradient(f,x); # g = ∇f
## g (generic function with 1 method)
println("ff'(x) at $(x[1]) is: $(g(ff,x)[1])") # display gradient value
## ff'(x) at 2.0 is: 4.0

Exact solution!

85 / 125

Autodiff: let the computer do it

Once you get the hang of coding up functions for autodiff it's not that hard

fff(x) = sin(*(x[1],x[1])) # sin(x^2)
x = [2.] # location to evaluate: fff(x) = sin(2^2)
## 1-element Array{Float64,1}:
## 2.0
g(f,x) = ForwardDiff.gradient(f,x) # g = ∇f
## g (generic function with 1 method)
println("fff'(x) at $(x[1]) is: $(g(fff,x)[1])") # display gradient value
## fff'(x) at 2.0 is: -2.6145744834544478
86 / 125

Calculus operations

Integration (trickier than differentiation)

We integrate to do a lot of stuff in economics

  • Expectations
  • Add up a continuous measure of things

\int_D f(x) dx, \quad f:\mathcal{R}^n \rightarrow \mathcal{R}, \quad D \subset \mathcal{R}^n

87 / 125

How to think about integrals

Integrals are effectively infinite sums

1 dimensional example:

\lim_{dx_i\rightarrow0}\sum_{i=0}^{(b-a)/dx_i} f(x_i) dx_i

where dx_i is some subset of [a,b] and x_i is some evaluation point (e.g. midpoint of dx_i)

88 / 125

Infinite limits strike again

Just like derivatives, we face an infinite limit as (b-a)/dx_i \rightarrow \infty

We avoid this issue in the same way as derivatives, we replace the infinite sum with something we can handle

89 / 125

Monte Carlo integration

Probably the most commonly used form in empirical econ

Approximate an integral by relying on LLN
and "randomly" sampling the integration domain

Can be effective for very high dimensional integrals

Very simple and intuitive

But, produces a random approximation

90 / 125

Monte Carlo integration

Integrate \xi = \int_0^1 f(x) dx by drawing N uniform samples, x_1,...,x_N over interval [0,1]

\xi is equivalent to E[f(x)] with respect to a uniform distribution, so estimating the integral is the same as estimating the expected value of f(x)

In general we have that \hat{\xi} = V\frac{1}{N}\sum_{i=1}^N f(x_i)

where V is the volume over which we are integrating

LLN gives us that the plim_{N\rightarrow\infty} \hat{\xi} = \xi

91 / 125

Monte Carlo integration

The variance of \hat{\xi} is \sigma^2_{\hat{\xi}} = var(\frac{V}{N}\sum_{i=1}^N f(x_i)) = \frac{V^2}{N^2} \sum_{i=1}^N var(f(X)) = \frac{V^2}{N}\sigma^2_{f(X)}

So the average error is \frac{V}{\sqrt{N}}\sigma_{f(X)}, which gives us the rate of convergence: O(1/\sqrt{N})

Note:

  1. The rate of convergence is independent of the dimension of x
  2. Quasi-Monte Carlo methods can get you O(1/N)
92 / 125

Monte Carlo integration

Suppose we want to integrate x^2 from 0 to 10, we know this is 10^3/3 = 333.333

# Package for drawing random numbers
using Distributions
# Define a function to do the integration for an arbitrary function
function integrate_function(f, lower, upper, num_draws)
    # Draw from a uniform distribution over [lower, upper]
    xs = rand(Uniform(lower, upper), num_draws)
    # Integral estimate = mean(f(x)) * volume
    expectation = mean(f(xs))*(upper - lower)
end
93 / 125

Monte Carlo integration

Suppose we want to integrate x^2 from 0 to 10, we know this is 10^3/3 = 333.333

# Integrate
f(x) = x.^2;
integrate_function(f, 0, 10, 1000)
## 341.72507813471555

Pretty close!

94 / 125

Quadrature rules

We can also approximate integrals using a technique called quadrature

With quadrature we effectively take weighted sums to approximate integrals

We will focus on two classes of quadrature for now:

  1. Newton-Cotes (the kind you've seen before)
  2. Gaussian (probably new)
95 / 125

Newton-Cotes quadrature rules

Suppose we want to integrate a one dimensional function f(x) over [a,b]

How would you do it?

One answer is to replace the function with
something easy to integrate: a piecewise polynomial

Key things to define up front:

  • x_i = a + (i-1)h for i=1,2,...,n, where h = \frac{b-a}{n-1}

x_is are the quadrature nodes of the approximation scheme and divide the interval into n-1 equally spaced subintervals of length h

96 / 125

Midpoint rule

Most basic Newton-Cotes method:

  1. Split [a,b] into intervals
  2. Approximate the function in each subinterval by a constant equal to the function at the midpoint of the subinterval

\int_{x_i}^{x_{i+1}} f(x) dx \approx hf(\frac{1}{2}(x_{i+1}+x_i))

Approximates f by a step function
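A minimal sketch of the composite midpoint rule on the earlier \int_0^{10} x^2 dx = 333.33 example (here n counts subintervals rather than nodes, and the names are illustrative):

# composite midpoint rule: evaluate f at the midpoint of each of n equal subintervals
function midpoint_rule(f, a, b, n)
    h = (b - a) / n                 # subinterval width
    mids = a .+ h .* ((1:n) .- 0.5) # subinterval midpoints
    return h * sum(f.(mids))
end

midpoint_rule(x -> x^2, 0, 10, 100) # ≈ 333.33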

97 / 125

Trapezoid rule

Increase complexity by 1 degree:

  1. Split [a,b] into intervals
  2. Approximate the function in each subinterval by a linear interpolation passing through (x_i,f(x_i)) and (x_{i+1},f(x_{i+1}))

\int_{x_i}^{x_{i+1}} f(x) dx \approx \frac{h}{2}[f(x_i) + f(x_{i+1})]

We can aggregate this up to: \int_{a}^{b} f(x) dx \approx \sum_{i=1}^n w_i f(x_i)
where w_1=w_n = h/2 and w_i = h otherwise
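A small sketch of the composite trapezoid rule with exactly these weights, on the same integral as before:

# composite trapezoid rule: weights h/2 at the two endpoints, h at interior nodes
function trapezoid_rule(f, a, b, n)
    xs = range(a, b, length = n) # n evenly spaced nodes
    h = (b - a) / (n - 1)
    w = fill(h, n)
    w[1] = w[end] = h / 2        # endpoint weights
    return sum(w .* f.(xs))
end

trapezoid_rule(x -> x^2, 0, 10, 101) # ≈ 333.35, close to the true 333.33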

98 / 125

How accurate is this rule?

Trapezoid rule is O(h^2) / first-order exact: it can integrate any linear function exactly

99 / 125

Simpson's rule

Increase complexity by 1 degree:

  1. Let n be odd, then approximate the function across a pair of subintervals by a quadratic interpolation passing through (x_{2i-1},f(x_{2i-1})), (x_{2i},f(x_{2i})), and (x_{2i+1},f(x_{2i+1}))

\int_{x_{2i-1}}^{x_{2i+1}} f(x) dx \approx \frac{h}{3}[f(x_{2i-1}) + 4f(x_{2i}) + f(x_{2i+1})]

We can aggregate this up to: \int_{a}^{b} f(x) dx \approx \sum_{i=1}^n w_i f(x_i)
where w_1=w_n = h/3, and otherwise w_i = 4h/3 if i is even and w_i = 2h/3 if i is odd
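A small sketch of composite Simpson's rule with these weights; on the same integral it is exact up to rounding since x^2 is (at most) cubic:

# composite Simpson's rule: n must be odd so the subintervals pair up
function simpson_rule(f, a, b, n)
    @assert isodd(n) "Simpson's rule needs an odd number of nodes"
    xs = range(a, b, length = n)
    h = (b - a) / (n - 1)
    w = [i == 1 || i == n ? h / 3 : (iseven(i) ? 4h / 3 : 2h / 3) for i in 1:n]
    return sum(w .* f.(xs))
end

simpson_rule(x -> x^2, 0, 10, 101) # 333.333..., exact for polynomials up to cubics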

100 / 125

How accurate is this rule?

Simpson's rule is O(h^4) / third-order exact: it can integrate any cubic function exactly

That's weird! Why do we gain 2 orders of accuracy when increasing one order of approximation complexity?

  1. The approximating piecewise quadratic is exact at the end points and midpoint of the conjoined two subintervals
  2. Clearly the difference between a cubic f(x) and the quadratic approximation in [x_{2i-1},x_{2i+1}] is another cubic function
  3. You can prove that this cubic function is odd with respect to the midpoint \rightarrow integrating over the first subinterval cancels integrating over the second subinterval
101 / 125

Gaussian quadrature rules

How did we pick the x_i quadrature nodes for Newton-Cotes rules?

102 / 125

Gaussian quadrature rules

How did we pick the x_i quadrature nodes for Newton-Cotes rules?

Evenly spaced, but no particular reason for doing so...

102 / 125

Gaussian quadrature rules

How did we pick the x_i quadrature nodes for Newton-Cotes rules?

Evenly spaced, but no particular reason for doing so...

Gaussian quadrature selects these nodes more efficiently
and relies on weight functions w(x)

102 / 125

Gaussian quadrature rules

Gaussian rules try to exactly integrate some finite dimensional collection of functions (e.g. polynomials up to some degree)

103 / 125

Gaussian quadrature rules

Gaussian rules try to exactly integrate some finite dimensional collection of functions (e.g. polynomials up to some degree)

For a given order of approximation n, the weights w_1,...,w_n and nodes x_1,...,x_n are chosen to satisfy 2n moment matching conditions:

103 / 125

Gaussian quadrature rules

Gaussian rules try to exactly integrate some finite dimensional collection of functions (e.g. polynomials up to some degree)

For a given order of approximation n, the weights w_1,...,w_n and nodes x_1,...,x_n are chosen to satisfy 2n moment matching conditions:

\int_I x^kw(x)dx = \sum_{i=1}^n w_i x^k_i, for k=0,...,2n-1

where I is the interval over which we are integrating
and w(x) is a given weight function

103 / 125

Gaussian quadrature improves accuracy

The moment matching conditions pin down w_is and x_is so we can approximate an integral by a weighted sum of the function at the prescribed nodes

104 / 125

Gaussian quadrature improves accuracy

The moment matching conditions pin down w_is and x_is so we can approximate an integral by a weighted sum of the function at the prescribed nodes

\int_I f(x) w(x)dx \approx \sum_{i=1}^n w_i f(x_i)

104 / 125

Gaussian quadrature improves accuracy

The moment matching conditions pin down w_is and x_is so we can approximate an integral by a weighted sum of the function at the prescribed nodes

\int_I f(x) w(x)dx \approx \sum_{i=1}^n w_i f(x_i)

Gaussian rules are (2n-1)-order exact:
we can exactly compute the integral of any polynomial of degree up to 2n-1

104 / 125

Gaussian quadrature takeaways

Gaussian quadrature effectively discretizes some distribution p(x) into mass points (nodes) and probabilities (weights) for some other discrete distribution \bar{p}(x)

105 / 125

Gaussian quadrature takeaways

Gaussian quadrature effectively discretizes some distribution p(x) into mass points (nodes) and probabilities (weights) for some other discrete distribution \bar{p}(x)

Given an approximation with n mass points, X and \bar{X} have identical moments of order 0 through 2n-1, and as n\rightarrow\infty we have a continuum of mass points and recover the continuous pdf

105 / 125

Gaussian quadrature takeaways

Gaussian quadrature effectively discretizes some distribution p(x) into mass points (nodes) and probabilities (weights) for some other discrete distribution \bar{p}(x)

Given an approximation with n mass points, X and \bar{X} have identical moments of order 0 through 2n-1, and as n\rightarrow\infty we have a continuum of mass points and recover the continuous pdf

But what do we pick for the weighting function w(x)?

105 / 125

Gauss-Legendre

We can start out with a simple w(x) = 1: this gives us Gauss-Legendre quadrature

This can approximate the integral of any function arbitrarily well by increasing n

106 / 125
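
For a concrete illustration: the two-point Gauss-Legendre rule on [-1,1] has nodes ±1/√3 and weights 1, so it is exact for cubics. The test polynomial below is my own choice; packages such as FastGaussQuadrature.jl can supply the nodes and weights for larger n.

# Two-point Gauss-Legendre rule on [-1, 1]
nodes   = [-1/sqrt(3), 1/sqrt(3)]
weights = [1.0, 1.0]
g(x) = 4x^3 + x^2 - 2x + 1     # an arbitrary cubic
sum(weights .* g.(nodes))       # ≈ 8/3, the exact value of ∫_{-1}^{1} g(x) dx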

Gauss-Laguerre

Sometimes we want to compute exponentially discounted sums like: \int_I f(x) e^{-x} dx

Using the weight function w(x) = e^{-x} (with I = [0,\infty)) gives Gauss-Laguerre quadrature

107 / 125
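
As a small sketch (hand-derived from the moment-matching conditions above, not from the slides): the two-point Gauss-Laguerre rule has nodes 2∓√2 with weights (2±√2)/4, and with only two nodes it already integrates \int_0^\infty x^3 e^{-x} dx = 3! = 6 exactly.

# Two-point Gauss-Laguerre rule for ∫_0^∞ f(x) e^{-x} dx
nodes   = [2 - sqrt(2), 2 + sqrt(2)]
weights = [(2 + sqrt(2))/4, (2 - sqrt(2))/4]
sum(weights .* nodes.^3)   # returns 6.0 (up to rounding), matching 3! = 6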

Gauss-Hermite

Sometimes we want to take expectations of functions of normally distributed variables: \int_I f(x) e^{-x^2} dx

Using the weight function w(x) = e^{-x^2} gives Gauss-Hermite quadrature

There exist packages and look-up tables that provide the prescribed weights and nodes for each of these schemes

108 / 125
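
A minimal sketch of the normal-expectation use case: after the change of variables x = \mu + \sqrt{2}\sigma z, E[f(X)] \approx \frac{1}{\sqrt{\pi}}\sum_i w_i f(\mu + \sqrt{2}\sigma z_i) with Gauss-Hermite nodes z_i and weights w_i. The three-point nodes and weights below are hard-coded; \mu, \sigma, and the test function are my own choices.

# Three-point Gauss-Hermite rule (weight e^{-x^2}): nodes 0, ±sqrt(3/2)
z = [-sqrt(1.5), 0.0, sqrt(1.5)]
w = [sqrt(pi)/6, 2sqrt(pi)/3, sqrt(pi)/6]
μ, σ = 1.0, 2.0
h(x) = x^2                                      # test function
sum(w .* h.(μ .+ sqrt(2)*σ .* z)) / sqrt(pi)    # ≈ E[X^2] = μ^2 + σ^2 = 5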

Linear Algebra

Lots of computational problems break down into linear systems

Many non-linear models are linearized

How do we actually solve these systems inside the machine?

109 / 125

L-U Factorization

If A in Ax=b is upper or lower triangular,
we can solve for x recursively via forward/backward substitution

110 / 125

L-U Factorization

If A in Ax=b is upper or lower triangular,
we can solve for x recursively via forward/backward substitution

Consider a lower triangular matrix

110 / 125

L-U Factorization

If A in Ax=b is upper or lower triangular,
we can solve for x recursively via forward/backward substitution

Consider a lower triangular matrix

The first element is the only non-zero value in the first row so x_1 is easy to solve for

110 / 125

L-U Factorization

If A in Ax=b is upper or lower triangular,
we can solve for x recursively via forward/backward substitution

Consider a lower triangular matrix

The first element is the only non-zero value in the first row so x_1 is easy to solve for

The equation in row 2 contains x_2 and the already solved for x_1
so we can easily solve for x_2 and then continue until we solve for all xs

110 / 125

Forward substitution

Forward substitution gives us solutions

x_i = \frac{1}{a_{ii}} \left( b_i - \sum_{j=1}^{i-1}a_{ij}x_j\right), for all i

111 / 125

Forward substitution

Forward substitution gives us solutions

x_i = \frac{1}{a_{ii}} \left( b_i - \sum_{j=1}^{i-1}a_{ij}x_j\right), for all i

111 / 125

Forward substitution

Forward substitution gives us solutions

x_i = \frac{1}{a_{ii}} \left( b_i - \sum_{j=1}^{i-1}a_{ij}x_j\right), for all i

L-U factorization is an algorithm that decomposes A
into the product of lower and upper triangular matrices

111 / 125
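
A minimal sketch of this recursion in Julia (my own function name and test system, not from the slides):

# Solve Lx = b for lower triangular L by forward substitution:
# x_i = (b_i - Σ_{j<i} L_ij x_j) / L_ii
function forward_substitution(L, b)
    n = length(b)
    x = zeros(n)
    for i in 1:n
        s = 0.0
        for j in 1:i-1
            s += L[i, j]*x[j]      # sum over already-solved components
        end
        x[i] = (b[i] - s)/L[i, i]
    end
    return x
end
forward_substitution([2.0 0.0; 3.0 4.0], [2.0, 10.0])  # returns [1.0, 1.75]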

L-U Factorization has two steps

  1. Factor A into lower L and upper U triangular matrices using Gaussian elimination
    • We can do this for any non-singular square matrix
112 / 125

L-U Factorization has two steps

  1. Factor A into lower L and upper U triangular matrices using Gaussian elimination

    • We can do this for any non-singular square matrix
  2. Solve for x

    1. Rewrite Ax = b as (LU)x = b
    2. Solve for y: Ly = b using forward substitution
    3. Using the solved y, we know Ux=y and can solve with backward substitution
112 / 125
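
Here is a minimal sketch of those steps using the lu factorization from Julia's LinearAlgebra standard library; note that lu pivots rows (PA = LU), so b is permuted by F.p before the forward substitution. The 2x2 system is an arbitrary example of my own.

using LinearAlgebra
A = [4.0 3.0; 6.0 3.0]
b = [10.0, 12.0]
F = lu(A)            # step 1: factor PA = LU
y = F.L \ b[F.p]     # step 2: forward substitution on Ly = Pb
x = F.U \ y          # step 3: backward substitution on Ux = y
x ≈ A \ b            # true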

Why bother with this scheme?

Why not just use another method like Cramer's rule?

113 / 125

Why bother with this scheme?

Why not just use another method like Cramer's rule?

Speed

113 / 125

Why bother with this scheme?

Why not just use another method like Cramer's rule?

Speed

LU is O(n^3)

113 / 125

Why bother with this scheme?

Why not just use another method like Cramer's rule?

Speed

LU is O(n^3)

Cramer's rule is O(n!\times n)

113 / 125

Why bother with this scheme?

Why not just use another method like Cramer's rule?

Speed

LU is O(n^3)

Cramer's rule is O(n!\times n)

For a 10x10 system this can really matter

113 / 125

Example: LU vs Cramer

Julia's description of the left division operator \: If A is upper or lower triangular (or diagonal), no factorization of A is required and the system is solved with either forward or backward substitution. For non-triangular square matrices, an LU factorization is used.

So we can solve a system via LU factorization by just writing x = A\b,
but we can also write the factorization ourselves

114 / 125

Example: LU vs Cramer

Cramer's Rule can be written as a simple loop:

using LinearAlgebra  # for det

function solve_cramer(A, b)
    dets = Vector{Float64}(undef, length(b))
    for index in eachindex(b)
        B = copy(A)
        B[:, index] = b         # replace column `index` of A with b
        dets[index] = det(B)
    end
    return dets ./ det(A)       # Cramer's rule: x_index = det(B)/det(A)
end
115 / 125

Example: LU vs Cramer

n = 100
A = rand(n, n)
b = rand(n)
116 / 125

Example: LU vs Cramer

n = 100
A = rand(n, n)
b = rand(n)

Let's see the full results of the competition for this 100x100 system:

using BenchmarkTools
cramer_time = @elapsed solve_cramer(A, b);
cramer_allocation = @allocated solve_cramer(A, b);
lu_time = @elapsed A\b;
lu_allocation = @allocated A\b;
println("Cramer's rule solved in $cramer_time seconds and used $cramer_allocation bytes of memory.")
## Cramer's rule solved in 0.2546825 seconds and used 16196512 bytes of memory.
println("LU solved in $(lu_time) seconds and used $(lu_allocation) bytes of memory.")
## LU solved in 0.949662062 seconds and used 81904 bytes of memory.
println("LU's relative speedup is $(cramer_time/lu_time) and relative decrease in memory is $(cramer_allocation/lu_allocation).")
## LU's relative speedup is 0.26818224102122756 and relative decrease in memory is 197.7499511623364.
116 / 125

Mechanics of factorizing

Gaussian elimination uses row operations:

  1. swapping rows
  2. multiplying a row by a non-zero scalar
  3. adding a scalar multiple of one row to another
117 / 125

Mechanics of factorizing

Gaussian elimination uses row operations

  1. swapping rows
  2. multiplying a row by a non-zero scalar
  3. adding a scalar multiple of one row to another

to turn a matrix (IA) into (LU)

117 / 125
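
A hand-worked 2x2 sketch (my own example matrix, no pivoting) showing how a single row operation produces the L and U factors:

A = [4.0 3.0; 6.0 3.0]
m = A[2, 1]/A[1, 1]                             # multiplier used to eliminate A[2, 1]
L = [1.0 0.0; m 1.0]                            # records the row operation
U = [A[1, 1] A[1, 2]; 0.0 A[2, 2] - m*A[1, 2]]  # result of the row operation
L*U ≈ A                                          # true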

Numerical error blow up

Small errors can have big effects, for example: \begin{align*} \begin{bmatrix} -M^{-1} & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} 1\\ 2 \end{bmatrix} \end{align*}

where M is big

118 / 125

Numerical error blow up

Small errors can have big effects, for example: \begin{align*} \begin{bmatrix} -M^{-1} & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} 1\\ 2 \end{bmatrix} \end{align*}

where M is big

Lets use L-U Factorization to solve it:

\begin{align*} \begin{bmatrix} -M^{-1} & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} \begin{bmatrix} -M^{-1} & 1 \\ 1 & 1 \end{bmatrix} \end{align*}

118 / 125

Numerical error blow up

Subtract -M times the first row from the second to get the L-U factorization \begin{align*} \begin{bmatrix} 1& 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} -M^{-1} & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0\\ -M & 1 \end{bmatrix} \begin{bmatrix} -M^{-1} & 1\\ 0 & M+1 \end{bmatrix} \end{align*}

119 / 125

Numerical error blow up

Subtract -M times the first row from the second to get the L-U factorization \begin{align*} \begin{bmatrix} 1& 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} -M^{-1} & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0\\ -M & 1 \end{bmatrix} \begin{bmatrix} -M^{-1} & 1\\ 0 & M+1 \end{bmatrix} \end{align*}

We can get closed-form solutions by applying forward and then backward substitution: \begin{align*} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} M/(M+1)\\ (M+2)/(M+1) \end{bmatrix} \end{align*}

119 / 125

Numerical issues

Both variables are approximately 1 for large M,
but remember adding small numbers to big numbers causes problems numerically

120 / 125

Numerical issues

Both variables are approximately 1 for large M,
but remember adding small numbers to big numbers causes problems numerically

If M=10000000000000000000, the computer will return that x_2
is equal to precisely 1; this isn't terribly wrong

120 / 125

Numerical issues

Both variables are approximately 1 for large M,
but remember adding small numbers to big numbers causes problems numerically

If M=10000000000000000000, the computer will return that x_2
is equal to precisely 1; this isn't terribly wrong

When we then perform the second step of backward substitution, we solve for x_1=-M(1-x_2)=0; this is very wrong

Large errors like this often occur because diagonal elements are very small

120 / 125

Julia example

function solve_lu(M)
    b = [1, 2]
    U = [-M^-1 1; 0 M+1]   # upper triangular factor
    L = [1. 0; -M 1.]      # lower triangular factor
    y = L\b                # forward substitution
    # Round element-wise to 5 digits
    x = round.(U\y, digits = 5)   # backward substitution
end;
true_solution(M) = round.([M/(M+1), (M+2)/(M+1)], digits = 5);
121 / 125

Julia example

## True solution for M=10 is approximately [0.90909, 1.09091], computed solution is [0.90909, 1.09091]
## True solution for M=1e10 is approximately [1.0, 1.0], computed solution is [1.0, 1.0]
## True solution for M=1e15 is approximately [1.0, 1.0], computed solution is [1.11022, 1.0]
## True solution for M=1e20 is approximately [1.0, 1.0], computed solution is [-0.0, 1.0]
## Julia's division operator is actually pretty smart though, true solution for M=1e20 is A\b = [1.0, 1.0]
122 / 125

Ill-conditioning

A matrix A is said to be ill-conditioned if
a small perturbation in b yields a large change in x

123 / 125

Ill-conditioning

A matrix A is said to be ill-conditioned if
a small perturbation in b yields a large change in x

One way to measure ill-conditioning in a matrix is the elasticity of the solution with respect to b, \begin{gather*} \sup_{||\delta b || > 0} \frac{||\delta x|| / ||x||}{||\delta b|| / ||b||} \end{gather*} which yields the largest percent change in x given
a one percent change in the magnitude of b

123 / 125
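
To see this sensitivity concretely, here is a small sketch using the same nearly singular matrix that appears with cond below; the right-hand sides are my own choices, and the reported solutions are approximate.

A = [1. 1.; 1. 1.00000001]
A \ [2.0, 2.0]          # ≈ [2.0, 0.0]
A \ [2.0, 2.00000001]   # ≈ [1.0, 1.0]: a tiny change in b moves x a long way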

Ill-conditioning

If this elasticity is large, then computer representations of the system of equations can lead to large errors due to rounding

124 / 125

Ill-conditioning

If this elasticity is large, then computer representations of the system of equations can lead to large errors due to rounding

Approximate the elasticity by computing the condition number \begin{gather*} \kappa = ||A|| \cdot ||A^{-1}|| \end{gather*}

124 / 125

Ill-conditioning

If this elasticity is large, then computer representations of the system of equations can lead to large errors due to rounding

Approximate the elasticity by computing the condition number \begin{gather*} \kappa = ||A|| \cdot ||A^{-1}|| \end{gather*}

\kappa gives the least upper bound of the elasticity

\kappa is always at least one, and a rule of thumb is that for every order of magnitude of \kappa, a significant digit is lost in the computation of x

using LinearAlgebra
cond([1. 1.; 1. 1.00000001])
## 4.000000065548868e8
124 / 125

Next week

Coding, generic coding, reproducibility, the shell

125 / 125
