Necessary things to download to follow along today and in the future:
Also make a GitHub account
For this lecture you will need the following Julia packages
using Plots, ForwardDiff, Distributions, BenchmarkTools
You must use Julia and write .jl scripts, no Jupyter
You can work in groups of up to 3
Problem sets will be where you implement the techniques we learn in class on your own, but we will be doing our fair share of coding in class
You will be randomly assigned to do a reproduction test / code check on someone else's homework
Rubric will come later
Why am I making you do this?
If you want to publish in an AEA journal you need to have good practices,
other journals are following suit
Making your code reproducible and generic will make
you a better co-author to others and your future self
Everyone will present a paper starting in 2-3 weeks
The paper can apply methods we've learned about (or will learn about), or can be a new method that we have not covered
You must consult with me at least 1 week prior to your scheduled presentation date to ensure the paper is appropriate for a presentation
The final project will be the beginning of a computationally-driven research project
Proposals will be due about half way through the class
The only requirement is that the project cannot be computationally trivial
(i.e. no 600 observation two-way FE papers)
Everyone will present their final projects in the last week of class
More details are in the syllabus, with more to come later
All slides will be available on the course GitHub page
The slides are made with R Markdown and can run Julia via the JuliaCall package, e.g.
# true coefficient
bbeta = π;
# random x data
x = randn(100,1)*5 .+ 3;
# OLS data generating process
y = bbeta.*x .+ randn(100,1)*10;
# OLS estimation
bbeta_hat = inv(x'x)*(x'y);
println("β-hat is $(round(bbeta_hat[1],digits=3)) and the true β is $(round(bbeta,digits=3)).")
## β-hat is 3.325 and the true β is 3.142.
Everything you've done so far has likely been solvable analytically
Including OLS: \hat{\beta}=(X'X)^{-1}X'Y
Not all economic models have closed-form solutions, and others can't have closed-form solutions without losing important economic content
This is generally true for dynamic models
We can use computation + theory to answer quantitative questions
Theory can't give us welfare in dollar terms
Theory can't tell us the value of economic primitives
Theory often relies on strong assumptions like
It can be unclear what the cost of these assumptions are
Suppose we have a constant elasticity demand function: q(p) = p^{-0.2}
In equilibrium, quantity demanded is q^* = 2
What price clears the market in equilibrium?
Just invert the demand function:
2 = p^{-0.2}
p^* = 2^{-5} ✓
Your calculator can do the rest
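A one-line check in Julia (a quick sketch, not from the original slide):
q(p) = p^(-0.2)        # CE demand function
p_star = 2.0^(-5)      # the analytic solution p* = 2^{-5}
println("p* = $p_star and q(p*) = $(q(p_star))")  # q(p*) should print (approximately) 2.0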
Suppose the demand function is now: q(p) = 0.5p^{-0.2} + 0.5p^{-0.5}, a weighted average of two CE demand functions
What price clears the market if q^* = 2?
First, does a solution exist?
Yes, why?
q(p) is continuous and monotonically decreasing in p
q(p) is greater than 2 at p=0.1 and less than 2 at p=0.2
\rightarrow by the intermediate value theorem q(p)=2 somewhere in (0.1,0.2)
# We know solution is between .1 and .2
x = collect(range(.1, stop=.2, length=10)) # generate evenly spaced grid
q_d = ones(size(x)).*2 # generate equal length vector of qd=2
# Demand function: quantity demanded at each price
price(p) = p.^(-0.2)/2 .+ p.^(-0.5)/2
# Get corresponding quantity values at these prices
y = price(x)
Now plot qd and q(p)
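With the Plots package loaded earlier, the figure can be produced with something like this (a sketch; the styling on the original slide may differ):
using Plots
plot(x, y, label = "q(p)", xlabel = "p", ylabel = "quantity", lw = 2)  # demand curve on the grid
plot!(x, q_d, label = "q* = 2", linestyle = :dash)                     # horizontal line at the target quantity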
Notice: if we let t = p^{-0.1} then:
q(t) = 0.5t^2 + 0.5t^5
Can we solve for t now?
No! Closed-form solutions to fifth order polynomials are not guaranteed to exist!
So how do we solve the problem?
Iteratively do the following (this is Newton's method): start from an initial guess p_0, update the guess with p_{k+1} = p_k - f(p_k)/f'(p_k) where f(p) \equiv q(p) - 2, and stop once the step size is sufficiently small
We will learn more about this and why it works in a later class
# Define demand functions
demand(p) = p^(-0.2)/2 + p^(-0.5)/2 - 2     # excess demand: q(p) - q*, where q* = 2
demand_grad(p) = .1*p^(-1.2) + .25*p^(-1.5) # negative of the derivative of excess demand
function find_root_newton(demand, demand_grad)
    p = .3        # initial guess
    deltap = 1e10 # initialize stepsize
    while abs(deltap) > 1e-4
        deltap = demand(p)/demand_grad(p)
        p += deltap
        println("Intermediate guess of p = $(round(p,digits=3)).")
    end
    println("The solution is p = $(round(p,digits=3)).")
    return p
end;
# Solve for price
find_root_newton(demand, demand_grad)
## Intermediate guess of p = 0.068.
## Intermediate guess of p = 0.115.
## Intermediate guess of p = 0.147.
## Intermediate guess of p = 0.154.
## Intermediate guess of p = 0.154.
## Intermediate guess of p = 0.154.
## The solution is p = 0.154.
## 0.15419764093200633
Consider a two-period ag commodity market model
Period 1: Farmer makes acreage decisions for planting
Period 2: Per-acre yield realizes, equilibrium crop price clears the market
The farmer's policy function is: a(E[p]) = \frac{1}{2} + \frac{1}{2}E[p]
After planting, yield \hat{y} realizes, producing a total quantity q = a\hat{y} of the crop
Demand is given by p(q) = 3 - 2q
Yield is given by \hat{y} \sim N(1, 0.1)
p(\hat{y}) = 3 - 2a\hat{y}
a = \frac{1}{2} + \frac{1}{2}(3 - 2aE[\hat{y}])
Rearrange and solve:
a^* = 1
Now suppose the government implements a price floor on the crop at \bar{p} = 1, so we have that p(\hat{y}) = \max(1, 3-2a\hat{y})
How much acreage does the farmer plant?
This is analytically intractable
The max operator is non-linear so we can't pass the expectation through
E[\max(1,3-2a\hat{y})] \neq \max(1,E[3-2a\hat{y}])
\rightarrow we need to solve this numerically
We can solve this using another technique called function iteration
# Function iteration method to find a root
using Statistics # for mean
function find_root_fi(mn, variance)
    y = randn(1000)*sqrt(variance) .+ mn # draws of the random variable
    a = 1.                               # initial guess
    differ = 100.                        # initialize error
    exp_price = 1.                       # initialize expected price
    while differ > 1e-4
        a_old = a                        # save old acreage
        p = max.(1, 3 .- 2 .*a.*y)       # compute price at all distribution points
        exp_price = mean(p)              # compute expected price
        a = 1/2 + 1/2*exp_price          # get new acreage planted given new price
        differ = abs(a - a_old)          # change in acreage planted
        println("Intermediate acreage guess: $(round(a,digits=3))")
    end
    return a, exp_price
end
acreage, expected_price = find_root_fi(1, 0.1)
## Intermediate acreage guess: 1.125
## Intermediate acreage guess: 1.086
## Intermediate acreage guess: 1.097
## Intermediate acreage guess: 1.094
## Intermediate acreage guess: 1.095
## Intermediate acreage guess: 1.094
## Intermediate acreage guess: 1.094
## (1.0944707255507848, 1.1889414511015697)
println("The optimal number of acres to plant is $(round(acreage, digits = 3)).")
## The optimal number of acres to plant is 1.094.
println("The expected price is $(round(expected_price, digits = 3)).")
## The expected price is 1.189.
How do we quantify speed and accuracy of computational algorithms?
i.e. what is the computational complexity of the problem?
General mathematical definition: Big O notation describes the limiting behavior of a function when the argument tends towards a particular value or infinity
Programming context: Describes the limiting behavior of algorithms in terms of run time/memory/accuracy as input size grows
You've seen this before in the expression of Taylor series' errors
Written as: O(F(x))
Here is how to think about it:
O(x): linear
Examples?
Time to find a particular (e.g. maximum) value in an unsorted array
\rightarrow For each element, check whether it is the value we want
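As a concrete O(x) sketch (assumed example, not from the slides), scanning an unsorted array for its maximum touches each element exactly once:
# O(n): one pass over the array, so work grows linearly with its length
function my_maximum(v)   # named to avoid shadowing Julia's built-in maximum
    best = v[1]
    for x in v           # check each element exactly once
        best = x > best ? x : best
    end
    return best
end

my_maximum(rand(1_000))  # ≈ 1.0 for a large uniform sample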
\mathbf{O(c^n)}: exponential
Examples?
Time to solve a standard dynamic program, e.g. the traveling salesman problem
\rightarrow For each city i=1,...,n, solve a Bellman as a function of all other cities
\mathbf{O(n!)}: factorial
Examples?
Solving traveling salesman by brute force
\rightarrow Obtain travel time for all possible combinations of intermediate cities
This is how you have probably seen Big O used before:
Taylor series for sin(x) around zero:
sin(x) \approx x - x^3/3! + x^5/5! + O(x^7)
What does O(x^7) mean here?
As we move away from 0 to some x, the upper bound of the growth rate in the error of our approximation to sin(x) is x^7
We are approximating about zero so x is small and x^n is decreasing in n
For small x, higher order polynomials mean the error will grow slower and we have a better local approximation
# fifth and third order Taylor approximations
sin_error_5(x) = sin(x) - (x - x^3/6 + x^5/120)
sin_error_3(x) = sin(x) - (x - x^3/6)
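The errors printed below come from evaluating these functions at a few points near zero; a hedged reconstruction of the sort of loop that produced them (evaluation points taken from the output):
for x in (1e-3, 1e-2, 1e-1)
    println("Error of fifth-order approximation at x = $x is: $(sin_error_5(x))")
    println("Error of third-order approximation at x = $x is: $(sin_error_3(x))")
end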
## Error of fifth-order approximation at x = .001 is: 0.0
## Error of third-order approximation at x = .001 is: 8.239936510889834e-18
## Error of fifth-order approximation at x = .01 is: -1.734723475976807e-18
## Error of third-order approximation at x = .01 is: 8.333316675601665e-13
## Error of fifth-order approximation at x = .1 is: -1.983851971587569e-11
## Error of third-order approximation at x = .1 is: 8.331349481138783e-8
Here are a few examples for fundamental computational methods
O(1): algorithm executes in constant time
The size of the input does not affect execution speed
Example: accessing a specific location in an array
O(x): algorithm executes in linear time
Execution speed grows linearly in input size
Example: inserting an element into an arbitrary location in a 1 dimensional array
Bigger array \rightarrow need to shift around more elements in memory to accommodate the new element
\mathbf{O(x^2):} algorithm executes in quadratic time
More generally called polynomial time for x^n
Execution speed grows quadratically in input size
Example: bubble sort, step through a list, compare adjacent elements, swap if in the wrong order
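A minimal bubble sort sketch (assumed implementation, not from the slides) makes the O(x^2) behavior visible: two nested passes over the array:
# O(n^2): for each element we may sweep the remainder of the array
function bubble_sort!(v)
    n = length(v)
    for i in 1:n-1
        for j in 1:n-i                       # compare adjacent elements
            if v[j] > v[j+1]
                v[j], v[j+1] = v[j+1], v[j]  # swap if out of order
            end
        end
    end
    return v
end

bubble_sort!([3, 1, 2])  # -> [1, 2, 3]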
Question: which numbers can be represented by a computer?
Before the answer, how are numbers physically represented by a computer?
Binary: a base 2 number system
Each digit can only take on 0 or 1
Base 10: each digit can take on 0-9
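You can inspect the binary layout directly in Julia (a quick sketch using built-ins):
println(bitstring(5))        # 64-bit two's-complement pattern of the integer 5
println(bitstring(5.0))      # sign, exponent, and significand bits of the Float64 5.0
println(digits(5, base = 2)) # [1, 0, 1]: 5 = 1*1 + 0*2 + 1*4 (least-significant digit first)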
Question: which numbers can be represented by a computer?
Answer: a subset of the rational numbers
Computers have finite memory and hard disk space, there are infinite rational numbers
This imposes a strict limitation on the storage of numbers
Numbers are stored as: \pm mb^{\pm n}
m is the mantissa/significand, b is the base, n is the exponent
All three are integers
The significand typically gives the significant digits
The exponent scales the number up or down in magnitude
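Julia exposes these pieces directly; a quick sketch (note Julia reports the base-2 significand scaled into [1, 2)):
x = 12.5
println(significand(x))  # 1.5625: the significand, scaled into [1, 2)
println(exponent(x))     # 3: the power of 2, since 1.5625 * 2^3 = 12.5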
The size of numbers a computer can represent is limited
by how much space is typically allocated for a real number
Space allocations are usually 64 bits: 1 for the sign, 11 for n, and 52 for m (plus an implicit leading bit, giving 53 bits of precision)
println(typeof(5.0))
## Float64
println(typeof(5))
## Int64
Int64 means it is an integer with 64 bits of storage
Float64 means it is a floating point number with 64 bits of storage
Floating point just means b^{\pm n} can move the decimal point around in the significand
Int64 and Float64 are different types, this will be important later
Limitations on storage suggest three facts
1. There exists a machine epsilon which denotes the smallest relative quantity representable by a computer
Machine epsilon is the smallest \epsilon such that a machine can always distinguish N+\epsilon > N > N-\epsilon
println("Machine epsilon ϵ is $(eps(Float64))")
## Machine epsilon ϵ is 2.220446049250313e-16
println("Is 1 + ϵ/2 > 1? $(1 + eps(Float64)/2 > 1)")
## Is 1 + ϵ/2 > 1? false
println("Is 1 - ϵ/2 < 1? $(1 - eps(Float64)/2 < 1)")
## Is 1 - ϵ/2 < 1? true
println("The smallest representable number larger than 1.0 is $(nextfloat(1.0))")
## The smallest representable number larger than 1.0 is 1.0000000000000002
println("The largest representable number smaller than 1.0 is $(prevfloat(1.0))")
## The largest representable number smaller than 1.0 is 0.9999999999999999
Machine epsilon changes depending on the amount of storage allocated
println("32 bit machine epsilon is $(eps(Float32))")
## 32 bit machine epsilon is 1.1920929e-7
println("Is 1 + ϵ/2 > 1? $(Float32(1) + eps(Float32)/2 > 1)")
## Is 1 + ϵ/2 > 1? false
println("Is 1 - ϵ/2 < 1? $(Float32(1) - eps(Float32)/2 < 1)")
## Is 1 - ϵ/2 < 1? true
This means there's a tradeoff between precision and storage requirements,
this matters for low-memory systems like GPUs
2. There is a smallest representable number
println("64 bit smallest float is $(floatmin(Float64))")
## 64 bit smallest float is 2.2250738585072014e-308
println("32 bit smallest float is $(floatmin(Float32))")
## 32 bit smallest float is 1.1754944e-38
println("16 bit smallest float is $(floatmin(Float16))")
## 16 bit smallest float is 6.104e-5
3. There is a largest representable number
println("64 bit largest float is $(floatmax(Float64))")
## 64 bit largest float is 1.7976931348623157e308
println("32 bit largest float is $(floatmax(Float32))")
## 32 bit largest float is 3.4028235e38
println("16 bit largest float is $(floatmax(Float16))")
## 16 bit largest float is 6.55e4
We can overcome these issues (if they ever arise) with arbitrary precision arithmetic
println("The largest 64 bit integer is $(typemax(Int64))")
## The largest 64 bit integer is 9223372036854775807
println("Add one to it and we get: $(typemax(Int64)+1)")
## Add one to it and we get: -9223372036854775808
println("It loops us around the number line: $(typemin(Int64))")
## It loops us around the number line: -9223372036854775808
println("With arbitrary precision we can make it much bigger: $(BigInt(typemax(Int64))*1000000)")
## With arbitrary precision we can make it much bigger: 9223372036854775807000000
The scale of your problem matters
If a parameter or variable is > floatmax or < floatmin, you will have a very bad time
Scale numbers appropriately (e.g. millions of dollars, not millionths of cents)
We can only represent a finite number of numbers
This means we will have error in our computations
Error comes in two major forms:
We will always need to round numbers to the nearest computer representable number, this introduces error
println("Half of π is: $(π/2)")
## Half of π is: 1.5707963267948966
The computer gave us a rational number, but \pi/2 should be irrational
Lots of important numbers are defined by infinite sums e^x = \sum_{n=0}^\infty \frac{x^n}{n!}
It turns out that computers can't add up infinitely many terms because there is finite space
\rightarrow we need to truncate the sum
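A truncated version of the sum is easy to write down (a sketch, assuming we simply cut the series after N terms):
# Approximate e^x by truncating the series at N terms
exp_approx(x, N) = sum(x^n / factorial(n) for n in 0:N)

println(exp_approx(1.0, 5))   # ≈ 2.7167 with six terms
println(exp_approx(1.0, 15))  # ≈ 2.718281828..., already very close
println(exp(1.0))             # built-in value for comparison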
Errors are small, who cares?
You should!
Because errors can propagate and grow as you keep applying an algorithm (e.g. function iteration)
Consider a simple quadratic: x^2-26x+1=0, the solution is: x = 13-\sqrt{168}
println("Arbitrary precision: 13 - √168 = $(BigFloat(13-sqrt(168)))")
## Arbitrary precision: 13 - √168 = 0.03851860318427924312345567159354686737060546875
println("64 bit: 13 - √168 = $(13-sqrt(168))")
## 64 bit: 13 - √168 = 0.03851860318427924
println("32 bit: 13 - √168 = $(convert(Float32,13-sqrt(168)))")
## 32 bit: 13 - √168 = 0.038518604
println("16 bit: 13 - √168 = $(convert(Float16,13-sqrt(168)))")
## 16 bit: 13 - √168 = 0.0385
Let's add and subtract some numbers and play around with the associative property of real numbers:
x = (10^{-20} + 1) - 1 \qquad y = 10^{-20} + (1 - 1)
Very clearly we should get x=y, but do we? Let's find out
x = (1e-20 + 1) - 1   # initialize x
y = 1e-20 + (1 - 1)   # initialize y
x_equals_y = (x == y) # store boolean of whether x == y
if x_equals_y
    println("X equals Y!")
else
    println("X does not equal Y!")
    println("The difference is: $(x-y).")
end
## X does not equal Y!
## The difference is: -1.0e-20.
The two numbers were not equal, we got y > x
Why?
Adding numbers of greatly different magnitudes
does not always work like you would want
x = (1e-20 + 1) - 1 # initialize x
## 0.0
y = 1e-20 + (1 - 1) # initialize y
## 1.0e-20
println("x is $x")
## x is 0.0
println("y is $y")
## y is 1.0e-20
When we added 10^{-20} to 1, it got rounded away
Let's just subtract two numbers: 100000.2 - 100000.1
We know the answer is: 0.1
println("100000.2 - 100000.1 is: $(100000.2 - 100000.1)")
## 100000.2 - 100000.1 is: 0.09999999999126885
if (100000.2 - 100000.1) == 0.1
    println("and it is equal to 0.1")
else
    println("and it is not equal to 0.1")
end
## and it is not equal to 0.1
Why do we get this error?
Neither of the two numbers can be precisely represented by the machine!
100000.1 \approx 6871954545547674\times 2^{-36} = 100000.10000000000582... \qquad 100000.2 \approx 6871961417495347\times 2^{-36} = 100000.19999999999709...
So their difference won't necessarily be 0.1
There are tools for approximate equality
isapprox(100000.2 - 100000.1, 0.1)
## true
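Julia also has the infix operator ≈, which is the same isapprox with its default tolerances:
println(100000.2 - 100000.1 ≈ 0.1)  # true: equal up to a small relative tolerance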
Derivatives are obviously important in economics for finding optimal allocations, etc
The formal definition of a derivative is:
\frac{d f(x)}{dx} = \lim_{h\rightarrow 0} \frac{f(x+h)-f(x)}{h}
But we can let t = 1/h and reframe this as an infinite limit
\frac{d f(x)}{dx} = \lim_{t\rightarrow \infty} \frac{f(x+1/t)-f(x)}{1/t}
which we know a computer can't handle because of finite space to store t
How do we perform derivatives on computers if we can't take the limit?
Finite difference methods
What does a finite difference approximation look like?
The forward difference looks exactly like the formal definition without the limit: \frac{d f(x)}{dx} \approx \frac{f(x+h)-f(x)}{h}
Works the same for partial derivatives: \frac{\partial g(x,y)}{\partial x} \approx \frac{g(x+h,y)-g(x,y)}{h}
Let's see how it works in practice by calculating derivatives of x^2 at x=2
deriv_x_squared(h,x) = ((x+h)^2 - x^2)/h # derivative function
println("The deriviative with h=1e-8 is: $(deriv_x_squared(1e-8,2.))")
## The deriviative with h=1e-8 is: 3.999999975690116
println("The deriviative with h=1e-12 is: $(deriv_x_squared(1e-12,2.))")
## The deriviative with h=1e-12 is: 4.000355602329364
println("The deriviative with h=1e-30 is: $(deriv_x_squared(1e-30,2.))")
## The deriviative with h=1e-30 is: 0.0
println("The deriviative with h=1e-1 is: $(deriv_x_squared(1e-1,2.))")
## The deriviative with h=1e-1 is: 4.100000000000001
None of the values we chose for h were perfect,
but clearly some were better than others
Why?
We face two opposing forces:
We want h to be as small as possible so that
we can approximate the limit as well as we possibly can, BUT
If h is small then f(x+h) is close to f(x),
we can run into rounding issues like we saw for h=10^{-30}
We can select h in an optimal fashion: h = \max\{|x|,1\}\sqrt{\epsilon}
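Applying that rule at x = 2 looks like this (a quick sketch; ϵ here is 64-bit machine epsilon):
deriv_x_squared(h, x) = ((x + h)^2 - x^2)/h  # forward difference, as above
x = 2.0
h_opt = max(abs(x), 1) * sqrt(eps(Float64))  # h = max{|x|,1}√ϵ ≈ 3e-8
println("Optimal h is $h_opt and the derivative estimate is $(deriv_x_squared(h_opt, x))")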
Can we measure the error growth rate in h (i.e. Big O notation)?
Perform a first-order Taylor expansion of f(x+h) around x:
f(x+h) = f(x) + f'(x)h + O(h^2)
Recall O(h^2) means the error in our approximation grows quadratically in h,
we only did a linear approximation
How can we use this to understand the error in our finite difference approximation?
Rearrange to obtain: f'(x) = \frac{f(x+h) - f(x)}{h} + O(h^2)/h
Forward differences have linearly growing errors
f'(x) = \frac{f(x+h) - f(x)}{h} + O(h) because O(h^2)/h = O(h)
If we halve h, we halve the error in our approximation
(ignoring rounding/truncation issues)
How can we improve the accuracy of the forward difference?
First, why do we have error?
Because we are approximating the slope of the tangent line at x
with a secant line passing through (x, f(x)) and (x+h, f(x+h))
The secant line has the average slope of f(x) on [x,x+h]
We want the derivative at x, which is on the edge of [x,x+h], how about we center x?
We can approximate f'(x) in a slightly different way: f'(x) \approx \frac{f(x+h)-f(x-h)}{2h}
This leaves x in the middle of the interval
over which we are averaging the slope of f(x)
Is this an improvement on forward differences?
Let's do two second-order Taylor expansions:
f(x+h) = f(x) + f'(x)h + \frac{f''(x)}{2}h^2 + O(h^3)
f(x-h) = f(x) - f'(x)h + \frac{f''(x)}{2}h^2 + O(h^3)
Subtract the two expressions (note that O(h^3) - O(h^3) = O(h^3))
and then divide by 2h to get
f'(x) = \frac{f(x+h)-f(x-h)}{2h} + O(h^2)
Error falls quadratically in h: if we halve h we reduce the error by 75%
Optimal selection of h for central differences is h = \max\{|x|,1\}\epsilon^{1/3}
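A quick comparison of the two schemes on f(x) = x^2 at x = 2 (a sketch; the optimal-h rules are the ones above):
fsq(x) = x^2
forward_diff(f, x, h) = (f(x + h) - f(x)) / h
central_diff(f, x, h) = (f(x + h) - f(x - h)) / (2h)

x = 2.0
h_f = max(abs(x), 1) * sqrt(eps())  # optimal h for forward differences
h_c = max(abs(x), 1) * eps()^(1/3)  # optimal h for central differences
println("Forward difference error: $(abs(forward_diff(fsq, x, h_f) - 4))")
println("Central difference error: $(abs(central_diff(fsq, x, h_c) - 4))")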
Suppose we're computing a Jacobian of a multidimensional function,
why would we ever use forward differences instead of central differences?
For each central difference:
We need to compute g(x_1-h,x_2,...) and g(x_1+h,x_2,...) for each x_i
But for a forward difference we only need to compute g(x_1,x_2,...) once
and then g(x_1+h,x_2,...) for each x_i
Forward differences save on computing time
and operations at the expense of accuracy
For high dimensional functions it may be worth the tradeoff
We can use these techniques to approximate higher order derivatives
For example, take two third order Taylor expansions:
f(x+h) = f(x) + f'(x)h + \frac{f''(x)}{2}h^2 + \frac{f'''(x)}{6}h^3 + O(h^4)
f(x-h) = f(x) - f'(x)h + \frac{f''(x)}{2}h^2 - \frac{f'''(x)}{6}h^3 + O(h^4)
Add the two expressions and then divide by h^2 to get
f''(x) = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} + O(h^2)
Second derivatives are important for calculating Hessians
and checking maxima or minima
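A minimal sketch of the central second-difference at work on f(x) = x^3 (true second derivative at x = 2 is 12):
fcube(x) = x^3
second_diff(f, x, h) = (f(x + h) - 2f(x) + f(x - h)) / h^2

println(second_diff(fcube, 2.0, 1e-4))  # ≈ 12.0, the true f''(2), up to rounding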
Finite differences put us in between two opposing forces on the size of h
Can we improve upon finite differences?
Analytic derivatives
One way is to code up the actual derivative
deriv_x_squared(x) = 2x
## The derivative is: 4.0
Exact solution!
Coding up analytic derivatives by hand for
complex problems is not always great because
It can take A LOT of programmer time, more than it is worth
Humans are susceptible to error when coding or calculating the derivative mathematically
Think about this: your code is always made up of simple arithmetic operations
The closed-form derivatives of these operations are not hard; it turns out your computer can do it and yield exact solutions
How?
There are methods that basically apply a giant chain rule to your whole program, and break down the derivative into the (easy) component parts that another package knows how to handle
# the function, it needs to be written in a particular way because the autodiff package is dumb
# *(x,y) is the julia function way to do x*y,
# autodiff needs to be in terms of julia functions to work correctly ¯\_(ツ)_/¯
ff(x) = *(x[1],x[1]) # x^2
x = [2.]; # location to evaluate: ff(x) = 2^2
using ForwardDiff
g(f,x) = ForwardDiff.gradient(f,x); # g = ∇f
## g (generic function with 1 method)
println("ff'(x) at $(x[1]) is: $(g(ff,x)[1])") # display gradient value
## ff'(x) at 2.0 is: 4.0
Exact solution!
Once you get the hang of coding up functions for autodiff it's not that hard
fff(x) = sin(*(x[1],x[1])) # sin(x^2)
x = [2.] # location to evaluate: fff(x) = sin(2^2)
## 1-element Array{Float64,1}:
##  2.0
g(f,x) = ForwardDiff.gradient(f,x) # g = ∇f
## g (generic function with 1 method)
println("fff'(x) at $(x[1]) is: $(g(fff,x)[1])") # display gradient value
## fff'(x) at 2.0 is: -2.6145744834544478
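As a sanity check (a quick sketch, not from the slides): the analytic derivative of sin(x^2) is 2x\cos(x^2), which matches at x = 2:
println("Analytic derivative at 2.0 is: $(2*2.0*cos(2.0^2))")  # 4cos(4) ≈ -2.6145744834544478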
We integrate to do a lot of stuff in economics
\int_D f(x) dx, \quad f:\mathcal{R}^n \rightarrow \mathcal{R}, \quad D\subset\mathcal{R}^n
Integrals are effectively infinite sums
1 dimensional example:
\lim_{dx_i\rightarrow 0}\sum_{i=0}^{(b-a)/dx_i} f(x_i)\, dx_i
where dx_i is the width of some subinterval of [a,b] and x_i is some evaluation point (e.g. the midpoint of dx_i)
Just like derivatives, we face an infinite limit as (b-a)/dx_i \rightarrow \infty
We avoid this issue in the same way as derivatives, we replace the infinite sum with something we can handle
Monte Carlo integration: probably the most commonly used form in empirical econ
Approximate an integral by relying on LLN
and "randomly" sampling the integration domain
Can be effective for very high dimensional integrals
Very simple and intuitive
But, produces a random approximation
Integrate \xi = \int_0^1 f(x) dx by drawing N uniform samples, x_1,...,x_N over interval [0,1]
\xi is equivalent to E[f(x)] with respect to a uniform distribution, so estimating the integral is the same as estimating the expected value of f(x)
In general we have that \hat{\xi} = V\frac{1}{N}\sum_{i=1}^N f(x_i)
where V is the volume over which we are integrating
LLN gives us that the plim_{N\rightarrow\infty} \hat{\xi} = \xi
The variance of \hat{\xi} is \sigma^2_{\hat{\xi}} = var(\frac{V}{N}\sum_{i=1}^N f(x_i)) = \frac{V^2}{N^2} \sum_{i=1}^N var(f(X)) = \frac{V^2}{N}\sigma^2_{f(X)}
So the average error is \frac{V}{\sqrt{N}}\sigma_{f(X)}, which gives a convergence rate of O(1/\sqrt{N})
Suppose we want to integrate x^2 from 0 to 10, we know this is 10^3/3 = 333.333
# Package for drawing random numbers
using Distributions
# Define a function to do the integration for an arbitrary function
function integrate_function(f, lower, upper, num_draws)
    # Draw from a uniform distribution
    xs = rand(Uniform(lower, upper), num_draws)
    # Estimate: mean(f(x)) * volume
    expectation = mean(f(xs))*(upper - lower)
end
# Integrate
f(x) = x.^2;
integrate_function(f, 0, 10, 1000)
## 341.72507813471555
Pretty close!
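Increasing the number of draws shrinks the sampling error at the O(1/\sqrt{N}) rate; a quick sketch reusing the function above:
for n in (100, 10_000, 1_000_000)
    estimate = integrate_function(f, 0, 10, n)
    println("N = $n: estimate = $(round(estimate, digits=3)), error = $(round(abs(estimate - 1000/3), digits=3))")
end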
We can also approximate integrals using a technique called quadrature
With quadrature we effectively take weighted sums to approximate integrals
We will focus on two classes of quadrature for now:
Suppose we want to integrate a one dimensional function f(x) over [a,b]
How would you do it?
One answer is to replace the function with
something easy to integrate: a piecewise polynomial
Key things to define up front:
x_is are the quadrature nodes of the approximation scheme and divide the interval into n-1 equally spaced subintervals of length h
Most basic Newton-Cotes method (the midpoint rule):
\int_{x_i}^{x_{i+1}} f(x) dx \approx hf(\frac{1}{2}(x_{i+1}+x_i))
Approximates f by a step function
Increase complexity by 1 degree:
\int_{x_i}^{x_{i+1}} f(x) dx \approx \frac{h}{2}[f(x_i) + f(x_{i+1})]
We can aggregate this up to: \int_{a}^{b} f(x) dx \approx \sum_{i=1}^n w_i f(x_i)
where w_1=w_n = h/2 and w_i = h otherwise
Trapezoid rule is O(h^2) / first-order exact: it can integrate any linear function exactly
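A minimal composite trapezoid-rule sketch on \int_0^{10} x^2 dx (an assumed implementation, not from the slides):
# Composite trapezoid rule on [a, b] with n evenly spaced nodes
function trapezoid(f, a, b, n)
    xs = range(a, b, length = n)
    h = step(xs)
    w = fill(h, n)
    w[1] = w[end] = h/2           # endpoint weights h/2, interior weights h
    return sum(w .* f.(xs))
end

trapezoid(x -> x^2, 0, 10, 101)   # ≈ 333.35 vs the true 1000/3 ≈ 333.33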
Increase complexity by 1 degree:
\int_{x_{2i-1}}^{x_{2i+1}} f(x) dx \approx \frac{h}{3}[f(x_{2i-1}) + 4f(x_{2i}) + f(x_{2i+1})]
We can aggregate this up to: \int_{a}^{b} f(x) dx \approx \sum_{i=1}^n w_i f(x_i)
where w_1 = w_n = h/3, and otherwise w_i = 4h/3 if i is even and w_i = 2h/3 if i is odd
Simpson's rule is O(h^4) / third-order exact: it can integrate any cubic function exactly
That's weird! Why do we gain 2 orders of accuracy when increasing one order of approximation complexity?
Because the nodes are placed symmetrically about the midpoint of each subinterval, the error from the cubic term integrates to zero, so we pick up an extra order of accuracy for free
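A composite Simpson sketch on a cubic (an assumed implementation; n must be odd so the subintervals pair up):
# Composite Simpson's rule on [a, b] with an odd number of evenly spaced nodes
function simpson(f, a, b, n)
    isodd(n) || error("n must be odd")
    xs = range(a, b, length = n)
    h = step(xs)
    w = [(i == 1 || i == n) ? h/3 : (iseven(i) ? 4h/3 : 2h/3) for i in 1:n]
    return sum(w .* f.(xs))
end

simpson(x -> x^3, 0, 10, 11)  # ≈ 2500, exact for cubics up to rounding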
How did we pick the x_i quadrature nodes for Newton-Cotes rules?
Evenly spaced, but no particular reason for doing so...
Gaussian quadrature selects these nodes more efficiently
and relies on weight functions w(x)
Gaussian rules try to exactly integrate some finite dimensional collection of functions (i.e. polynomials up to some degree)
For a given order of approximation n, the weights w_1,...,w_n and nodes x_1,...,x_n are chosen to satisfy 2n moment matching conditions:
\int_I x^kw(x)dx = \sum_{i=1}^n w_i x^k_i, for k=0,...,2n-1
where I is the interval over which we are integrating
and w(x) is a given weight function
The moment matching conditions pin down the w_is and x_is so we can approximate an integral by a weighted sum of the function at the prescribed nodes
\int_I f(x) w(x)dx \approx \sum_{i=1}^n w_i f(x_i)
Gaussian rules are 2n-1 order exact,
we can exactly compute the integral of any polynomial order 2n-1
Gaussian quadrature effectively discretizes some distribution p(x) into mass points (nodes) and probabilities (weights) for some other discrete distribution \bar{p}(x)
Given an approximation with n mass points, X and \bar{X} have identical moments up to order 2n, and as n\rightarrow\infty we have a continuum of mass points and recover the continuous pdf
But what do we pick for the weighting function w(x)?
We can start out with a simple w(x) = 1, this gives us Gauss-Legendre quadrature
This can approximate the integral of any function arbitrarily well by increasing n
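As a tiny concrete case (a sketch, not from the slides): the two-point Gauss-Legendre rule on [-1,1] has nodes ±1/√3 and weights 1, and with n = 2 it integrates polynomials up to degree 2n-1 = 3 exactly:
# Two-point Gauss-Legendre rule on [-1, 1]: nodes ±1/√3, weights 1
gl2(f) = f(-1/sqrt(3)) + f(1/sqrt(3))

println(gl2(x -> x^3 + x^2))  # ≈ 2/3, exact up to rounding: ∫ (x^3 + x^2) dx over [-1,1] is 2/3
println(gl2(x -> x^4))        # 2/9 ≈ 0.222 vs the true 2/5 = 0.4: degree 4 is no longer exact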
Sometimes we want to compute exponentially discounted sums like: \int_I f(x) e^{-x} dx
The weighting function e^{-x} gives Gauss-Laguerre quadrature
Sometimes we want to take expectations of normally distributed variables: \int_I f(x) e^{-x^2} dx
The weighting function e^{-x^2} gives Gauss-Hermite quadrature
There exist packages or look-up tables to get the prescribed weights and nodes for each of these schemes
Lots of computational problems break down into linear systems
Many non-linear models are linearized
How do we actually solve these systems inside the machine?
If A in Ax=b is upper or lower triangular,
we can solve for x recursively via forward/backward substitution
Consider a lower triangular matrix
The first element is the only non-zero value in the first row so x_1 is easy to solve for
The equation in row 2 contains x_2 and the already solved for x_1
so we can easily solve for x_2 and then continue until we solve for all xs
Forward substitution gives us solutions
x_i = \frac{1}{a_{ii}} \left( b_i - \sum_{j=1}^{i-1}a_{ij}x_j\right), for all i
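A direct translation of that formula into Julia (a minimal sketch for a lower triangular system):
# Solve Lx = b for lower triangular L by forward substitution
function forward_substitution(L, b)
    n = length(b)
    x = zeros(n)
    for i in 1:n
        s = 0.0
        for j in 1:i-1
            s += L[i, j] * x[j]        # sum over the already-solved x's
        end
        x[i] = (b[i] - s) / L[i, i]
    end
    return x
end

L = [2.0 0 0; 1 3 0; 4 5 6]
b = [2.0, 7.0, 32.0]
forward_substitution(L, b)  # returns [1.0, 2.0, 3.0], matching L \ b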
L-U factorization is an algorithm that decomposes A
into the product of lower and upper triangular matrices
Factor A into lower L and upper U triangular matrices using Gaussian elimination
Solve for x
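In Julia's LinearAlgebra standard library the factorization step is one call; a sketch of the factor-then-substitute idea (lu returns PA = LU, so b is permuted before the substitutions):
using LinearAlgebra

A = [4.0 3.0; 6.0 3.0]
b = [10.0, 12.0]

F = lu(A)         # PA = LU factorization
y = F.L \ b[F.p]  # forward substitution on the permuted right-hand side
x = F.U \ y       # backward substitution
println(x)        # same answer as A\b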
Why not just use another method like Cramer's rule?
Speed
LU takes on the order of n^3 operations (roughly 2n^3/3)
Cramer's rule is O(n!\times n)
For a 10x10 system this can really matter
Julia's description of the division operator \:
If A is upper or lower triangular (or diagonal), no factorization of A is required and the system is solved with either forward or backward substitution. For non-triangular square matrices, an LU factorization is used.
So we can take the LU factorization approach by just doing x = A\b, but we can write it ourselves as well
Cramer's Rule can be written as a simple loop:
using LinearAlgebra # for det
function solve_cramer(A, b)
    dets = Vector(undef, length(b))
    for index in eachindex(b)
        B = copy(A)
        B[:, index] = b
        dets[index] = det(B)
    end
    return dets ./ det(A)
end
n = 100
A = rand(n, n)
b = rand(n)
Let's see the full results of the competition for this 100x100 system:
using BenchmarkTools
cramer_time = @elapsed solve_cramer(A, b);
cramer_allocation = @allocated solve_cramer(A, b);
lu_time = @elapsed A\b;
lu_allocation = @allocated A\b;
println("Cramer's rule solved in $cramer_time seconds and used $cramer_allocation bytes of memory.")
## Cramer's rule solved in 0.2546825 seconds and used 16196512 bytes of memory.
println("LU solved in $(lu_time) seconds and used $(lu_allocation) bytes of memory.")
## LU solved in 0.949662062 seconds and used 81904 bytes of memory.
println("LU's relative speedup is $(cramer_time/lu_time) and relative decrease in memory is $(cramer_allocation/lu_allocation).")
## LU's relative speedup is 0.26818224102122756 and relative decrease in memory is 197.7499511623364.
Gaussian elimination is where we use row operations
to turn a matrix (IA) into (LU)
Small errors can have big effects, for example: \begin{align*} \begin{bmatrix} -M^{-1} & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} 1\\ 2 \end{bmatrix} \end{align*}
where M is big
Let's use L-U factorization to solve it:
\begin{align*} \begin{bmatrix} -M^{-1} & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} \begin{bmatrix} -M^{-1} & 1 \\ 1 & 1 \end{bmatrix} \end{align*}
Subtract -M times the first row from the second to get the L-U factorization \begin{align*} \begin{bmatrix} 1& 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} -M^{-1} & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0\\ -M & 1 \end{bmatrix} \begin{bmatrix} -M^{-1} & 1\\ 0 & M+1 \end{bmatrix} \end{align*}
We can get closed-form solutions by applying forward substitution: \begin{align*} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} M/(M+1)\\ (M+2)/(M+1) \end{bmatrix} \end{align*}
Both variables are approximately 1 for large M,
but remember adding small numbers to big numbers causes problems numerically
If M=10000000000000000000, the computer will return x_2
is equal to precisely 1, this isn't terribly wrong
When we then perform the second step of backwards substitution, we solve for x_1=-M(1-x_2)=0, this is very wrong
Large errors like this often occur because diagonal elements are very small
function solve_lu(M)
    b = [1, 2]
    U = [-M^-1 1; 0 M+1]
    L = [1. 0; -M 1.]
    y = L\b
    # Round element-wise to 5 digits
    x = round.(U\y, digits = 5)
end;
true_solution(M) = round.([M/(M+1), (M+2)/(M+1)], digits = 5);
## True solution for M=10 is approximately [0.90909, 1.09091], computed solution is [0.90909, 1.09091]
## True solution for M=1e10 is approximately [1.0, 1.0], computed solution is [1.0, 1.0]
## True solution for M=1e15 is approximately [1.0, 1.0], computed solution is [1.11022, 1.0]
## True solution for M=1e20 is approximately [1.0, 1.0], computed solution is [-0.0, 1.0]
## Julia's division operator is actually pretty smart though, true solution for M=1e20 is A\b = [1.0, 1.0]
A matrix A is said to be ill-conditioned if
a small perturbation in b yields a large change in x
One way to measure ill-conditioning in a matrix is the elasticity of the solution with respect to b,
\begin{gather*}
\sup_{||\delta b || > 0} \frac{||\delta x|| / ||x||}{||\delta b|| / ||b||}
\end{gather*}
which yields the percent change in x given
a one percent change in the magnitude of b
If this elasticity is large, then computer representations of the system of equations can lead to large errors due to rounding
Approximate the elasticity by computing the condition number \begin{gather*} \kappa = ||A|| \cdot ||A^{-1}|| \end{gather*}
\kappa gives the least upper bound of the elasticity
\kappa is always larger than one and a rule of thumb is that for every order of magnitude, a significant digit is lost in the computation of x
cond([1. 1.; 1. 1.00000001])
## 4.000000065548868e8