ScPoEconometrics
Introduction
Florian Oswald
SciencesPo Paris
2019-09-03


Welcome to ScPoEconometrics!

In this course you will learn about the basics of Econometrics.
You will also learn to use the R software.

What is Econometrics, actually?

A set of techniques and methods to answer questions with data.
Econometrics shares many things with Applied Statistics and Machine Learning.
Some Examples!

Answering Questions with Econometrics

Does inauguration of World Bank-financed projects in Sub-Saharan Africa cause an electoral benefit for the incumbent?

Does a certain concentration of airbnb listings in some urban area cause an increase in long-term rents?

Will increasing the minimum wage cause greater unemployment?


Causality

Notice the keyword cause in all of the above.
Notice also that many other factors could have caused each of those outcomes.
Econometrics is often about spelling out conditions under which we can claim to measure causal relationships.
We will encounter the most basic of those conditions, and talk about some potential pitfalls.

As in the acclaimed "Book of Why", we often ask: why did something happen?

This Course

Teach you the basics of Linear Regression
Introduce you to the R software environment.
This is not a course about R.

Grades

1. There will be quizzes on Moodle roughly every two weeks.
2. There will be two or three take-home exams / case studies.

There will be no final exam. Your grade will be 40% quizzes (item 1) and 60% take-home exams / case studies (item 2).


Course Materials

The Book.
The Slides
The code repository for book and R package
Quizzes on Moodle


R

What is `R`?¹

To quote the R project website:

R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.

What does that mean?

R was created for statistical and graphical work.
R has a vibrant, thriving online community (e.g. on Stack Overflow).
Plus it's free and open source.

[1]: The next 3 slides have been shamelessly copied from Ed Rubin's course

Why are we using `R`?

1. R is free and open source—saving both you and the university 💰💵💰.

2. Related: Outside of a small group of economists, private- and public-sector employers favor R over Stata and most competing software.

3. R is very flexible and powerful—adaptable to nearly any task, e.g., 'metrics, spatial data analysis, machine learning, web scraping, data cleaning, website building, teaching.

4. Related: R imposes no limits on the number of observations or variables, or on the memory and processing power you can use. (I'm looking at you, Stata.)

5. If you put in the work², you will come away with a valuable and marketable tool.

6. I 💖 R

[2]: Learning R definitely requires time and effort.


R SHOWCASE

Data Wrangling

The flights dataset contains on-time data for all flights that departed NYC (i.e. JFK, LGA or EWR) in 2013.
Suppose we want to know the relationship between distance and average delay, as in this example.
We need to group the data by destination, then summarise to get distance, delay and number of flights.

library(nycflights13)
# select the 3 columns of interest
fl = flights[, c("dest", "distance", "arr_delay")]
# show the first 4 rows of this data frame
head(fl, n = 4)

## # A tibble: 4 x 3
##   dest  distance arr_delay
##   <chr>    <dbl>     <dbl>
## 1 IAH       1400        11
## 2 IAH       1416        20
## 3 MIA       1089        33
## 4 BQN       1576       -18


Data Wrangling

There are always several ways to achieve a goal. (In life 😄)
Here are the two leading data packages:

`dplyr`

library(dplyr)  # provides %>%, group_by(), summarise(), n() and filter()
delays_dplyr <- fl %>%
  group_by(dest) %>% 
  summarise(
    count = n(),
    dist = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE)
  ) %>% 
  filter(count > 20, dest != "HNL")

`data.table`

library(data.table)
dl_dt = data.table(fl)
delays_dt <-
  dl_dt[, list(count = .N,
               dist = mean(distance, na.rm = TRUE),
               delay = mean(arr_delay, na.rm = TRUE)),
        by = dest][count > 20 & dest != "HNL"]


Plotting

Now we could look at the result in delays_dt, or compute some statistics from it.
Nothing beats a picture, though:

library(ggplot2)
ggplot(data = delays_dt,
       mapping = aes(x = dist, y = delay)) +
  geom_point(aes(size = count), alpha = 1/3) +
  geom_smooth(se = FALSE)


Plotting in Base `R` vs `ggplot`

R has very good graphical capabilities outside of contributed packages like ggplot2.
We will encounter both approaches in this course.
You will quickly see that some things are easier in base R than in ggplot2, and vice versa.


Spatial Data

R is very strong with spatial data, in particular via the sf package.
We can represent any shape or geometry.
Maps are the most obvious example:

library(sf)
plot(paris_sh[,"n"],
     main = "Number of IRIS by Arrondissement",
     key.pos = 3,
     axes = FALSE,
     key.width = lcm(1.3),
     key.length = 1.0)


Spatial Plotting with ggplot

ggplot can also plot spatial data directly.

Here is an example:

library(ggplot2)
ggplot(paris_sh) +           # base layer: data
  geom_sf(aes(fill = n)) +   # the `sf` geom
  scale_fill_viridis_c() +   # greenish fill
  theme_bw()                 # simple theme

3D ggplots

A recent very cool 😎 package is rayshader
Sometimes graphs are better to see in 3D.

Like, real 3D...


R 101: Here Is Where You Start.

Tool Time!

Getting R and RStudio

Download R from CRAN for your OS.
Download RStudio from here for your OS.


Start your `RStudio`!

First Glossary of Terms

R: a statistical programming language
RStudio: an integrated development environment (IDE) to work with R
command: user input (text or numbers) that R understands.
script: a list of commands collected in a text file, each separated by a new line, to be run one after the other.

R as a Calculator

You can use the R console like a calculator
Just type an arithmetic operation after > and hit Enter!

Some basic arithmetic first:
```
4 + 1
```
```
## [1] 5
```
```
8 / 2
```
```
## [1] 4
```

Great! What about this?

log(exp(1))

## [1] 1

# by the way: this is a comment! (R disregards it)


Calculator 2

We can also do exponents with ^:
```
x = 2
x^3
```
```
## [1] 8
```
Square roots:
```
sqrt(2)
```
```
## [1] 1.414214
```
and many logarithmic and trigonometric functions.
many ... What??
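
To give a flavour of those functions, here is a minimal sketch using a few that ship with base R. The selection and the variable names (`log_10`, `log_2`, `sine`, `cosine`) are our own, chosen only for illustration.

```r
log_10 = log(100, base = 10)  # logarithm with an explicit base
log_2  = log2(8)              # shortcut for the base-2 logarithm
sine   = sin(pi / 2)          # pi is a built-in constant
cosine = cos(pi)

log_10   # 2
log_2    # 3
sine     # 1
cosine   # -1
```

Typing just the name of a variable prints its value, exactly like in the calculator examples above.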


Where to get Help?

R built-in help:

?log
?sin
?paste
?lm
help(lm)   # help() is equivalent
??plot  # get all help on keyword "plot"
help(ggplot,package="ggplot2")  # show help from a certain package

Help from Humans!
- stackoverflow.com [SO]
- Your classroom channel on Slack
- rstudio forum


HOW to get Help?

Describe what you want to do.
Describe what you expect your code to do.
Describe what your code does instead.
- Provide the entire error message.
Provide enough code to reproduce your error.
- You can post code snippets on Slack and SO.


R Packages

R users contribute add-on data and functions as packages
Installing packages is easy!
```
install.packages("ggplot2")
```
To use the contents of a package, we must load it from our library:
```
library(ggplot2)
```

`ScPoEconometrics` package

We wrote an R package for you.
It's hosted on github.com

You can install (and frequently update!) from there:

if (!require("devtools")) install.packages("devtools")
library(devtools)
install_github(repo = "ScPoEcon/ScPoEconometrics")

Did it work?

library(ScPoEconometrics)
packageVersion("ScPoEconometrics")

## [1] '0.2.2'


Data Types and Data Structures

Numeric: 1.0, 2.1
Integer: 1L, 2L, 42L
Logical: TRUE and FALSE
Character: "a", "Statistics", "1 plus 2."
Categorical or factor

You should read more right here!
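
One way to check which type a value has is the base R function `class()`. The sketch below is only an illustration: the example values mirror the list above, and the variable names (`types`, `f`) are hypothetical.

```r
# class() reports the type of each value
types = c(class(2.1), class(42L), class(TRUE), class("Statistics"))
types        # "numeric" "integer" "logical" "character"

# a factor stores categorical data as a fixed set of levels
f = factor(c("low", "high", "low"))
class(f)     # "factor"
levels(f)    # "high" "low" (levels are sorted alphabetically by default)
```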


Vectors

What is a vector?
The c function creates vectors.
```
c(1, 3, 5, 7, 8, 9)
```
```
## [1] 1 3 5 7 8 9
```

Coercion to a single common type:

c(42, "Statistics", TRUE)

## [1] "42"         "Statistics" "TRUE"

Creating a Range
```
(y = 1:6)
```
```
## [1] 1 2 3 4 5 6
```


Vectors from Sequences and Repetitions

`seq` creates a sequence from `from` to `to` in steps of `by`:

seq(from = 1.5, to = 2.1, by = 0.1)

## [1] 1.5 1.6 1.7 1.8 1.9 2.0 2.1

rep repeats items:

rep("A", times = 10)

##  [1] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"

They also work in combination:

c(x, rep(seq(1, 9, 2), 3), c(1, 2, 3), 42, 2:4)

##  [1]  2  1  3  5  7  9  1  3  5  7  9  1  3  5  7  9  1  2  3 42  2  3  4
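
Both functions accept further arguments. As a small sketch (the variable names `s`, `r_times` and `r_each` are ours), `seq` can be given the desired length instead of a step size, and `rep` can repeat each element in turn instead of the whole vector:

```r
s = seq(0, 1, length.out = 5)          # 5 equally spaced points from 0 to 1
r_times = rep(c("A", "B"), times = 2)  # repeat the whole vector twice
r_each  = rep(c("A", "B"), each = 2)   # repeat each element twice

s         # 0.00 0.25 0.50 0.75 1.00
r_times   # "A" "B" "A" "B"
r_each    # "A" "A" "B" "B"
```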


YOUR TURN


Task 1

Create a vector of five ones, i.e.

## [1] 1 1 1 1 1

Notice that the colon operator a:b is just shorthand for "construct a sequence from a to b". Create a vector that counts down from 10 to 0, i.e. it looks like

##  [1] 10  9  8  7  6  5  4  3  2  1  0

Use rep to create a vector that looks like this:

##  [1] 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3

Find out (using help(), google or whatever) how to get the length of a vector in R!


Indexing or Subsetting a Vector

R uses 1-based indexing.
We use [] to get the value at an index
```
x = c(1, 3, 5, 7, 8, 9)
x[2]
```
```
## [1] 3
```
Works with vectors of indices:
```
x[c(2,5)]
```
```
## [1] 3 8
```
And we can get all elements except those at the given indices:
```
x[-4]
```
```
## [1] 1 3 5 8 9
```


Logical Subsetting

One can use a vector of TRUE and FALSE to index:

x = c(1, 3, 5, 7, 8, 9)
x[c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE)]

## [1] 1 3 5 8

Let's create a logical vector for condition x > 3

x > 3

## [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE

We can get all values of x where x>3 is TRUE:
```
x[ x > 3 ]
```
```
## [1] 5 7 8 9
```
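
Conditions can also be combined. The following is a minimal sketch on the same vector, using the base R operators `&` (and) and `|` (or); the variable names `both`, `either`, `positions` and `n_large` are hypothetical.

```r
x = c(1, 3, 5, 7, 8, 9)

both      = x[x > 3 & x < 9]   # both conditions must hold
either    = x[x < 3 | x > 8]   # at least one condition must hold
positions = which(x > 3)       # positions of the TRUE entries
n_large   = sum(x > 3)         # TRUE counts as 1, so this counts matches

both        # 5 7 8
either      # 1 9
positions   # 3 4 5 6
n_large     # 4
```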


Task 2!

Using the runif function, draw 10 numbers from the uniform distribution and store them in x.
Get all the elements of x larger than 0.3, and store them in y.
Using the function which, store the indices of all of those elements in iy.
```
## [1]  1  2  3  4  9 10
```
Check that y and x[iy] are identical.
```
## [1] TRUE TRUE TRUE TRUE TRUE TRUE
```


Matrix

A Matrix is a two-dimensional Array

X = matrix(1:9, nrow = 3, ncol = 3)
X

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

Subsetting needs two indices now:
```
X[1,2]
```
```
## [1] 4
```
```
X[3, ]
```
```
## [1] 3 6 9
```


Matrix Operations

Let's create two matrices.

X = matrix(1:4, 2, 2)
Y = matrix(4:1, 2, 2)

Arithmetic!

X * Y # equally for +, - and /

##      [,1] [,2]
## [1,]    4    6
## [2,]    6    4

But X * Y is not matrix multiplication. All of the above are element-by-element operations.
Matrix multiplication uses %*%. What is X %*% Y for you?
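
To see the difference without giving away the answer, here is a small sketch with a different pair of matrices. The names `A` and `Id` are ours; `Id` is the 2 x 2 identity matrix created with `diag(2)`.

```r
A  = matrix(1:4, 2, 2)   # filled column by column: rows are (1, 3) and (2, 4)
Id = diag(2)             # identity matrix

elementwise = A * Id     # multiplies entry by entry: only the diagonal of A survives
product     = A %*% Id   # true matrix product: multiplying by the identity returns A

elementwise
product
```

The element-wise result has zeros off the diagonal, while the matrix product reproduces `A`.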


Task 3

Create a vector containing 1,2,3,4,5 called v.
Create the (2,5) matrix m:

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    2    3    4    5
## [2,]    6    7    8    9   10

Perform matrix multiplication of m with v. Use the command %*%. What dimension does the output have?

## [1] 2 1

Why does the command v %*% m not work?


Lists

Up to now, all containers were homogeneous (all elements had the same type).
Lists are more flexible:

# works with and without fieldnames
ex_list = list(
  a = c(1, 2, 3, 4),
  b = TRUE,
  c = "Hello!",
  d = diag(2)
)
ex_list

## $a
## [1] 1 2 3 4
## 
## $b
## [1] TRUE
## 
## $c
## [1] "Hello!"
## 
## $d
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1


Indexing Lists

[] gets a sublist
[[]] gets a list element
Can index by numerical index or name

ex_list[1]

## $a
## [1] 1 2 3 4

ex_list[[1]]

## [1] 1 2 3 4

ex_list$d

##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
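
The difference between `[]` and `[[]]` is easiest to see by asking for the class of the result. This sketch rebuilds `ex_list` so that it runs on its own; the variable names `sub`, `elem`, `second` and `corner` are hypothetical.

```r
ex_list = list(a = c(1, 2, 3, 4), b = TRUE, c = "Hello!", d = diag(2))

sub  = ex_list["a"]      # single brackets: still a list (of length 1)
elem = ex_list[["a"]]    # double brackets: the numeric vector itself

class(sub)    # "list"
class(elem)   # "numeric"

# once we hold the element, we can index into it as usual
second = ex_list[["a"]][2]   # second entry of the vector a
corner = ex_list$d[1, 2]     # row 1, column 2 of the matrix d
second   # 2
corner   # 0
```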


Task 4

Copy and paste the above code for ex_list into your R session. Remember that list can hold any kind of R object. Like...another list! So, create a new list new_list that has two fields: a first field called "this" with string content "is awesome", and a second field called "ex_list" that contains ex_list.
Accessing members is like in a plain list, just with several layers now. Get the element c from ex_list in new_list!
Compose a new string out of the first element in new_list, the element under label this. Use the function paste to print the string R is awesome to your screen.


`data.frame`s

data.frames are like spreadsheets.

example_data = data.frame(x = c(1, 3, 5, 7),
                          y = c(rep("Hello", 3), "Goodbye"),
                          z = sample(c(TRUE,FALSE),size=4,replace=TRUE))
example_data

##   x       y     z
## 1 1   Hello  TRUE
## 2 3   Hello FALSE
## 3 5   Hello  TRUE
## 4 7 Goodbye  TRUE


`data.frame`s

Useful methods for a dataframe:

nrow(example_data)

## [1] 4

ncol(example_data)

## [1] 3

names(example_data)

## [1] "x" "y" "z"
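
A few more everyday operations, as a minimal sketch on a smaller data frame. The names `small_df`, `x_sq` and `large_x` are hypothetical and chosen only for this illustration.

```r
small_df = data.frame(x = c(1, 3, 5, 7),
                      y = c(rep("Hello", 3), "Goodbye"))

dim(small_df)                  # rows and columns at once: 4 2
small_df$x_sq = small_df$x^2   # add a new column with the $ operator
names(small_df)                # "x" "y" "x_sq"

large_x = small_df[small_df$x > 3, ]   # keep only the rows where x > 3
large_x
```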


Data on Cars

The mtcars dataset is built into R.

head(mtcars,n=3)  # show first 3 rows

##                mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1

To access one of the variables as a vector, we use the $ operator as in mtcars$mpg
Or we use the column name or index: mtcars[,"mpg"] or mtcars[,1]


Subsetting `data.frames`

Subsetting is like with matrices, [,]

# mtcars[row condition, column index]
mtcars[mtcars$mpg > 32, c("cyl", "disp", "wt")]

##                cyl disp    wt
## Fiat 128         4 78.7 2.200
## Toyota Corolla   4 71.1 1.835

But there is a special function which looks nicer.

subset(mtcars, subset = mpg > 32, select = c("cyl", "disp", "wt"))
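
As a quick check, the two approaches can be compared directly. This is only a sketch: the variable names `by_brackets` and `by_subset` are ours, and `all.equal()` is the base R function for testing near-equality of two objects.

```r
by_brackets = mtcars[mtcars$mpg > 32, c("cyl", "disp", "wt")]
by_subset   = subset(mtcars, subset = mpg > 32, select = c("cyl", "disp", "wt"))

all.equal(by_brackets, by_subset)   # TRUE: both select the same rows and columns
nrow(by_subset)                     # 2
```

Note that inside `subset()` we can write `mpg` instead of `mtcars$mpg`, because the condition is evaluated within the data frame.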


Task 4

How many observations are there in mtcars?
How many variables?
What is the average value of mpg?
What is the average value of mpg for cars with more than 4 cylinders, i.e. with cyl>4?


Basic Programming

It's useful for us to review some basics for programming.
We won't be going very deep here, but it's good for you to know some of this.


Variables

A variable refers to an object.
Another way to say it is that a variable is a name or a label for something:
```
x = 1
y = "roses"
z = function(x){sqrt(x)}
```
Local variables are only defined (and hence visible) within a certain area of your code, called a scope.
Global variables are visible everywhere.
Try to avoid global variables.
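
The difference between the two can be sketched as follows. The function name `add_one` and the values are hypothetical; the point is only that an assignment inside a function body creates a local variable and leaves the global one untouched.

```r
x = 1                    # defined at the top level: a global variable

add_one = function() {
  x = 100                # a new, local x that exists only inside the function
  x + 1
}

result = add_one()
result   # 101: the function used its local x
x        # 1: the global x is unchanged
```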


Control Flow

We can influence which branch our code executes.
Whether we follow one branch or another depends on a condition.

if (condition) {  # condition must evaluate to TRUE or FALSE
  # some R code
} else {
  # some other R code
}

x = 1
y = 3
if (x > y) {  # test if x > y
  # if TRUE
  z = x * y  # assign value to z
  print("x is larger than y")
} else {
  # if FALSE
  z = x + 5 * y  # assign other value to z
  print("x is less than or equal to y")
}

## [1] "x is less than or equal to y"

z

## [1] 16
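
With more than two cases, branches can be chained using `else if`. A minimal sketch, where the variable names `temp` and `label` and the thresholds are made up for illustration:

```r
temp = 15   # hypothetical temperature in degrees Celsius

if (temp > 25) {
  label = "hot"
} else if (temp > 10) {   # only checked if the first condition is FALSE
  label = "mild"
} else {                  # runs if no condition above was TRUE
  label = "cold"
}

label   # "mild"
```

R checks the conditions from top to bottom and executes only the first branch whose condition is TRUE.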

Loops

This is an example for a loop:

for (i in 1:3){  # does not have to be 1:3!
  print(i) # loop body: gets executed each time
  # the value of i changes with each iteration
}

## [1] 1
## [1] 2
## [1] 3

We can iterate over things other than numbers:

for (i in c("mangos","bananas","apples")){
  print(paste("I love", i))  # the paste function pastes strings together
}

## [1] "I love mangos"
## [1] "I love bananas"
## [1] "I love apples"
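
Loops are often used to build up a result step by step. As a small sketch (the variable name `total` is ours), the loop below adds up the numbers 1 to 10 and compares the result with the built-in `sum` function:

```r
total = 0                # start value, defined before the loop
for (i in 1:10) {
  total = total + i      # update total in each iteration
}

total       # 55
sum(1:10)   # 55: the built-in function gives the same answer
```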


Nested Loops

We often also see nested loops, which are just what their name suggests:

for (i in 2:3){
  # first nest: for each i
  for (j in c("mangos","bananas","apples")){
    # second nest: for each j
    print(paste("Can I get",i,j,"please?"))
  }
}

## [1] "Can I get 2 mangos please?"
## [1] "Can I get 2 bananas please?"
## [1] "Can I get 2 apples please?"
## [1] "Can I get 3 mangos please?"
## [1] "Can I get 3 bananas please?"
## [1] "Can I get 3 apples please?"


Functions

The function say_hello tells R what to do when you call say_hello().

say_hello <- function(your_name = "Lord Vader"){
  paste("You R most welcome,",your_name)
}
# we call the function by typing its name with round brackets
say_hello()

## [1] "You R most welcome, Lord Vader"

We specified a default argument, but we don't have to.
Call the function with your name!
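
Functions can take several arguments, with or without defaults. The following sketch uses a hypothetical function `rect_area` (not part of the course material) to show how arguments are matched by position or by name:

```r
# width has no default and must be supplied; height defaults to 1
rect_area = function(width, height = 1) {
  width * height   # the value of the last expression is returned
}

a1 = rect_area(3)                       # height falls back to its default
a2 = rect_area(3, 2)                    # arguments matched by position
a3 = rect_area(height = 4, width = 2)   # matched by name, order does not matter

a1   # 3
a2   # 6
a3   # 8
```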


Task 5

Write a for loop that counts down from 10 to 1, printing the value of the iterator to the screen.
Modify that loop to write "i iterations to go", where i is the iterator.
Modify that loop so that each iteration takes roughly one second. You can achieve that by adding the command Sys.sleep(1) below the line that prints "i iterations to go".
Finally, let's create a function called ticking_bomb. It takes no arguments, and its body is the loop you wrote in the preceding question. The only thing you should add to the body is a line after the loop finishes, printing "BOOOOM!" with print("BOOOOM!"). You can repeatedly redefine the function in the console, and try it out with ticking_bomb().


END


	florian.oswald@sciencespo.fr