Last updated: 2020-08-14

Do cars with big engines use more fuel than cars with small engines?

Hypothesis: Cars with bigger engines use more fuel, i.e. fuel efficiency declines as engine size increases. With miles per gallon on the y-axis and engine size on the x-axis, we would expect to see a decreasing trend.

# A tibble: 234 x 11
   manufacturer model    displ  year   cyl trans   drv     cty   hwy fl    class
   <chr>        <chr>    <dbl> <int> <int> <chr>   <chr> <int> <int> <chr> <chr>
 1 audi         a4         1.8  1999     4 auto(l~ f        18    29 p     comp~
 2 audi         a4         1.8  1999     4 manual~ f        21    29 p     comp~
 3 audi         a4         2    2008     4 manual~ f        20    31 p     comp~
 4 audi         a4         2    2008     4 auto(a~ f        21    30 p     comp~
 5 audi         a4         2.8  1999     6 auto(l~ f        16    26 p     comp~
 6 audi         a4         2.8  1999     6 manual~ f        18    26 p     comp~
 7 audi         a4         3.1  2008     6 auto(a~ f        18    27 p     comp~
 8 audi         a4 quat~   1.8  1999     4 manual~ 4        18    26 p     comp~
 9 audi         a4 quat~   1.8  1999     4 auto(l~ 4        16    25 p     comp~
10 audi         a4 quat~   2    2008     4 manual~ 4        20    28 p     comp~
# ... with 224 more rows

library(ggplot2)  # provides ggplot() and the mpg data set

# create coordinate system
ggplot(data = mpg, aes(x = displ,
                       y = hwy))

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ,
                           y = hwy))

The scatterplot supports my hypothesis: hwy decreases as displ increases.
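
To put a rough number on the trend seen in the scatterplot, here is a minimal sketch, assuming ggplot2 is installed so that mpg is available. It computes the correlation between displ and hwy and fits a straight line with lm(). The names displ_hwy_cor, displ_hwy_fit and displ_slope are made up for this illustration. A negative correlation and a negative slope would both agree with the plot; a straight line is only one possible summary of the relationship.

```r
library(ggplot2)  # only needed here for the mpg data set

# Pearson correlation between engine displacement and highway mileage
displ_hwy_cor <- cor(mpg$displ, mpg$hwy)

# Simple linear model: hwy as a function of displ
displ_hwy_fit <- lm(hwy ~ displ, data = mpg)
displ_slope <- coef(displ_hwy_fit)[["displ"]]

displ_hwy_cor  # negative if mileage falls as displacement rises
displ_slope    # change in highway mpg per extra litre of displacement
```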

num_rows <- nrow(mtcars)
num_cols <- ncol(mtcars)
ex4_plot <- ggplot(data = mpg,
                   aes(x = hwy,
                       y = cyl)) +
  geom_point()
ex4_plot


  1. Run ggplot(data = mpg). What do you see?
    Ans: An empty plot canvas. Adding aes(x = xx, y = yy) draws the axes as well, but still no data, because no geom layer has been added.
  2. How many rows are there in mtcars? Columns?
    Ans: mtcars has 32 rows and 11 columns.
  3. What does the drv variable describe?
    Ans: ‘The type of drive train, where f = front-wheel drive, r = rear wheel drive, 4 = 4wd’
  4. Make a scatterplot of hwy versus cyl.
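
The answers to exercises 2 and 3 can be checked from the console. This is a minimal sketch, assuming ggplot2 is installed. dim() returns the row and column counts in one call, and table() shows how many cars carry each drv code. The names mtcars_dim and drv_counts are made up for this illustration.

```r
library(ggplot2)  # for the mpg data set; mtcars ships with base R

# dim() returns c(rows, columns), equivalent to nrow() and ncol() together
mtcars_dim <- dim(mtcars)
mtcars_dim  # 32 rows, 11 columns

# Count how many cars fall under each drive train code (f, r, 4)
drv_counts <- table(mpg$drv)
drv_counts

# ?mpg opens the help page that documents drv and the other variables
```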

  5. What happens if you make a scatterplot of class versus drv? Why is this plot not useful?
ggplot(data = mpg, aes(x = class,
                       y = drv)) +
  geom_point()

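Both class and drv are categorical, so the points can only land on a small grid of class/drv combinations. Cars with the same combination are plotted on top of each other, so the plot does not show how many cars each point represents.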
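
One way to make this comparison readable is to show how many cars sit at each class/drv combination. Here is a sketch, assuming ggplot2 is installed. It uses geom_count(), which sizes each point by the number of observations at that position, and a plain cross-tabulation of the same counts. The names class_drv_plot and class_drv_counts are made up for this illustration, and this is only one of several possible alternatives to the scatterplot.

```r
library(ggplot2)

# geom_count() maps the number of overlapping observations to point size
class_drv_plot <- ggplot(data = mpg, aes(x = class, y = drv)) +
  geom_count()

class_drv_plot

# The same counts as a table: rows are class, columns are drv
class_drv_counts <- table(mpg$class, mpg$drv)
class_drv_counts
```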

