Last updated: 2017-07-05

Code version: 5e53297

I begin by loading the data.table package, along with some additional functions I wrote for this project, into the R environment.

library(data.table)
source("../code/functions.R")


Reading the data

I wrote a function, read.divvy.data, that reads the trip and station data from the CSV files downloaded from the Divvy website. It uses fread from the data.table package, which is much faster than read.table, and it prepares the data, notably the dates and times, so that they are easier to work with.

divvy <- read.divvy.data()
# Reading station data from ../data/Divvy_Stations_2016_Q4.csv.
# Reading trip data from ../data/Divvy_Trips_2016_Q1.csv.
# Reading trip data from ../data/Divvy_Trips_2016_04.csv.
# Reading trip data from ../data/Divvy_Trips_2016_05.csv.
# Reading trip data from ../data/Divvy_Trips_2016_06.csv.
# Reading trip data from ../data/Divvy_Trips_2016_Q3.csv.
# Reading trip data from ../data/Divvy_Trips_2016_Q4.csv.
# Preparing Divvy data for analysis in R.
# Converting dates and times.
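
For readers without access to functions.R, the following is a minimal sketch of the kind of processing described above: read one trip file with fread, convert the start times, and derive the start.week, start.day and start.hour columns. It is not the actual read.divvy.data. The helper name read.trips.sketch, the raw time format ("%m/%d/%Y %H:%M"), the time zone, and the use of "%U" for the week number are all assumptions made for illustration, and the two-row input file is toy data.

```r
library(data.table)

# Hypothetical helper, NOT the author's read.divvy.data: read one trip
# file and add columns derived from the trip start time.
read.trips.sketch <- function (file, time.format = "%m/%d/%Y %H:%M") {
  trips <- fread(file)

  # Assumption: start times are stored as text in local Chicago time.
  times <- as.POSIXct(trips$starttime, format = time.format,
                      tz = "America/Chicago")

  # Build the day names from the numeric weekday (0 = Sunday) so that the
  # result does not depend on the system locale.
  day.names <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday")
  trips[, starttime  := times]
  trips[, start.week := as.integer(format(times, "%U"))]
  trips[, start.day  := factor(day.names[as.POSIXlt(times)$wday + 1],
                               levels = day.names)]
  trips[, start.hour := as.integer(format(times, "%H"))]
  return(trips[])
}

# Toy input file with two trips (raw time format assumed).
tmp <- tempfile(fileext = ".csv")
writeLines(c("trip_id,starttime,tripduration",
             "9080551,3/31/2016 23:53,841",
             "9080550,3/31/2016 23:46,649"), tmp)
trips <- read.trips.sketch(tmp)
print(trips)
```

Converting the times once, up front, means that later summaries by week, weekday or hour reduce to simple grouping on integer or factor columns.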


A first glance at the Divvy data

We have information on 581 Divvy stations across the city of Chicago.

nrow(divvy$stations)
# [1] 581
head(divvy$stations)
#                           name latitude longitude dpcapacity online_date
# 456        2112 W Peterson Ave 41.99118 -87.68359         15   5/12/2015
# 101              63rd St Beach 41.78102 -87.57612         23   4/20/2015
# 109          900 W Harrison St 41.87468 -87.65002         19    8/6/2013
# 21  Aberdeen St & Jackson Blvd 41.87773 -87.65479         15   6/21/2013
# 80     Aberdeen St & Monroe St 41.88042 -87.65560         19   6/26/2013
# 346   Ada St & Washington Blvd 41.88283 -87.66121         15  10/10/2013

In 2016, people took nearly 3.6 million trips on Divvy bikes.

nrow(divvy$trips)
# [1] 3595383
head(divvy$trips)
#   trip_id           starttime bikeid tripduration from_station_id
# 1 9080551 2016-03-31 23:53:00    155          841             344
# 2 9080550 2016-03-31 23:46:00   4831          649             128
# 3 9080549 2016-03-31 23:42:00   4232          210             350
# 4 9080548 2016-03-31 23:37:00   3464         1045             303
# 5 9080547 2016-03-31 23:33:00   1750          202             334
# 6 9080546 2016-03-31 23:31:00   4302          638              67
#               from_station_name to_station_id
# 1 Ravenswood Ave & Lawrence Ave           458
# 2       Damen Ave & Chicago Ave           213
# 3     Ashland Ave & Chicago Ave           210
# 4       Broadway & Cornelia Ave           458
# 5   Lake Shore Dr & Belmont Ave           329
# 6 Sheffield Ave & Fullerton Ave           304
#                 to_station_name   usertype gender birthyear start.week
# 1      Broadway & Thorndale Ave Subscriber   Male      1986         13
# 2        Leavitt St & North Ave Subscriber   Male      1980         13
# 3     Ashland Ave & Division St Subscriber   Male      1979         13
# 4      Broadway & Thorndale Ave Subscriber   Male      1980         13
# 5 Lake Shore Dr & Diversey Pkwy Subscriber   Male      1969         13
# 6       Broadway & Waveland Ave Subscriber   Male      1991         13
#   start.day start.hour
# 1  Thursday         23
# 2  Thursday         23
# 3  Thursday         23
# 4  Thursday         23
# 5  Thursday         23
# 6  Thursday         23

Out of all the Divvy stations in Chicago, the one on Navy Pier (at Streeter and Grand) had the most activity, measured here by the number of trips starting at each station.

counts <- table(divvy$trips$from_station_name)
as.matrix(head(sort(counts, decreasing = TRUE)))
#                               [,1]
# Streeter Dr & Grand Ave      90042
# Lake Shore Dr & Monroe St    51090
# Theater on the Lake          47927
# Clinton St & Washington Blvd 47125
# Lake Shore Dr & North Blvd   45754
# Clinton St & Madison St      41744
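
The same tally of departures per station can also be written with data.table, which is already loaded. This is a sketch on a small toy table, not the real trip data; it assumes the trips are held in (or converted to) a data.table, for example with as.data.table.

```r
library(data.table)

# Toy stand-in for the trips table; only the departure station is needed.
toy.trips <- data.table(
  from_station_name = c("Streeter Dr & Grand Ave",
                        "Theater on the Lake",
                        "Streeter Dr & Grand Ave",
                        "Lake Shore Dr & Monroe St",
                        "Streeter Dr & Grand Ave",
                        "Theater on the Lake"))

# Count the rows (.N) within each station, then sort by decreasing count.
toy.counts <- toy.trips[, .N, by = from_station_name][order(-N)]
print(head(toy.counts))
```

Grouping with by keeps the station name and its count together in one table, which is convenient if the counts are later merged with the station coordinates.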

I will also take a close look at trip data for the main Divvy station on the University of Chicago campus, since that is where I work.

sum(divvy$trips$from_station_name == "University Ave & 57th St")
# [1] NA
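
The NA above means that at least one of the comparisons evaluated to NA, which happens whenever a departure station name is itself missing; sum then returns NA unless missing values are removed. The sketch below shows this behaviour on a toy vector and how na.rm = TRUE gives a usable count. The toy values are illustrative only, and whether missing station names are the actual cause in the Divvy data is an assumption that has not been checked here.

```r
# Toy vector of departure stations with one missing name.
station <- factor(c("University Ave & 57th St", NA,
                    "Streeter Dr & Grand Ave",
                    "University Ave & 57th St"))

# Comparing against NA yields NA, so the plain sum is NA as well.
from.uchicago <- station == "University Ave & 57th St"
n.with.na <- sum(from.uchicago)

# Dropping the missing comparisons gives the number of matching trips.
n.trips <- sum(from.uchicago, na.rm = TRUE)

# Counting the missing names shows how many trips are affected.
n.missing <- sum(is.na(station))
print(c(n.with.na, n.trips, n.missing))
```

Reporting the number of missing names alongside the count makes it clear how many trips were left out of the total.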

Session information

This is the version of R and the packages that were used to generate these results.

sessionInfo()
# R version 3.3.2 (2016-10-31)
# Platform: x86_64-apple-darwin13.4.0 (64-bit)
# Running under: macOS Sierra 10.12.5
# 
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] data.table_1.10.4
# 
# loaded via a namespace (and not attached):
#  [1] backports_1.0.5 magrittr_1.5    rprojroot_1.2   tools_3.3.2    
#  [5] htmltools_0.3.6 yaml_2.1.14     Rcpp_0.12.11    stringi_1.1.2  
#  [9] rmarkdown_1.6   knitr_1.16      git2r_0.18.0    stringr_1.2.0  
# [13] digest_0.6.12   evaluate_0.10.1

This R Markdown site was created with workflowr