Last updated: 2018-03-09
Code version: 599efc1
Here, we will take a brief look at the data provided by Divvy.
I begin by loading a few packages, as well as some additional functions I wrote for this project.
library(data.table)
source("../code/functions.R")
I wrote a function, read.divvy.data, that reads the trip and station data from the Divvy CSV files. It uses fread from the data.table package, which is much faster than read.table. The function also prepares the data, including the departure dates and times, so that they are easier to work with.
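For readers who do not have functions.R at hand, here is a rough sketch of the kind of steps described above: read a CSV with fread, parse the start times, and derive week, day and hour columns. The helper name read.trips.sketch, the timestamp format, the week-numbering scheme and the toy input are all assumptions made for illustration; the actual read.divvy.data may well differ.

```r
library(data.table)

# Hypothetical helper (NOT the project's read.divvy.data): read trip
# records with fread and add a few convenient date/time columns.
# The timestamp format is an assumption about the raw CSV files.
read.trips.sketch <- function (input, time.format = "%m/%d/%Y %H:%M") {

  # fread accepts a file name or, if the string contains a newline,
  # the data itself.
  trips <- fread(input)

  # Parse the start times, then derive week of year (weeks starting
  # on Sunday), day of the week, and hour of the day.
  trips[, starttime := as.POSIXct(starttime, format = time.format,
                                  tz = "UTC")]
  trips[, start.week := as.integer(format(starttime, "%U"))]
  trips[, start.day  := weekdays(starttime)]
  trips[, start.hour := as.integer(format(starttime, "%H"))]
  trips[]
}

# Toy input standing in for a Divvy trips file.
toy.csv <- "trip_id,starttime,tripduration
1,3/31/2016 23:53,841
2,3/31/2016 23:46,649"
trips <- read.trips.sketch(toy.csv)
```

Keeping the derived columns in the same table means later summaries by week, day or hour need no further date parsing.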
divvy <- read.divvy.data()
# Reading station data from ../data/Divvy_Stations_2016_Q4.csv.
# Reading trip data from ../data/Divvy_Trips_2016_Q1.csv.
# Reading trip data from ../data/Divvy_Trips_2016_04.csv.
# Reading trip data from ../data/Divvy_Trips_2016_05.csv.
# Reading trip data from ../data/Divvy_Trips_2016_06.csv.
# Reading trip data from ../data/Divvy_Trips_2016_Q3.csv.
# Reading trip data from ../data/Divvy_Trips_2016_Q4.csv.
# Preparing Divvy data for analysis in R.
# Converting dates and times.
We have data on 581 Divvy stations across the city.
nrow(divvy$stations)
# [1] 581
print(head(divvy$stations),row.names = FALSE)
#                        name latitude longitude dpcapacity online_date
#         2112 W Peterson Ave 41.99118 -87.68359         15   5/12/2015
#               63rd St Beach 41.78102 -87.57612         23   4/20/2015
#           900 W Harrison St 41.87468 -87.65002         19    8/6/2013
#  Aberdeen St & Jackson Blvd 41.87773 -87.65479         15   6/21/2013
#     Aberdeen St & Monroe St 41.88042 -87.65560         19   6/26/2013
#    Ada St & Washington Blvd 41.88283 -87.66121         15  10/10/2013
We also have information about the roughly 3.6 million trips taken on Divvy bikes in 2016.
nrow(divvy$trips)
# [1] 3595383
print(head(divvy$trips),row.names = FALSE)
#  trip_id           starttime bikeid tripduration from_station_id
#  9080551 2016-03-31 23:53:00    155          841             344
#  9080550 2016-03-31 23:46:00   4831          649             128
#  9080549 2016-03-31 23:42:00   4232          210             350
#  9080548 2016-03-31 23:37:00   3464         1045             303
#  9080547 2016-03-31 23:33:00   1750          202             334
#  9080546 2016-03-31 23:31:00   4302          638              67
#              from_station_name to_station_id               to_station_name
#  Ravenswood Ave & Lawrence Ave           458      Broadway & Thorndale Ave
#        Damen Ave & Chicago Ave           213        Leavitt St & North Ave
#      Ashland Ave & Chicago Ave           210     Ashland Ave & Division St
#        Broadway & Cornelia Ave           458      Broadway & Thorndale Ave
#    Lake Shore Dr & Belmont Ave           329 Lake Shore Dr & Diversey Pkwy
#  Sheffield Ave & Fullerton Ave           304       Broadway & Waveland Ave
#    usertype gender birthyear start.week start.day start.hour
#  Subscriber   Male      1986         13  Thursday         23
#  Subscriber   Male      1980         13  Thursday         23
#  Subscriber   Male      1979         13  Thursday         23
#  Subscriber   Male      1980         13  Thursday         23
#  Subscriber   Male      1969         13  Thursday         23
#  Subscriber   Male      1991         13  Thursday         23
Out of all the Divvy stations in Chicago, the one on Navy Pier (near the corner of Streeter and Grand) had by far the most departures.
departures <- table(divvy$trips$from_station_name)
as.matrix(head(sort(departures,decreasing = TRUE)))
#                               [,1]
# Streeter Dr & Grand Ave      90042
# Lake Shore Dr & Monroe St    51090
# Theater on the Lake          47927
# Clinton St & Washington Blvd 47125
# Lake Shore Dr & North Blvd   45754
# Clinton St & Madison St      41744
I would also like to take a closer look at the trip data for the main Divvy station on the University of Chicago campus. Bikes were rented from that location almost 8,000 times in 2016.
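As an aside, both the station ranking and the count for a single station can be written as a grouped count in data.table. The sketch here uses a small made-up table rather than the real trip data, so the counts are illustrative only; the variable names toy, counts and n.uchicago are hypothetical.

```r
library(data.table)

# Toy trips table. The station names appear in the Divvy data, but
# these rows are invented for illustration.
toy <- data.table(from_station_name = c("Streeter Dr & Grand Ave",
                                        "University Ave & 57th St",
                                        "Streeter Dr & Grand Ave",
                                        NA,
                                        "Streeter Dr & Grand Ave"))

# Count departures per station (ignoring missing names), with the
# most active station first.
counts <- toy[!is.na(from_station_name), .N,
              by = from_station_name][order(-N)]

# Number of departures from a single station.
n.uchicago <- counts[from_station_name == "University Ave & 57th St", N]
```

Compared with table, this keeps the result as a data.table, which is convenient if the counts are later joined to the station data.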
sum(divvy$trips$from_station_name == "University Ave & 57th St",na.rm = TRUE)
# [1] 7944
This is the version of R and the packages that were used to generate these results.
sessionInfo()
# R version 3.4.3 (2017-11-30)
# Platform: x86_64-apple-darwin15.6.0 (64-bit)
# Running under: macOS High Sierra 10.13.3
# 
# Matrix products: default
# BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
# 
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] data.table_1.10.4-3
# 
# loaded via a namespace (and not attached):
#  [1] compiler_3.4.3  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2
#  [5] tools_3.4.3     htmltools_0.3.6 yaml_2.1.17     Rcpp_0.12.15   
#  [9] stringi_1.1.6   rmarkdown_1.9   knitr_1.20      git2r_0.21.0   
# [13] stringr_1.3.0   digest_0.6.15   evaluate_0.10.1
This R Markdown site was created with workflowr 0.10.1.