For this project, we use R Studio to ask and answer three questions about the available bikeshare data from Washington, Chicago, and New York.

ny <- read.csv('new_york_city.csv')
wash <- read.csv('washington.csv')
chi <- read.csv('chicago.csv')

head(ny)

head(wash)

head(chi)

Question 1

Popular times of travel - What is the most common month for traveling users?

library(ggplot2)

First, we will need to extract the month from the Start.Time field in each of our 3 datasets.
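
Since the same extraction is repeated for each city, one option is to wrap it in a small helper. The sketch below is only an illustration: the function name `add_month` and the toy data frame are assumptions, not part of the original notebook.

```r
# Hypothetical helper: adds a zero-padded month column derived from Start.Time.
add_month <- function(df) {
  df$month <- format(as.Date(df$Start.Time, format = "%Y-%m-%d"), "%m")
  df
}

# Toy rows in the same timestamp layout as the bikeshare files.
toy <- data.frame(
  Start.Time = c("2017-06-11 14:55:05", "2017-03-29 13:26:26"),
  stringsAsFactors = FALSE
)
toy <- add_month(toy)
print(toy$month)
```

With the real data, the same helper could be applied as `ny <- add_month(ny)`, and likewise for `wash` and `chi`.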

New York

ny$month <- format(as.Date(ny$Start.Time, format="%Y-%m-%d"),"%m") ##creating new column in dataframe and extracting month from Start.Time

head(ny) ##confirming new column created and extracted month

qplot(x = month, data = ny, color = I('black'), fill = I('#F79420'), xlab = 'Month', ylab = 'Number of Rides') #plotting Month
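
As of ggplot2 3.4.0, `qplot()` is deprecated, so a newer installation may print a warning for the call above. A roughly equivalent bar chart can be built with `ggplot()` and `geom_bar()`. The sketch below uses a toy data frame in place of `ny`, so the values are illustrative only.

```r
library(ggplot2)

# Toy data standing in for ny; only the month column matters here.
toy <- data.frame(month = c("01", "02", "02", "06", "06", "06"))

# geom_bar() counts the rows per month, which is what qplot() did implicitly.
p <- ggplot(toy, aes(x = month)) +
  geom_bar(color = "black", fill = "#F79420") +
  labs(x = "Month", y = "Number of Rides")

# print(p) renders the chart when run outside an interactive session.
```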

table(ny$month) ##using table to get summary info

   01    02    03    04    05    06 
 5745  6364  5820 10661 12180 14000

Washington

wash$month <- format(as.Date(wash$Start.Time, format="%Y-%m-%d"),"%m") #extracting month for wash dataframe

head(wash) #verifying extraction of month and new column

qplot(x = month, data = wash, color = I('black'), fill = I('#0B6623'), xlab = 'Month', ylab = 'Number of Rides') #plotting wash dataframe for month of start.time

qplot(x = month, data = subset(wash, !is.na(month)), color = I('black'), fill = I('#0B6623'), xlab = 'Month', ylab = 'Number of Rides') ##Removing the NA bar from the plot above
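
The NA bar most likely comes from a Start.Time value that `as.Date()` could not parse, although that cause is an assumption since the offending row is not shown. The sketch below uses made-up timestamps to show how a blank or malformed value turns into an NA month, and why `table()` does not report it by default.

```r
# Toy timestamps: one valid, one blank, one malformed (illustrative values only).
start_time <- c("2017-06-21 08:36:34", "", "not a date")

parsed <- as.Date(start_time, format = "%Y-%m-%d")
month  <- format(parsed, "%m")

# Entries that cannot be parsed become NA.
print(month)

# table() leaves NA out unless useNA is set.
print(table(month, useNA = "ifany"))
```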

table(wash$month)

   01    02    03    04    05    06 
 8946 11563 12612 18522 17072 20335

Chicago

chi$month <- format(as.Date(chi$Start.Time, format="%Y-%m-%d"),"%m") #extracting month for chi dataframe

head(chi) #verifying new column & month extracted for chi dataframe

qplot(x = month, data = chi, color = I('black'), fill = I('#D30000'), xlab = 'Month', ylab = 'Number of Rides') #plotting month of Start.Time for chi dataframe

table(chi$month)

  01   02   03   04   05   06 
 650  930  803 1526 1905 2816


In our first question, we wanted to see which month was the most popular for bikeshare rides across the cities of New York, Washington, and Chicago.

In New York, the most popular month was June, the 6th month of the year. June had a total of 14,000 rides, 1,820 more than the next closest month, May, with 12,180 rides.

In Washington, the most popular month was also June, with a total of 20,335 rides. That is 1,813 more than the 2nd highest month, April, with 18,522 rides.

In Chicago, the most popular month for rides was also June, with a total of 2,816 rides. The next closest month was May with 1,905 rides, a gap of 911. One thing to note is that Chicago's dataset shows a much lower volume of bikeshare rides overall.

Overall, the datasets for all 3 cities (New York, Washington, & Chicago) indicate that the most popular month for bikeshare rides is June. Note that the data only covers January through June, so June is the busiest month within that window rather than necessarily the busiest of the whole year. The higher ride counts from April through June are possibly due to more favorable weather conditions in these 3 cities.
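
The comparison can also be checked programmatically. The sketch below re-enters the monthly counts from the `table()` outputs by hand and reports, for each city, the busiest month and its lead over the runner-up. The object names are illustrative.

```r
# Monthly ride counts copied from the table() outputs for each city.
months <- c("01", "02", "03", "04", "05", "06")
counts <- list(
  new_york   = setNames(c(5745, 6364, 5820, 10661, 12180, 14000), months),
  washington = setNames(c(8946, 11563, 12612, 18522, 17072, 20335), months),
  chicago    = setNames(c(650, 930, 803, 1526, 1905, 2816), months)
)

# For each city: the busiest month, the runner-up, and the gap between them.
summary_rows <- lapply(counts, function(x) {
  ranked <- sort(x, decreasing = TRUE)
  data.frame(top_month = names(ranked)[1],
             runner_up = names(ranked)[2],
             lead      = unname(ranked[1] - ranked[2]),
             stringsAsFactors = FALSE)
})
result <- do.call(rbind, summary_rows)
print(result)
```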

Question 2

User Info - What is the most common User Type across the 3 cities?

New York

table(ny$User.Type) ##Getting some info on the User.Type column, looks like we have some blanks

             Customer Subscriber 
       119       5558      49093

ny["User.Type"][ny["User.Type"] == ''] <- NA ##converting the blanks in the User.Type column to NA

table(ny$User.Type) #checking to see if conversion worked

             Customer Subscriber 
         0       5558      49093
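
An alternative to converting blanks after loading is to have `read.csv()` treat empty strings as missing at read time through its `na.strings` argument. This is a sketch on a small inline CSV, not the actual bikeshare file. `stringsAsFactors = TRUE` is set only to mimic the factor columns seen in the outputs above.

```r
# Inline CSV standing in for a bikeshare file; the second row has a blank User.Type.
csv_text <- "Start.Time,User.Type
2017-06-11 14:55:05,Subscriber
2017-05-11 15:30:11,
2017-03-29 13:26:26,Customer"

# Empty fields are read as NA, so no blank factor level is created.
toy <- read.csv(text = csv_text, na.strings = c("", "NA"), stringsAsFactors = TRUE)

print(table(toy$User.Type, useNA = "ifany"))
```

Because the blank never becomes a factor level, the empty column with a count of 0 would not appear in the `table()` output.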

ggplot(data=subset(ny, !is.na(User.Type)), aes(x=User.Type)) + 
geom_bar(color = 'black', fill = '#F79420')

Washington

table(wash$User.Type) ##Getting some info on the User.Type column

             Customer Subscriber 
         1      23450      65600

wash["User.Type"][wash["User.Type"] == ''] <- NA ##converting any blanks in the User.Type column to NA

table(wash$User.Type) #checking to see if conversion worked

             Customer Subscriber 
         0      23450      65600

ggplot(data=subset(wash, !is.na(User.Type)), aes(x=User.Type)) + 
geom_bar(color = 'black', fill = '#0B6623')

Chicago

table(chi$User.Type) ##Getting some info on the User.Type column

             Customer Subscriber 
         1       1746       6883

chi["User.Type"][chi["User.Type"] == ''] <- NA ##converting the blanks in the User.Type column to NA

table(chi$User.Type) #checking to see if conversion worked

             Customer Subscriber 
         0       1746       6883

ggplot(data=subset(chi, !is.na(User.Type)), aes(x=User.Type)) + 
geom_bar(color = 'black', fill = '#D30000')


In our second question, we wanted to see which User Type was the most common for bikeshare rides across the cities of New York, Washington, and Chicago.

In New York, the most common user type was the Subscriber. The subscriber user type had a total of 49,093 rides while the customer user type had a total of 5,558, a difference of 43,535. About 90% of bikeshare rides in New York with a known user type come from subscribers.

In Washington, the most common user type was also the Subscriber. Here, the subscriber user type had a total of 65,600 rides while the customer user type had a total of 23,450, a difference of 42,150. While Washington has far more customer rides than New York, the absolute gap between the two user types is similar. About 74% of bikeshare rides in Washington come from subscribers.

In Chicago, the most common user type was also the Subscriber. Here, the subscriber user type had a total of 6,883 rides while the customer user type had a total of 1,746, a difference of 5,137. Again, we note that Chicago's dataset shows a much lower volume of bikeshare rides overall. Nonetheless, about 80% of Chicago's bikeshare rides are from subscribers.

Overall, the datasets for all 3 cities (New York, Washington, & Chicago) indicate that a large majority of bikeshare rides are taken by the 'Subscriber' user type. In each city, at least 74% of the total bikeshare rides come from subscribers.
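
The percentages quoted in this summary can be reproduced from the printed counts. The sketch below re-enters the User.Type totals by hand, leaving out the blank rows, and uses `prop.table()` to get each city's share. The matrix name is illustrative.

```r
# User.Type counts copied from the table() outputs (blank rows excluded).
user_counts <- rbind(
  new_york   = c(Customer = 5558,  Subscriber = 49093),
  washington = c(Customer = 23450, Subscriber = 65600),
  chicago    = c(Customer = 1746,  Subscriber = 6883)
)

# Row-wise proportions, expressed as percentages with one decimal place.
shares <- round(100 * prop.table(user_counts, margin = 1), 1)
print(shares)
```

The subscriber shares come out at roughly 89.8%, 73.7%, and 79.8%, which matches the rounded figures of about 90%, 74%, and 80%.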

Question 3

User Info - What gender type is the most common for the bikeshare data across the cities of New York & Chicago?

Note: Gender data is not available for the city of Washington

names(ny)

names(wash) ##showing NO 'Gender' column in wash dataset

names(chi)
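
Rather than reading the `names()` output by eye, the presence of a column can be tested directly with `%in%`. The sketch below uses the column names as they appear in the `head()` outputs of this notebook. With the real data frames, the check would be `"Gender" %in% names(wash)`.

```r
# Column names copied from the head() outputs of the three files.
cols <- list(
  ny   = c("X", "Start.Time", "End.Time", "Trip.Duration", "Start.Station",
           "End.Station", "User.Type", "Gender", "Birth.Year"),
  wash = c("X", "Start.Time", "End.Time", "Trip.Duration", "Start.Station",
           "End.Station", "User.Type"),
  chi  = c("X", "Start.Time", "End.Time", "Trip.Duration", "Start.Station",
           "End.Station", "User.Type", "Gender", "Birth.Year")
)

# TRUE where a dataset has a Gender column.
has_gender <- sapply(cols, function(x) "Gender" %in% x)
print(has_gender)
```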

New York

by(ny$User.Type, ny$Gender, summary)

ny$Gender: 
             Customer Subscriber       NA's 
         0       4743        664          3 
------------------------------------------------------------ 
ny$Gender: Female
             Customer Subscriber       NA's 
         0        324      11804         31 
------------------------------------------------------------ 
ny$Gender: Male
             Customer Subscriber       NA's 
         0        491      36625         85

ny["Gender"][ny["Gender"] == ''] <- NA ##converting the blanks in the Gender column to NA

by(ny$User.Type, ny$Gender, summary)

ny$Gender: 
NULL
------------------------------------------------------------ 
ny$Gender: Female
             Customer Subscriber       NA's 
         0        324      11804         31 
------------------------------------------------------------ 
ny$Gender: Male
             Customer Subscriber       NA's 
         0        491      36625         85

ny <- na.omit(ny) ##Omit the NAs
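
One caveat: `na.omit()` drops every row that has an NA in any column, not only in Gender. That may explain why the female subscriber count changes from 11,804 to 11,803 in the next output, though the exact cause is an assumption. The sketch below uses a toy data frame to contrast `na.omit()` with filtering on Gender alone.

```r
# Toy data: row 2 is missing Gender, row 3 is missing only Birth.Year.
toy <- data.frame(
  User.Type  = c("Subscriber", "Customer", "Subscriber"),
  Gender     = c("Male", NA, "Female"),
  Birth.Year = c(1992, 1985, NA),
  stringsAsFactors = FALSE
)

all_complete <- na.omit(toy)               # drops rows 2 and 3
gender_only  <- toy[!is.na(toy$Gender), ]  # drops row 2 only

print(nrow(all_complete))
print(nrow(gender_only))
```

Filtering on the column of interest keeps rows that are incomplete elsewhere but still usable for the gender question.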

by(ny$User.Type, ny$Gender, summary)

ny$Gender: 
NULL
------------------------------------------------------------ 
ny$Gender: Female
             Customer Subscriber 
         0        324      11803 
------------------------------------------------------------ 
ny$Gender: Male
             Customer Subscriber 
         0        491      36625

qplot(x = User.Type, data = ny, color = I('black'), fill = I('#F79420')) +
    facet_grid(Gender~.)

Chicago

by(chi$User.Type, chi$Gender, summary)

chi$Gender: 
             Customer Subscriber       NA's 
         0       1746          1          1 
------------------------------------------------------------ 
chi$Gender: Female
             Customer Subscriber 
         0          0       1723 
------------------------------------------------------------ 
chi$Gender: Male
             Customer Subscriber 
         0          0       5159

chi["Gender"][chi["Gender"] == ''] <- NA ##converting the blanks in the Gender column to NA

by(chi$User.Type, chi$Gender, summary)

chi$Gender: 
NULL
------------------------------------------------------------ 
chi$Gender: Female
             Customer Subscriber 
         0          0       1723 
------------------------------------------------------------ 
chi$Gender: Male
             Customer Subscriber 
         0          0       5159

chi <- na.omit(chi) ##Omit the NAs

by(chi$User.Type, chi$Gender, summary)

chi$Gender: 
NULL
------------------------------------------------------------ 
chi$Gender: Female
             Customer Subscriber 
         0          0       1723 
------------------------------------------------------------ 
chi$Gender: Male
             Customer Subscriber 
         0          0       5159

qplot(x = User.Type, data = chi, color = I('black'), fill = I('#D30000')) +
    facet_grid(Gender~.)


In our third and final question, we wanted to see the gender makeup of each User Type for bikeshare rides across the cities of New York and Chicago. The Washington dataset did not contain a gender column for us to analyze.

In New York, the Subscriber user type was made up of mostly males. Male subscribers outnumbered female subscribers by about 3 to 1 (36,625 to 11,803). On the other hand, the customer user type was more evenly balanced, with 491 males and 324 females, or about 1.5 to 1.

In Chicago, the Subscriber user type was also made up of mostly males. Male subscribers outnumbered female subscribers by about 3 to 1 (5,159 to 1,723) in this dataset. Unfortunately, we were unable to analyze gender for the customer user type, as all of the customers in the Chicago dataset had a blank gender column. The assumption here is that gender information is collected for subscribers but not for customers.

Overall, the datasets for New York and Chicago indicate that a large majority of bikeshare rides by 'Subscribers' are by males, at roughly 3 to 1 in both cities. Based on the data we have, the customer user type seemed more balanced when it came to gender, but that comparison rests on New York alone, and the sample size was much smaller for that user type.
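
The male-to-female ratios can be computed directly from the counts in the `by()` outputs. The sketch below re-enters those counts by hand, using the values printed after `na.omit()`. The matrix name is illustrative.

```r
# Counts copied from the by() outputs after na.omit().
gender_counts <- rbind(
  ny_subscriber  = c(Male = 36625, Female = 11803),
  ny_customer    = c(Male = 491,   Female = 324),
  chi_subscriber = c(Male = 5159,  Female = 1723)
)

# Males per female in each group, rounded to two decimal places.
ratio <- round(gender_counts[, "Male"] / gender_counts[, "Female"], 2)
print(ratio)
```

The ratios work out to roughly 3.10 for New York subscribers, 1.52 for New York customers, and 2.99 for Chicago subscribers.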

Finishing Up

Congratulations! You have reached the end of the Explore Bikeshare Data Project.

system('python -m nbconvert Explore_bikeshare_data.ipynb')

Output of head(ny):

X	Start.Time	End.Time	Trip.Duration	Start.Station	End.Station	User.Type	Gender	Birth.Year
5688089	2017-06-11 14:55:05	2017-06-11 15:08:21	795	Suffolk St & Stanton St	W Broadway & Spring St	Subscriber	Male	1998
4096714	2017-05-11 15:30:11	2017-05-11 15:41:43	692	Lexington Ave & E 63 St	1 Ave & E 78 St	Subscriber	Male	1981
2173887	2017-03-29 13:26:26	2017-03-29 13:48:31	1325	1 Pl & Clinton St	Henry St & Degraw St	Subscriber	Male	1987
3945638	2017-05-08 19:47:18	2017-05-08 19:59:01	703	Barrow St & Hudson St	W 20 St & 8 Ave	Subscriber	Female	1986
6208972	2017-06-21 07:49:16	2017-06-21 07:54:46	329	1 Ave & E 44 St	E 53 St & 3 Ave	Subscriber	Male	1992
1285652	2017-02-22 18:55:24	2017-02-22 19:12:03	998	State St & Smith St	Bond St & Fulton St	Subscriber	Male	1986

Output of head(wash):

X	Start.Time	End.Time	Trip.Duration	Start.Station	End.Station	User.Type
1621326	2017-06-21 08:36:34	2017-06-21 08:44:43	489.066	14th & Belmont St NW	15th & K St NW	Subscriber
482740	2017-03-11 10:40:00	2017-03-11 10:46:00	402.549	Yuma St & Tenley Circle NW	Connecticut Ave & Yuma St NW	Subscriber
1330037	2017-05-30 01:02:59	2017-05-30 01:13:37	637.251	17th St & Massachusetts Ave NW	5th & K St NW	Subscriber
665458	2017-04-02 07:48:35	2017-04-02 08:19:03	1827.341	Constitution Ave & 2nd St NW/DOL	M St & Pennsylvania Ave NW	Customer
1481135	2017-06-10 08:36:28	2017-06-10 09:02:17	1549.427	Henry Bacon Dr & Lincoln Memorial Circle NW	Maine Ave & 7th St SW	Subscriber
1148202	2017-05-14 07:18:18	2017-05-14 07:24:56	398.000	1st & K St SE	Eastern Market Metro / Pennsylvania Ave & 7th St SE	Subscriber

Output of head(chi):

X	Start.Time	End.Time	Trip.Duration	Start.Station	End.Station	User.Type	Gender	Birth.Year
1423854	2017-06-23 15:09:32	2017-06-23 15:14:53	321	Wood St & Hubbard St	Damen Ave & Chicago Ave	Subscriber	Male	1992
955915	2017-05-25 18:19:03	2017-05-25 18:45:53	1610	Theater on the Lake	Sheffield Ave & Waveland Ave	Subscriber	Female	1992
9031	2017-01-04 08:27:49	2017-01-04 08:34:45	416	May St & Taylor St	Wood St & Taylor St	Subscriber	Male	1981
304487	2017-03-06 13:49:38	2017-03-06 13:55:28	350	Christiana Ave & Lawrence Ave	St. Louis Ave & Balmoral Ave	Subscriber	Male	1986
45207	2017-01-17 14:53:07	2017-01-17 15:02:01	534	Clark St & Randolph St	Desplaines St & Jackson Blvd	Subscriber	Male	1975
1473887	2017-06-26 09:01:20	2017-06-26 09:11:06	586	Clinton St & Washington Blvd	Canal St & Taylor St	Subscriber	Male	1990

Output of head(ny) after adding the month column:

X	Start.Time	End.Time	Trip.Duration	Start.Station	End.Station	User.Type	Gender	Birth.Year	month
5688089	2017-06-11 14:55:05	2017-06-11 15:08:21	795	Suffolk St & Stanton St	W Broadway & Spring St	Subscriber	Male	1998	06
4096714	2017-05-11 15:30:11	2017-05-11 15:41:43	692	Lexington Ave & E 63 St	1 Ave & E 78 St	Subscriber	Male	1981	05
2173887	2017-03-29 13:26:26	2017-03-29 13:48:31	1325	1 Pl & Clinton St	Henry St & Degraw St	Subscriber	Male	1987	03
3945638	2017-05-08 19:47:18	2017-05-08 19:59:01	703	Barrow St & Hudson St	W 20 St & 8 Ave	Subscriber	Female	1986	05
6208972	2017-06-21 07:49:16	2017-06-21 07:54:46	329	1 Ave & E 44 St	E 53 St & 3 Ave	Subscriber	Male	1992	06
1285652	2017-02-22 18:55:24	2017-02-22 19:12:03	998	State St & Smith St	Bond St & Fulton St	Subscriber	Male	1986	02

Output of head(wash) after adding the month column:

X	Start.Time	End.Time	Trip.Duration	Start.Station	End.Station	User.Type	month
1621326	2017-06-21 08:36:34	2017-06-21 08:44:43	489.066	14th & Belmont St NW	15th & K St NW	Subscriber	06
482740	2017-03-11 10:40:00	2017-03-11 10:46:00	402.549	Yuma St & Tenley Circle NW	Connecticut Ave & Yuma St NW	Subscriber	03
1330037	2017-05-30 01:02:59	2017-05-30 01:13:37	637.251	17th St & Massachusetts Ave NW	5th & K St NW	Subscriber	05
665458	2017-04-02 07:48:35	2017-04-02 08:19:03	1827.341	Constitution Ave & 2nd St NW/DOL	M St & Pennsylvania Ave NW	Customer	04
1481135	2017-06-10 08:36:28	2017-06-10 09:02:17	1549.427	Henry Bacon Dr & Lincoln Memorial Circle NW	Maine Ave & 7th St SW	Subscriber	06
1148202	2017-05-14 07:18:18	2017-05-14 07:24:56	398.000	1st & K St SE	Eastern Market Metro / Pennsylvania Ave & 7th St SE	Subscriber	05

Explore Bike Share Data - John Bailey
