class: center, middle, inverse, title-slide

# Practice for data.table
## with 7 questions
### 2023-10-27

---

# Before starting...

* You can write your own answer for each question
* Don't forget to load the data.table library

```r
library(data.table)
```

* We will use the flights data from the **nycflights13** package, with rows containing NA removed

```r
library(nycflights13)
data("flights")
# Convert to a data.table and drop rows with any NA
flights <- as.data.table(flights) |> na.exclude()
```

---

# Question 1

----

Get all the flights on May 15th

.panelset[
.panel[.panel-name[Your Answer]
.can-edit[ ]]
.panel[.panel-name[Our Answer]

```r
# Filter rows in i
ans <- flights[month == 5 & day == 15]
```

]]

---

# Question 2

----

Sort flights first by column 'month' in ascending order, and then by 'day' in descending order

.panelset[
.panel[.panel-name[Your Answer]
.can-edit[ ]]
.panel[.panel-name[Our Answer]

```r
# A leading minus sorts that column in descending order
ans <- flights[order(month, -day)]
```

]]

---

# Question 3

----

Select both the arr_delay and dep_delay columns

.panelset[
.panel[.panel-name[Your Answer]
.can-edit[ ]]
.panel[.panel-name[Our Answer]

```r
# .() is an alias for list() and returns a data.table
ans <- flights[, .(arr_delay, dep_delay)]
head(ans)
```

```
##    arr_delay dep_delay
## 1:        11         2
## 2:        20         4
## 3:        33         2
## 4:       -18        -1
## 5:       -25        -6
## 6:        12        -4
```

]]

---

# Question 4

----

How many trips have had a total delay (arr_delay + dep_delay) > 0?

.panelset[
.panel[.panel-name[Your Answer]
.can-edit[ ]]
.panel[.panel-name[Our Answer]

```r
# Summing a logical vector counts its TRUE values
ans <- flights[, sum((arr_delay + dep_delay) > 0)]
ans
```

```
## [1] 135059
```

]]

---

# Question 5

----

Calculate the average arrival and departure delay for all flights on May 15th

.panelset[
.panel[.panel-name[Your Answer]
.can-edit[ ]]
.panel[.panel-name[Our Answer]

```r
# Filter in i, then compute named aggregates in j
ans <- flights[month == 5 & day == 15,
               .(m_arr = mean(arr_delay), m_dep = mean(dep_delay))]
ans
```

```
##        m_arr    m_dep
## 1: -2.029598 9.809725
```

]]

---

# Question 6

----

How can we get the number of trips corresponding to each origin airport?

.panelset[
.panel[.panel-name[Your Answer]
.can-edit[ ]]
.panel[.panel-name[Our Answer]

```r
# .N is the number of rows in each group
ans <- flights[, .(.N), by = .(origin)]
ans
```

```
##    origin      N
## 1:    EWR 117127
## 2:    LGA 101140
## 3:    JFK 109079
```

]]

---

# Question 7

----

How can we get the average arrival and departure delay for each origin and month?

.panelset[
.panel[.panel-name[Your Answer]
.can-edit[ ]]
.panel[.panel-name[Our Answer]

```r
# Group by two columns; unnamed results are auto-named V1 and V2
ans <- flights[, .(mean(arr_delay), mean(dep_delay)), by = .(origin, month)]
```

]]

---

class: inverse, center, middle

# Thank you

----
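---

# Appendix: the same idioms on a toy table

----

If **nycflights13** is not installed, the `i`, `j`, and `by` patterns from Questions 1, 4, and 7 can be tried on a small hand-made table. This is only a sketch: the table `toy` and every value in it are made up for illustration and are not taken from the flights data. The column names `m_arr` and `m_dep` follow the naming used in Question 5.

```r
library(data.table)

# Hypothetical stand-in for the flights table (made-up values)
toy <- data.table(
  origin    = c("EWR", "EWR", "JFK", "JFK", "LGA"),
  month     = c(5L, 5L, 5L, 6L, 6L),
  day       = c(15L, 16L, 15L, 1L, 2L),
  arr_delay = c(10, -5, 20, 0, -10),
  dep_delay = c(5, -5, 10, 0, 4)
)

# i: keep only the rows for May 15th (Question 1 pattern)
may15 <- toy[month == 5 & day == 15]

# j: count rows whose total delay is positive (Question 4 pattern)
n_late <- toy[, sum((arr_delay + dep_delay) > 0)]

# by: group means with explicit column names (Question 7 pattern)
avg <- toy[, .(m_arr = mean(arr_delay), m_dep = mean(dep_delay)),
           by = .(origin, month)]
```

Naming the aggregates in `j` avoids the default `V1` and `V2` column names, which can make later steps easier to read.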