This UG (Year 3) module builds on the quantitative modules of Year 1 (Probability, Statistics and Modelling I) and Year 2 (Probability, Statistics and Modelling II), and introduces data science techniques as a means of more sophisticated quantitative data analysis. It aims to introduce students to computational methods for crime science.
The techniques covered in this module will be of relevance to students undertaking their final year independent research project.
The module runs in Term 2 of 2018/2019, from 7 January to 29 March 2019.
UCL timetable page: https://timetable.ucl.ac.uk/tt/createCustomTimet.do#
A Moodle page will accompany this module.
Q&A forum: if you have a question or problem that affects not just you, or that you feel others would be interested in too, please use the Q&A forum.
Upon successful completion of this module, you will be able to:
The general structure of this module is as follows. In the lectures, you will learn about data science approaches to crime and security problems, covered on a conceptual level (what do they do?) and a functional level (how do they work?). The tutorials are practical sessions in which you will learn how to implement the techniques from the preceding lectures in the R programming language. During the tutorials, we will be there to assist you. In preparation for each tutorial, you will be given homework that helps you consolidate the concepts and build your R skills portfolio.
Week | Date | Type | Topic |
---|---|---|---|
20 | 7 Jan | Lecture | Intro to Data Science for crime scientists |
20 | 8 Jan | Tutorial | How to solve data science problems |
21 | 14 Jan | Lecture | Web scraping I |
22 | 21 Jan | Lecture | Web scraping II |
22 | 22 Jan | Tutorial | Tutorial: web scraping with R |
23 | 28 Jan | Lecture | Text data I |
24 | 4 Feb | Lecture | Text data II |
24 | 5 Feb | Tutorial | Tutorial: data cleaning and preprocessing and text mining in R |
25 | 11 Feb | NONE | NONE |
26 | 18 Feb | Lecture | Machine learning I |
27 | 25 Feb | Lecture | Machine learning II |
27 | 26 Feb | Tutorial | Tutorial: machine learning in R |
28 | 4 Mar | Lecture | Applied predictive modelling |
29 | 11 Mar | Lecture | Advanced, promises and problems of Data Science for crime science |
29 | 12 Mar | Tutorial | Tutorial: the full data science pipeline in R |
30 | 18 Mar | EXAM | Class test |
We will use the R programming language. All required packages, resources and tools are openly available and free to download to any computer. We encourage students to bring their own laptops to the tutorials so they can customise their work environment. A computer cluster with UCL computers will also be available for those who prefer to use one.
We will provide background reading and literature for each week in advance.
All datasets used are open-source and available without restrictions.
Lecture: Intro to Data Science for crime scientists
Things we will cover:
Tutorial: How to solve data science problems
This tutorial is an essential session for the module because it will show you ways to solve a data science/programming problem. You will encounter many bugs/errors/problems in your career when using data science techniques. This session will equip you with tools and techniques to become a problem-solver for data science issues in R.
Lecture: Web scraping I
Things we will cover:
No tutorial.
Lecture: Web scraping II
Things we will cover:
Tutorial: Web scraping with R
You will learn how to (1) use Twitter’s API to access tweets of particular people/topics/dates, (2) build a custom web-scraper with R’s rvest package to access and download details of the FBI’s most wanted persons, and (3) write a web-scraping programme to download details on all missing persons in the UK.
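To give a flavour of the rvest workflow, here is a minimal sketch rather than the tutorial's actual code. It parses a small inline HTML snippet instead of a live website, so the page structure, class names, person names and profile paths are all hypothetical and invented for illustration. A real scraper would pass a page URL to `read_html()` and would need selectors matched to that page.

```r
library(rvest)

# Hypothetical HTML standing in for a listing page (assumption: invented
# structure and names). A real scraper would call read_html() on a URL.
page_source <- '
<html><body>
  <ul class="people">
    <li class="person"><a href="/profiles/a-example">A. Example</a></li>
    <li class="person"><a href="/profiles/b-sample">B. Sample</a></li>
  </ul>
</body></html>'

page <- read_html(page_source)

# Select every link inside a list item with class "person"
links <- html_nodes(page, "li.person a")

# Collect the visible link text and the link target into a data frame
people <- data.frame(
  name = html_text(links, trim = TRUE),
  profile_path = html_attr(links, "href"),
  stringsAsFactors = FALSE
)

print(people)
```

The same three steps (read the page, select nodes with a CSS selector, extract text or attributes) carry over to real pages. Only the selector and the source change.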
Lecture: Text data I
Things we will cover:
No tutorial.
Lecture: Text data II
Things we will cover:
Tutorial: Data cleaning and preprocessing and text mining in R
You will learn how to (1) clean and process data using examples of vlog transcripts from “toxic” YouTubers, (2) build a bag-of-words representation of Tweets using different levels of resolution, (3) apply a prominent psycholinguistic approach to text data, and (4) build your own lexicon to measure custom-made linguistic constructs.
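As an illustration of points (2) and (4), the following base-R sketch builds a word-level bag-of-words matrix and scores each document against a small custom lexicon. The example texts and the lexicon are hypothetical and made up for this sketch. The tutorial itself may well use dedicated text-mining packages rather than base R.

```r
# Hypothetical example texts (invented for illustration, not real tweets)
texts <- c(
  doc1 = "The police arrested the suspect",
  doc2 = "Police said the suspect fled the scene",
  doc3 = "No arrest was made"
)

# Lowercase, drop everything except letters and spaces, split on spaces
tokenise <- function(text) {
  cleaned <- gsub("[^a-z ]", "", tolower(text))
  tokens <- strsplit(cleaned, " +")[[1]]
  tokens[tokens != ""]
}

tokens <- lapply(texts, tokenise)
vocabulary <- sort(unique(unlist(tokens)))

# Document-term matrix: one row per document, one column per word
dtm <- t(vapply(tokens, function(tok) {
  as.integer(table(factor(tok, levels = vocabulary)))
}, integer(length(vocabulary))))
colnames(dtm) <- vocabulary

# Hypothetical custom lexicon for a "policing" construct
lexicon <- c("police", "arrested", "arrest", "suspect")

# Share of each document's words that belong to the lexicon
lexicon_share <- rowSums(dtm[, lexicon, drop = FALSE]) / rowSums(dtm)

print(dtm)
print(lexicon_share)
```

Working at a different level of resolution (for example word pairs instead of single words) would change only the tokenising step. The matrix construction stays the same.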
Peer-feedback: the lecture includes the peer-feedback session for the Applied Data Science Project.
Lecture: Machine learning I
Things we will cover:
No tutorial.
Lecture: Machine learning II
Things we will cover:
Tutorial: Machine learning in R
You will learn how to (1) build, run and evaluate supervised machine learning models (classification + regression), (2) build, run and evaluate unsupervised machine learning models (clustering + anomaly detection), (3) apply cross-validation methods, and (4) assess models with advanced performance metrics.
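The idea behind point (3) can be sketched in a few lines of base R. The example below runs 5-fold cross-validation for a linear regression on R's built-in `mtcars` dataset. This is only a stand-in: the dataset, the choice of predictors, the number of folds and the seed are all assumptions made for illustration, not the tutorial's material.

```r
set.seed(2019)  # arbitrary seed so the fold assignment is reproducible

data(mtcars)    # built-in dataset, used here purely as a stand-in
k <- 5

# Randomly assign each row to one of k folds of near-equal size
fold_id <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

rmse_per_fold <- numeric(k)
for (fold in seq_len(k)) {
  training <- mtcars[fold_id != fold, ]
  held_out <- mtcars[fold_id == fold, ]

  # Fit the model on the training folds only
  model <- lm(mpg ~ wt + hp, data = training)

  # Evaluate on the fold the model has not seen
  predictions <- predict(model, newdata = held_out)
  rmse_per_fold[fold] <- sqrt(mean((held_out$mpg - predictions)^2))
}

# Average out-of-sample error across the k folds
cv_rmse <- mean(rmse_per_fold)
print(rmse_per_fold)
print(cv_rmse)
```

Because every row is held out exactly once, the averaged error estimates how the model performs on unseen data, which an error computed on the training data alone cannot do.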
Lecture: Applied predictive modelling
Things we will cover:
1-on-1 feedback: after the lecture, we will hold the 1-on-1 feedback sessions (to be scheduled individually), see below.
No tutorial.
Lecture: Advanced, promises and problems of Data Science for crime science
Things we will cover:
Tutorial: The full data science pipeline in R
In two mini projects, you will learn how to apply the full set of techniques covered in this module. The first project will focus on web-scraping -> cleaning/preprocessing data -> building a machine learning model. The second project will additionally include text mining and various machine learning techniques.
Class test
Details: This 1-hour closed-book test covers theoretical and conceptual aspects of the lectures and tutorials. You will be given 9 questions, each requiring a brief answer. The questions are a mix of multiple-choice and open questions.
Feedback sessions: Since a full project is a major step in developing your data science skills, we will hold two feedback sessions to help you in the process.
Peer-feedback session: you will exchange an outline of your project idea (i.e. which problem do you want to address and how?) with a fellow student. The purpose of the peer feedback is to get an independent view on your project early in the process. We will provide templates for the feedback that you will give and receive. The peer-feedback session will be held at the end of the “Text data II” lecture on 4 February 2019.
1-on-1 feedback session: you will receive individualised feedback from both Bennett and Felix in a 1-on-1 session where we will help you with questions and give you final advice to fine-tune your project. These sessions will take 10 minutes per student and will be held on 4 March 2019 after the lecture (slots to be arranged at the start of the module).
Details: This assessment is the capstone project of the module. It requires you to address a crime and security science research problem using the full data science workflow (e.g., obtaining the data, processing the data, modelling the data, building predictive models, reporting on the findings, interpreting the outcomes). You will write a brief report on your project (a template will be provided) and you have to submit the R code needed to reproduce your findings. After passing this assessment, you will have demonstrated the skills to solve a problem using data science techniques.
We are obliged to record attendance at all sessions (lectures and tutorials), and each student must attend at least 70% of the sessions to be able to pass the module. If you cannot attend for a good reason, please let the TA know well in advance. We strongly advise you to attend all sessions, as this will make the assessment easier for you and help you get the most out of this module.