Introduction

This undergraduate (Year 3) module builds on the quantitative modules of Year 1 (Probability, Statistics and Modelling I) and Year 2 (Probability, Statistics and Modelling II), and introduces data science techniques as a means for more sophisticated quantitative data analysis. Its aim is to introduce students to computational methods for crime science.

The techniques covered in this module will be of relevance to students undertaking their final year independent research project.

Dates & times

The module runs in Term 2 of 2018/2019, from 7 January to 29 March 2019.

  • Lectures: Mondays, 1 - 3pm.
  • Tutorials: Tuesdays, 11am - 1pm, in academic weeks 20, 22, 24, 27 and 29.

UCL timetable page: https://timetable.ucl.ac.uk/tt/createCustomTimet.do#

Contact & resources

A Moodle page will accompany this module.

Q&A forum: if you have a question or problem that affects other students too, or that you think others would be interested in, please post it on the Q&A forum.

Learning outcomes

Upon successful completion of this module, you will be able to:

  • demonstrate knowledge of a broader range of analytical techniques used in the field of Security and Crime Science
  • understand the purpose, advantages and disadvantages of different forms of data science techniques
  • perform data science analyses on crime- and/or security-related issues
  • apply the data science pipeline to crime- and/or security-related issues
  • interpret and effectively report the results of these techniques

Structure

The general structure of this module is as follows. In the lectures, you will learn about data science approaches to crime and security problems at both a conceptual level (what do they do?) and a functional level (how do they work?). The tutorials are practical sessions in which you will learn how to implement the techniques from the preceding lectures in the R programming language; we will be there to help you throughout. In preparation for each tutorial, you will be given homework that helps you consolidate the concepts and build your R skills portfolio.

Timetable

Week  Date    Type      Topic
20    7 Jan   Lecture   Intro to Data Science for crime scientists
20    8 Jan   Tutorial  How to solve data science problems
21    14 Jan  Lecture   Web scraping I
22    21 Jan  Lecture   Web scraping II
22    22 Jan  Tutorial  Web scraping with R
23    28 Jan  Lecture   Text data I
24    4 Feb   Lecture   Text data II
24    5 Feb   Tutorial  Data cleaning, preprocessing and text mining in R
25    11 Feb  -         No session
26    18 Feb  Lecture   Machine learning I
27    25 Feb  Lecture   Machine learning II
27    26 Feb  Tutorial  Machine learning in R
28    4 Mar   Lecture   Applied predictive modelling
29    11 Mar  Lecture   Advanced techniques, promises and problems of data science for crime science
29    12 Mar  Tutorial  The full data science pipeline in R
30    18 Mar  Exam      Class test

Materials

Software

We will use the R programming language. All packages, resources and tools needed are free and openly available to download to any computer. We encourage students to bring their own laptops to the tutorials so they can customise their work environment; alternatively, a computer cluster with UCL computers will be available.

Literature

We will provide background reading and literature for each week in advance.

Data

All datasets used are open-source and available without restrictions.

Content details

Week 1 (20): 7+8 Jan

Lecture: Intro to Data Science for crime scientists

Things we will cover:

  • What is data science?
  • Game-changers in crime science research
  • The current situation of data science for crime science
  • Problem-solving with data science techniques
  • 3 principles of applied data science
  • Outlook on the module and assessment

Tutorial: How to solve data science problems

This tutorial is an essential session for the module because it will show you ways to solve data science and programming problems. You will encounter many bugs, errors and problems in your career when using data science techniques, and this session will equip you with the tools and techniques to become a problem-solver for data science issues in R.

Week 2 (21): 14 Jan

Lecture: Web scraping I

Things we will cover:

  • Getting data from the Internet
  • Types of web scraping
  • Using APIs from Twitter and YouTube
  • Harnessing the ‘juicy’ data of the Internet
  • The basic structure of a webpage (HTML, CSS, JavaScript)
  • Exploiting the Internet’s structure for web scraping

No tutorial.

Week 3 (22): 21+22 Jan

Lecture: Web scraping II

Things we will cover:

  • Pros and cons of web scraping
  • What to do when there is no API
  • “Real” web scraping the hard way
  • Advanced web scraping for dynamic websites

Tutorial: Web scraping with R

You will learn how to (1) use Twitter’s API to access tweets by particular people, on particular topics or from particular dates, (2) build a custom web scraper with R’s rvest package to access and download details of the FBI’s most wanted persons, and (3) write a web-scraping programme to download details on all missing persons in the UK.

Week 4 (23): 28 Jan

Lecture: Text data I

Things we will cover:

  • Why text data is everywhere and everything is text
  • Applications of text data to crime and security problems
  • How to get text data
  • Dealing with text data (considerations in text cleaning)
  • Levels of text data
  • Quantifying text data

No tutorial.

Week 5 (24): 4+5 Feb

Lecture: Text data II

Things we will cover:

  • Sentiment analysis
  • Dynamic sentiment analysis using linguistic trajectory analysis
  • Psycholinguistic analyses
  • Bag-of-words approaches

Tutorial: Data cleaning, preprocessing and text mining in R

You will learn how to (1) clean and process data using examples of vlog transcripts from “toxic” YouTubers, (2) build a bag-of-words representation of tweets at different levels of resolution, (3) apply a prominent psycholinguistic approach to text data, and (4) build your own lexicon to measure custom-made linguistic constructs.

Peer feedback: the peer-feedback session for the Applied Data Science Project will be held at the end of this lecture.

Week 6 (26): 18 Feb

Lecture: Machine learning I

Things we will cover:

  • What is machine learning and how does it differ from statistical modelling?
  • Types of machine learning
  • Feature engineering
  • Cross-validation
  • Step-by-step guide to supervised machine learning (types, algorithms, case study)

No tutorial.

Week 7 (27): 25+26 Feb

Lecture: Machine learning II

Things we will cover:

  • Step-by-step guide to unsupervised machine learning (types, algorithms, case study)
  • Performance metrics in machine learning
  • Considerations about generalisability and validation

Tutorial: Machine learning in R

You will learn how to (1) build, run and evaluate supervised machine learning models (classification and regression), (2) build, run and evaluate unsupervised machine learning models (clustering and anomaly detection), (3) apply cross-validation methods, and (4) assess models with advanced performance metrics.

Week 8 (28): 4 Mar

Lecture: Applied predictive modelling

Things we will cover:

  • Case-studies of the data science workflow for crime science
  • A walk-through of every step of the full data science pipeline for crime scientists (from web scraping via text mining to machine learning models)
  • Guest talk (2nd half)

1-on-1 feedback: after the lecture, we will hold the 1-on-1 feedback sessions (to be scheduled individually; see Assessment below).

No tutorial.

Week 9 (29): 11+12 Mar

Lecture: Advanced techniques, promises and problems of data science for crime science

Things we will cover:

  • Advanced data science techniques
  • The blurry boundary of data science and artificial intelligence
  • Ethical considerations of data science for crime scientists
  • Outlook on the future
  • The technology fallacy

Tutorial: The full data science pipeline in R

In two mini-projects, you will learn how to apply the full set of techniques covered in this module. The first project follows the pipeline web scraping -> cleaning/preprocessing the data -> building a machine learning model. The second project additionally includes text mining and a range of machine learning techniques.

Week 10 (30): 18 Mar

Class test

Assessment

Class test

  • Weight for final grade: 30%
  • Learning outcomes tested: (1) demonstrating knowledge of a broader range of analytical techniques used in the field of Security and Crime Science, (2) understanding the purpose, advantages and disadvantages of different forms of data science techniques, (3) interpreting the results of data science techniques.
  • Date: 18 March 2019, 1-3pm.

Details: This 1-hour closed-book test covers theoretical and conceptual aspects of the lectures and tutorials. You will be given 9 questions, a mix of multiple-choice and open questions, each requiring a brief answer.

Applied Data Science Project

  • Weight for final grade: 70%
  • Learning outcomes tested: (1) demonstrating knowledge of a broader range of analytical techniques used in the field of Security and Crime Science, (2) performing data science analyses on crime- and/or security-related issues, (3) applying the data science pipeline to crime- and/or security-related issues, (4) interpreting and effectively reporting the results of these techniques.
  • Deadline: 29 March 2019.
  • Feedback deadlines: 2 February for the peer feedback and 2 March for the 1-on-1 feedback (see below).
  • Word limit: 2000 words, excluding the code supplement (do not exceed this limit!)

Feedback sessions: Since completing a full project is a major step in developing your data science skills, we will hold two feedback sessions to support you along the way.

  1. Peer-feedback session: you will exchange an outline of your project idea (i.e. which problem do you want to address, and how?) with a fellow student. The purpose of the peer feedback is to get an independent view on your project early in the process. We will provide templates for the feedback that you will give and receive. The peer-feedback session will be held at the end of the “Text data II” lecture on 4 Feb. 2019.

  2. 1-on-1 feedback session: you will receive individualised feedback from both Bennett and Felix in a 1-on-1 session in which we will answer your questions and give you final advice to fine-tune your project. These sessions will take 10 minutes per student and will be held on 4 March 2019 after the lecture (slots to be arranged at the start of the module).

Details: This assessment is the capstone project of the module. It requires you to address a crime and security science research problem using the full data science workflow (e.g., obtaining, processing and modelling the data, building predictive models, reporting the findings and interpreting the outcomes). You will write a brief report on your project (a template will be provided) and submit the R code needed to reproduce your findings. By passing this assessment, you will have demonstrated the skills to solve a problem using data science techniques.

Attendance requirement

We are obliged to record attendance at all sessions (lectures and tutorials), and each student must attend at least 70% of the sessions to pass the module. If you cannot attend for a good reason, please let the TA know well in advance. We strongly advise you to attend all sessions, as this will make the assessment easier for you and help you get the most out of this module.