ADVANCED CRIME ANALYSIS 2018/2019

Introduction

This UG (Year 3) module builds on the quantitative modules of Year 1 (Probability, Statistics and Modelling I) and Year 2 (Probability, Statistics and Modelling II), and introduces data science techniques as means for more sophisticated quantitative data analysis. This module aims to introduce students to computational methods for crime science.

The techniques covered in this module will be of relevance to students undertaking their final year independent research project.

Dates & times

The module is running in Term 2, 2018/2019, from 7 January 2019 - 29 March 2019.

Lectures: Mondays, 1 - 3pm.
Tutorials: Tuesdays (in even-numbered academic weeks), 11am - 1pm.

UCL timetable page: https://timetable.ucl.ac.uk/tt/createCustomTimet.do#

Contact & resources

Bennett Kleinberg, Assistant Professor in Data Science, bennett.kleinberg@ucl.ac.uk
TA: Felix Soldner, Doctoral researcher, felix.soldner.18@ucl.ac.uk

The moodle page will accompany this module here.

Q&A forum: if you have a questions/problem that affects not just you or that you feel others would be interested in too, then please use the Q&A forum.

Learning outcomes

Upon successful completion of this module, you will be able to:

demonstrate knowledge of a broader range of analytical techniques used in the field of Security and Crime Science
understand the purpose, advantages and disadvantages of different forms of data science techniques
perform data science analyses on crime and/or-security related issues
apply the data science pipeline on crime and/or-security related issues
interpret and effectively report the results of said techniques

Structure

The general structure of this module is as follows: you will learn about data science approaches to crime and security problems in the lectures. These will cover the approaches on a conceptual level (what do they do?) and functional level (how do they work?). The tutorials are practical sessions in which you will learn how to implement the techniques of the previous lectures in the R programming language. During the tutorials, we will be there to assist you and help you. As a preparation for each tutorial, you will be given homework that helps you consolidate the concepts and build your R skills portfolio.

Timetable

Week	Date	Type	Topic
20	7 Jan	Lecture	Intro to Data Science for crime scientists
20	8 Jan	Tutorial	How to solve data science problems
21	14 Jan	Lecture	Web scraping I
22	21 Jan	Lecture	Web scraping II
22	22 Jan	Tutorial	Tutorial: web scraping with R
23	28 Jan	Lecture	Text data I
24	4 Feb	Lecture	Text data II
24	5 Feb	Tutorial	Tutorial: data cleaning and preprocessing and text mining in R
25	11 Feb	NONE	NONE
26	18 Feb	Lecture	Machine learning I
27	25 Feb	Lecture	Machine learning II
27	26 Feb	Tutorial	Tutorial: machine learning in R
28	4 Mar	Lecture	Applied predictive modelling
29	11 Mar	Lecture	Advanced, promises and problems of Data Science for crime science
29	12 Mar	Tutorial	Tutorial: the full data science pipeline in R
30	18 Mar	EXAM	Class test

Materials

Software

We will use the R programming language. All packages, required resources and tools needed are openly available and free to download to any computer. We encourage students to bring their own laptops to the tutorials so they can customise their work environment. However, we will have a computer cluster available where you can use the UCL computers.

Literature

We will provide background reading and literature for each week in advance.

A resource that we will use throughout and recommend keeping as a go-to guide is Max Kuhn & Kjell Johnson’s book Applied Predictive Modelling (in R). You will find this book freely available to you as UCL student through UCL Library’s free e-book access to this book.
An excellent guide to using R for Data Science is the book with the same title written by Hadley Wickham & Garrett Grolemund, which published online.

Data

All datasets used are open-source and available without restrictions.

Content details

Week 1 (20): 7+8 Jan

Lecture: Intro to Data Science for crime scientists

Things we will cover:

What is data science?
Game-changers in crime science research
The current situation of data science for crime science
Problem-solving with data science techniques
3 principles of applied data science
Outlook on the module and assessment

Tutorial: How to solve data science problems

This tutorial is an essential session for the module because it will show you ways to solve a data science/programming problem. You will encounter many bugs/errors/problems in your career when using data science techniques. This session will equip you with tools and techniques to become a problem-solver for data science issues in R.

Week 2 (21): 14 Jan

Lecture: Web scraping I

Things we will cover:

Getting data from the Internet
Types of web-scraping
Using APIs from Twitter and YouTube
Harnessing the ‘juicy’ data of the Internet
The basic structure of a webpage (HTML, CSS, Javascript)
Exploiting the Internet’s structure for web-scraping

No tutorial.

Week 3 (22): 21+22 Jan

Lecture: Web scraping II

Things we will cover:

Pro and con of web-scraping
What to do when there is no API
“Real” web-scraping the hard way
Advanced web-scraping for dynamic websites

Tutorial: Web scraping with R

You will learn how to (1) use Twitter’s API to access tweets of particular people/topics/dates, (2) build a custom web-scraper with R’s rvest package to access and download details of the FBI’s most wanted persons, and (3) write a web-scraping programme to download details on all missing persons in the UK.

Week 4 (23): 28 Jan

Lecture: Text data I

Things we will cover:

Why text data is everywhere and everything is text
Applications of text data to crime and security problems
How to get text data
Dealing with text data (considerations in text cleaning)
Levels of text data
Quantifying text data

No tutorial.

Week 5 (24): 4+5 Feb

Lecture: Text data II

Things we will cover:

Sentiment analysis
Dynamic sentiment analysis using linguistic trajectory analysis
Psycholinguistic analyses
Bag-of-words approaches

Tutorial: Data cleaning and preprocessing and text mining in R

You will learn how to (1) clean and process data using examples of vlog transcripts from “toxic” YouTubers, (2) build a bag-of-words representation of Tweets using different levels of resolution, (3) apply a prominent psycholinguistic approach to text data, and (4) build your own lexicon to measure custom-made linguistic constructs.

Peer-feedback: the lecture includes the peer-feedback session for the Applied Data Science Project.

Week 6 (26): 18 Feb

Lecture: Machine learning I

Things we will cover:

What is machine learning and how does it differ from statistical modelling
Types of machine learning
Feature engineering
Cross-validation
Stepwise guide through supervised machine learning (types, algorithms, case-study)

No tutorial.

Week 7 (27): 25+26 Feb

Lecture: Machine learning II

Things we will cover:

Stepwise guide through unsupervised machine learning (types, algorithms, case-study)
Performance metrics in machine learning
Considerations about generalisability and validation

Tutorial: Machine learning in R

You will learn how to (1) build, run, evaluate supervised machine learning models (classification + regression), (2) build, run, evaluate unsupervised machine learning (clustering + anomaly detection) models, (3) apply cross-validation methods, and (4) assess models with advanced performance metrics.

Week 8 (28): 4 Mar

Lecture: Applied predictive modelling

Things we will cover:

Case-studies of the data science workflow for crime science
Guide through all steps of the full data science pipeline for crime scientists (from web-scraping via text mining to machine learning models)
Guest talk (2nd half)

1-on-1 feedback: after the lecture, we will hold the 1-on-1 feedback sessions (to be scheduled individually), see below.

No tutorial.

Week 9 (29): 11+12 Mar

Lecture: Advanced, promises and problems of Data Science for crime science

Things we will cover:

Advanced data science techniques
The blurry boundary of data science and artificial intelligence
Ethical considerations of data science for crime scientists
Outlook on the future
The technology fallacy

Tutorial: The full data science pipeline in R

You will learn in two mini projects how to apply the full set of techniques learned in this module. One project will focus on web-scraping -> cleaning/preprocessing data -> building a machine learning model. The second project will in addition include text mining and various machine learning techniques.

Week 10 (30): 18 Mar

Class test

Assessment

Class test

Weight for final grade: 30%
Learning outcomes tested: (1) demonstrating knowledge of a broader range of analytical techniques used in the field of Security and Crime Science, (2) understanding the purpose, advantages and disadvantages of different forms of data science techniques, (3) interpreting the results of data science techniques.
Date: 18 March 2019, 1-3pm.

Details: This 1-hour closed-book test covers theoretical and conceptual aspects of the lectures and tutorials. You will be given 9 questions to which you are required to write a brief answer. The questions are a mix of multiple-choice and open questions.

Applied Data Science Project

Weight for final grade: 70%
Learning outcomes tested: (1) demonstrating knowledge of a broader range of analytical techniques used in the field of Security and Crime Science, (2) performing data science analyses on crime and/or-security related issues, (3) applying the data science pipeline on crime and/or-security related issues, (4) interpreting and effectively reporting the results of said techniques
Deadline: 29 March 2019.
Feedback deadlines: 2 Feb. for the peer-feedback, 2 March for the 1-on-1 feedback (see below)
Word count limit: 2000 words (excl. code supplement; do not exceed this word count limit!)

Feedback sessions: Since a full project is a major step in your data science skills career, we will hold two feedback sessions to help you in the process.

Peer-feedback session: you will exchange an outline of your project idea (i.e. which problem do you want to address and how?) with a fellow student. The purpose of the peer feedback is to get an independent view on your project early in the process. We will provide templates for the feedback that you will give and receive. The peer-feedback session will be held at end of the “Text data II” lecture on 4 Feb. 2019.
1-on-1 feedback session: you will receive individualised feedback from both Bennett and Felix in a 1-on-1 session where we will help you with questions adn give you final advice to fine-tune your project. These sessions will take 10 minutes per student and will be held on 4 March 2019 after the lecture (slots to be arranged at the start of the module).

Details: This assessment is the capstone project of the module. It requires you to address a crime and security science research problem in the full data science workflow (e.g., obtaining the data, processing the data, modelling the data, building predictive models, reporting on the findings, interpreting the outcomes). You will write a brief report on your project (a template will be provided) and you have to submit the R code needed to reproduce your findings. After passing this assessment, you will have the demonstrated the skills to solve a problem using data science techniques.

Attendance requirement

We are obliged to record the attendance at all sessions (lectures and tutorials) and each student will have to attend at least 70% of the sessions to be able to pass the module. If you cannot attend for a good reason, please let the TA know about this well in advance. We strongly advise you to attend all sessions as this ease the assessment for you and will help you get the most out of this module.

ADVANCED CRIME ANALYSIS 2018/2019

Department of Security and Crime Science, UCL

Bennett Kleinberg (bennett.kleinberg@ucl.ac.uk)

13/12/2018

Introduction

Dates & times

Contact & resources

Learning outcomes

Structure

Timetable

Materials

Software

Literature

Data

Content details

Week 1 (20): 7+8 Jan

Week 2 (21): 14 Jan

Week 3 (22): 21+22 Jan

Week 4 (23): 28 Jan

Week 5 (24): 4+5 Feb

Week 6 (26): 18 Feb

Week 7 (27): 25+26 Feb

Week 8 (28): 4 Mar

Week 9 (29): 11+12 Mar

Week 10 (30): 18 Mar

Assessment

Class test

Applied Data Science Project

Attendance requirement