Supervised Tasks and Adaptation
Introduction to NLP — MSc. DH EdC-PSL
November 11, 2025
Language Model
A Language Model (LM) estimates the probability of pieces of text. Given a word sequence \(w_{1},w_{2},\cdots,w_{S}\), it answers the question:
What is \(P(w_{1},w_{2},\cdots,w_{S})\)?
How to compute \(P\)?
\(\to\) decompose it with the chain rule and estimate each conditional probability: \(P(w_{1},w_{2},\cdots,w_{S})=\prod_{s=1}^{S}P(w_{s}\mid w_{1},\cdots,w_{s-1})\)
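A minimal sketch of the chain-rule factorization \(P(w_{1},\cdots,w_{S})=\prod_{s}P(w_{s}\mid w_{1},\cdots,w_{s-1})\), using a toy bigram model. The corpus and all counts below are invented for illustration; real LMs estimate these conditionals with neural networks, not raw counts.

```python
# Toy bigram language model: P(w1..wS) ≈ P(w1) * Π P(ws | w(s-1)).
# The corpus is invented for illustration.
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_next(prev, word):
    """Conditional probability P(word | prev) from bigram counts."""
    return bigrams[(prev, word)] / unigrams[prev]

def p_sentence(words):
    """Chain-rule probability of a word sequence (bigram approximation)."""
    p = unigrams[words[0]] / len(corpus)          # P(w1)
    for prev, word in zip(words, words[1:]):
        p *= p_next(prev, word)                   # P(ws | w(s-1))
    return p

print(p_sentence("the cat sat".split()))  # → 1/14 ≈ 0.071
```

The bigram assumption (conditioning only on the previous word) is the crudest instance of the chain rule; modern models condition on the full prefix.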
Goal
Learn patterns that map inputs (texts, sentences, etc.) to outputs (labels, categories, values) based on annotated examples.
What do you need?
\(\to\) The model will ‘learn’ from examples (and corrections), improving its predictions through training.
Approach
Use precomputed text embeddings as numerical representations of texts, then train a standard ML model on top.
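A minimal sketch of this pipeline with a nearest-centroid classifier, one of the simplest models one can train on top of frozen representations. The 3-dimensional "embeddings" and labels below are invented stand-ins for real precomputed sentence embeddings.

```python
# Nearest-centroid classifier on frozen embeddings:
# the vectors are never updated; only the small model on top is "trained".
def centroid(vectors):
    """Mean vector of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Pretend these came from a sentence encoder (purely illustrative values).
train = {
    "positive": [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1]],
    "negative": [[0.1, 0.9, 0.8], [0.2, 0.8, 0.9]],
}

# "Training" = computing one centroid per class.
centroids = {label: centroid(vecs) for label, vecs in train.items()}

def predict(embedding):
    """Assign the label of the closest class centroid."""
    return min(centroids, key=lambda label: distance(embedding, centroids[label]))

print(predict([0.85, 0.15, 0.15]))  # close to the "positive" centroid
```

In practice the same pattern holds with real embeddings and a standard classifier (e.g. logistic regression): the representation stays fixed, so training is cheap and the pipeline stays modular.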
Approach
Adapt a pretrained transformer model end-to-end to a specific downstream task by updating its parameters through training on labeled data.
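The contrast with the previous approach can be sketched with a toy one-parameter "encoder" and "head": here every parameter, including the encoder's, receives gradient updates. This is a schematic illustration of end-to-end training, not a real transformer; the model, data, and learning rate are invented.

```python
# Toy "fine-tuning": both the encoder weight e and the task head h
# are updated by gradient descent, instead of freezing e.
# Model: pred = h * (e * x); loss = (pred - y)^2.
e, h = 0.5, 0.5          # pretrained encoder weight, task head weight
lr = 0.01                # learning rate
data = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]  # invented (x, y) pairs, y = x

for epoch in range(200):
    for x, y in data:
        z = e * x                    # encoder output
        pred = h * z                 # head output
        err = pred - y
        grad_h = 2 * err * z         # dL/dh
        grad_e = 2 * err * h * x     # dL/de (chain rule through the head)
        h -= lr * grad_h
        e -= lr * grad_e             # end-to-end: the encoder adapts too

print(e * h)  # the product e*h approaches 1.0, since y = 1.0 * x
```

Freezing the encoder would mean dropping the `e -= lr * grad_e` line; fine-tuning keeps it, which is why it is more powerful but also more compute-hungry at realistic scales.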
Two steps of BERT development (extracted and modified from (Alammar 2018)).
| Representation + ML | Fine-Tuning |
|---|---|
| Freeze embeddings, train small model | Adapt all model weights |
| Fast, lightweight | Higher compute cost |
| Works with few labels | Needs more data |
| Easier to interpret & reuse | Task-specific, less interpretable |
| Strong for exploratory DH tasks | Best for high-performance NLP |
Key point
Representation + ML: modular, interpretable, resource-light
Fine-Tuning: powerful, task-adaptive, but resource-intensive
Keep in mind
Goal
Assign each input text to one of a set of predefined categories or classes.
| Metric | Formula | Interpretation |
|---|---|---|
| Accuracy | \(\frac{TP + TN}{TP + TN + FP + FN}\) | Overall proportion of correct predictions |
| Positive Predictive Value | \(\frac{TP}{TP + FP}\) | How accurate the positive predictions are (= precision) |
| True Positive Rate | \(\frac{TP}{TP + FN}\) | Coverage of actual positive samples (= recall) |
| F1-score | \(\frac{2\times\mathrm{PPV}\times\mathrm{TPR}}{\mathrm{PPV}+\mathrm{TPR}}\) | Harmonic mean of precision and recall |
| False Positive Rate | \(\frac{FP}{FP+TN}\) | Proportion of actual negatives incorrectly predicted as positive |
| Predicted Positive Rate | \(\frac{TP+FP}{TP+TN+FP+FN}\) | Proportion of samples predicted as positive |
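All of the formulas above derive from the same four confusion-matrix counts; a small sketch (the counts are invented for illustration):

```python
# Classification metrics computed from confusion-matrix counts.
def metrics(tp, fp, tn, fn):
    total = tp + tn + fp + fn
    acc = (tp + tn) / total                  # Accuracy
    ppv = tp / (tp + fp)                     # Precision (PPV)
    tpr = tp / (tp + fn)                     # Recall (TPR)
    f1 = 2 * ppv * tpr / (ppv + tpr)         # F1-score
    fpr = fp / (fp + tn)                     # False Positive Rate
    ppr = (tp + fp) / total                  # Predicted Positive Rate
    return {"ACC": acc, "PPV": ppv, "TPR": tpr,
            "F1": f1, "FPR": fpr, "PPR": ppr}

# Illustrative counts: 40 TP, 10 FP, 45 TN, 5 FN.
m = metrics(tp=40, fp=10, tn=45, fn=5)
print(m)  # e.g. ACC = 0.85, PPV = 0.8, PPR = 0.5
```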
… reinforcing structural discrimination, through:
(a) Number of days with targeted policing for drug crimes in areas flagged by PredPol analysis of Oakland police data. (b) Targeted policing for drug crimes, by race. (c) Estimated drug use by race (extracted from (Lum and Isaac 2016)).
CAF’s suspicion score (extracted from (QdN 2023)).
Three core concepts (Barocas, Hardt, and Narayanan 2023):
Fairness across groups
The model should perform similarly across groups.
\(\to\) compute performance metrics for different groups;
let \(\mathrm{p}\) refer to the privileged group and \(\mathrm{n}\) to the non-privileged group:
\(m_\mathrm{p}=\mathrm{score}(\mathrm{TP}_\mathrm{p},\mathrm{FP}_\mathrm{p},\mathrm{TN}_\mathrm{p},\mathrm{FN}_\mathrm{p}),\quad m_\mathrm{n}=\mathrm{score}(\mathrm{TP}_\mathrm{n},\mathrm{FP}_\mathrm{n},\mathrm{TN}_\mathrm{n},\mathrm{FN}_\mathrm{n})\)
\(\to\) compare them with a ratio. A “fair” model should satisfy (rule of thumb: \(\epsilon\approx0.8\)):
\[ \epsilon \leq \frac{m_\mathrm{n}}{m_\mathrm{p}} \leq 1/\epsilon\]
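A sketch of this ratio test, using the True Positive Rate as the score and \(\epsilon = 0.8\) (the per-group counts below are invented for illustration):

```python
# Fairness ratio check: eps <= m_n / m_p <= 1/eps.
def tpr(tp, fn):
    """True Positive Rate (recall) from counts."""
    return tp / (tp + fn)

def is_fair(m_priv, m_nonpriv, eps=0.8):
    """True if the non-privileged/privileged metric ratio stays within [eps, 1/eps]."""
    ratio = m_nonpriv / m_priv
    return eps <= ratio <= 1 / eps

# Invented confusion counts per group.
m_p = tpr(tp=80, fn=20)   # privileged group:     TPR = 0.80
m_n = tpr(tp=55, fn=45)   # non-privileged group: TPR = 0.55

print(is_fair(m_p, m_n))  # 0.55 / 0.80 ≈ 0.69 < 0.8 → False (unfair)
```

The same check applies to any of the group metrics in the table that follows (PPR for statistical parity, FPR for predictive equality, etc.); only the `score` function changes.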
| Name | Metric | Concept | Interpretation (Wiśniewski and Biecek 2022) |
|---|---|---|---|
| Statistical Parity | PPR | Independence (equivalent) | whether both groups have equal likelihood of being predicted as positive |
| Equal Opportunity | TPR | Separation (relaxation) | likelihood of correctly recognising a positive is equal regardless of group |
| Predictive Equality | FPR | Separation (relaxation) | likelihood of incorrectly classifying a negative as positive is equal regardless of group |
| Predictive Parity | PPV | Sufficiency (relaxation) | whether positive predictions are equally reliable across groups |
| Accuracy Difference | ACC | — | whether the model performance is consistent across groups |
| Metric | Formula | Interpretation |
|---|---|---|
| Predicted Positive Rate (PPR) | \(\frac{TP+FP}{TP+TN+FP+FN}\) | Proportion of samples predicted as positive |
| True Positive Rate (TPR) | \(\frac{TP}{TP + FN}\) | Coverage of actual positive samples (= recall) |
| False Positive Rate (FPR) | \(\frac{FP}{FP+TN}\) | Proportion of actual negatives incorrectly predicted as positive |
| Positive Predictive Value (PPV) | \(\frac{TP}{TP + FP}\) | How accurate the positive predictions are (= precision) |
| Accuracy (ACC) | \(\frac{TP + TN}{TP + TN + FP + FN}\) | Overall proportion of correct predictions |
Learning Patterns