.small.hi-slate[Cross-Lingual Stance Detection in Political Texts]

class: center, middle, inverse, title-slide

.title[
# .small.hi-slate[Cross-Lingual Stance Detection in Political Texts]
]
.subtitle[
## .small.hi-green[Comparison and Application]
]
.author[
### .small.b[<strong>Yen-Chieh Liao (University of Birmingham)</strong>]
]
.author[
### .small.b[<strong>Stefan Müller (University College Dublin)</strong>]
]
.date[
### .small.b[March 14 2026]
]

---

exclude: true

---
layout: true
# .tiny[ Motivation: Why Cross-Lingual Stance? ? The Scale Problem]
---
name:overview

.pull-left[
.huge.hi-grey[**The Core Problem**]
- Human-sourced annotation is the gold standard… but it does not scale



]

.pull-right[

.huge.hi-grey[**Research Questions**]

]

---

.pull-left[
.huge.hi-grey[**The Core Problem**]
- Human-sourced annotation is the gold standard… but it does not scale
> Benoit et al. (2016): annotating a single EP debate across 6 languages cost **$43–$109 per language version**


]

.pull-right[

.huge.hi-grey[**Research Questions**]

]

---

.pull-left[
.huge.hi-grey[**The Core Problem**]
- Human-sourced annotation is the gold standard… but it does not scale
> Benoit et al. (2016): annotating a single EP debate across 6 languages cost **$43–$109 per language version**
- At corpus scale (e.g., ParlSpeech V2: **6.3M speeches**, 9 parliaments), manual annotation is infeasible

]

.pull-right[

.huge.hi-grey[**Research Questions**]

]

---

.pull-left[
.huge.hi-grey[**The Core Problem**]
- Human-sourced annotation is the gold standard… but it does not scale
> Benoit et al. (2016): annotating a single EP debate across 6 languages cost **$43–$109 per language version**
- At corpus scale (e.g., ParlSpeech V2: **6.3M speeches**, 9 parliaments), manual annotation is infeasible
- Fine-tuned multilingual transformers are the standard approach — but does fine-tuning transfer across domains?
]

.pull-right[

.huge.hi-grey[**Research Questions**]

]

---

.pull-left[
.huge.hi-grey[**The Core Problem**]
- Human-sourced annotation is the gold standard… but it does not scale
> Benoit et al. (2016): annotating a single EP debate across 6 languages cost **$43–$109 per language version**
- At corpus scale (e.g., ParlSpeech V2: **6.3M speeches**, 9 parliaments), manual annotation is infeasible
- Fine-tuned multilingual transformers are the standard approach — but does fine-tuning transfer across domains?
]

.pull-right[

.huge.hi-grey[**Research Questions**]

**RQ1** Do fine-tuned multilingual transformers 
transfer to out-of-domain parliamentary text 
across six languages?

**RQ2** Can small open-source LLMs recover
a valid stance signal — and does prompting
strategy (zero-shot, few-shot, RAG) matter?
]

---
layout: true
# .tiny[Dataset: MEPs Coal Mining Debates (Benoit et al. 2016)]
---
name: dataset

&nbsp;
&nbsp;

.pull-left[
.huge.hi-grey[**36 MEPs Crowd-Coded Corpus across 6 Languages**]
- .small[**Human-annotated**: +1 (support) to −1 (oppose), 3–5 coders per sentence via CrowdFlower]


]

.pull-right[
&nbsp;
&nbsp;
<img src="./figure/apsr.png" alt="" width="90%" style="display: block; margin: auto;" />
]

---
&nbsp;
&nbsp;
.pull-left[
.huge.hi-grey[**36 MEPs Crowd-Coded Corpus across 6 Languages**]
- .small[**Human-annotated**: +1 (support) to −1 (oppose), 3–5 coders per sentence via CrowdFlower]
- .small[**Real voting outcomes**: speech-level scores perfectly predict each MEP's roll-call vote (Benoit et al., 2016)]

]

.pull-right[
&nbsp;
&nbsp;
<img src="./figure/apsr2.png" alt="" width="90%" style="display: block; margin: auto;" />
]

---

&nbsp;
&nbsp;
.pull-left[
.huge.hi-grey[**36 MEPs Crowd-Coded Corpus across 6 Languages**]
- .small[**Human-annotated**: +1 (support) to −1 (oppose), 3–5 coders per sentence via CrowdFlower]
- .small[**Real voting outcomes**: speech-level scores perfectly predict each MEP's roll-call vote (Benoit et al., 2016)]
- .small[We focus on **sentence-level** performance against majority crowd-coded labels]
]

.pull-right[
&nbsp;
&nbsp;
<img src="./figure/apsr2.png" alt="" width="90%" style="display: block; margin: auto;" />
]

---

&nbsp;
&nbsp;
.pull-left[
.huge.hi-grey[**Background**: End subsidies by 2014 vs. extension to 2018+]
- .small[**Competing Interests**: Environmental goals vs. local employment]
- .small[**Ideological Divide**: Market mechanisms vs. government intervention]
- .small[**Contested political debate**]

]

.pull-right[
&nbsp;
&nbsp;
<img src="./figure/news.png" alt="" width="99%" style="display: block; margin: auto;" />
]

---
layout: true
# .tiny[Evaluation: Transformers vs Instruction-Tuned LLMs]
---
name: evaluation

&nbsp;
&nbsp;

.pull-left[
.huge.hi-grey[**Fine-Tuned Transformers**]
- 6 classifiers (mBERT · sBERT · XLM-RoBERTa × PoliStance / X-Stance), applied out-of-domain to the target corpus

]

.pull-right[

.huge.hi-grey[**Instruction-Tuned LLMs**]
- 3 lightweight open-source LLMs — reproducible, consumer-grade hardware, no commercial API
- Each model evaluated under Zero-Shot · Few-Shot · RAG, holding system prompt and label schema constant
]

---

&nbsp;
&nbsp;

.pull-left[
.huge.hi-grey[**Fine-Tuned Transformers**]
- 6 classifiers (mBERT · sBERT · XLM-RoBERTa × PoliStance / X-Stance), applied <u>.hi-blue[out-of-domain]</u> to the target corpus

]

.pull-right[

.huge.hi-grey[**Instruction-Tuned LLMs**]
- 3 lightweight open-source LLMs — reproducible, consumer-grade hardware, no commercial API
- Each model evaluated under Zero-Shot · Few-Shot · RAG, holding system prompt and label schema constant
]

---

&nbsp;
&nbsp;

.pull-left[
.huge.hi-grey[**Fine-Tuned Transformers**]
- 6 classifiers (mBERT · sBERT · XLM-RoBERTa × PoliStance / X-Stance), applied <u>.hi-blue[out-of-domain]</u> to the target corpus

]

.pull-right[

.huge.hi-grey[**Instruction-Tuned LLMs**]
- 3 <u>.hi-blue[lightweight open-source LLMs]</u> — reproducible, consumer-grade hardware, no commercial API
- Each model evaluated under Zero-Shot · Few-Shot · RAG, holding system prompt and label schema constant
]

---

&nbsp;
&nbsp;

.pull-left[
.huge.hi-grey[**Fine-Tuned Transformers**]
- 6 classifiers (mBERT · sBERT · XLM-RoBERTa × PoliStance / X-Stance), applied <u>.hi-blue[out-of-domain]</u> to the target corpus

]

.pull-right[

.huge.hi-grey[**Instruction-Tuned LLMs**]
- 3 <u>.hi-blue[lightweight open-source LLMs]</u> — reproducible, consumer-grade hardware, no commercial API
- Each model evaluated under <u>.hi-blue[Zero-Shot · Few-Shot · RAG]</u>, holding system prompt and label schema constant
]

---
layout: true
# .tiny[Three Prompting Strategies]
---
name:proof-of-concept-2

<img src="./figure/figure_a4.png" alt="" width="75%" style="display: block; margin: auto;" />

---

<img src="./figure/figure_a4_a.png" alt="" width="75%" style="display: block; margin: auto;" />

---

<img src="./figure/figure_a4_b.png" alt="" width="75%" style="display: block; margin: auto;" />

---

<img src="./figure/figure_a4_c.png" alt="" width="75%" style="display: block; margin: auto;" />

---

<img src="./figure/figure_a4_d.png" alt="" width="75%" style="display: block; margin: auto;" />

---

<img src="./figure/figure_a4_e.png" alt="" width="75%" style="display: block; margin: auto;" />

---

<img src="./figure/figure_a4_f.png" alt="" width="75%" style="display: block; margin: auto;" />

---
layout: true
# .tiny[Avg. Macro-F1 against CrowdFlower]
---
name:embedding-model

<img src="./figure/figure_a2.png" alt="" width="88%" style="display: block; margin: auto;" />

---

<img src="./figure/figure_a2_a.png" alt="" width="88%" style="display: block; margin: auto;" />

---

<img src="./figure/figure_a2_b.png" alt="" width="88%" style="display: block; margin: auto;" />

---

<img src="./figure/figure_a2_c.png" alt="" width="88%" style="display: block; margin: auto;" />

---

<img src="./figure/figure_2.png" alt="" width="88%" style="display: block; margin: auto;" />

---
layout: true
# .tiny[Avg Macro-F1 against CrowdFlower by Language Versions]
---
name:embedding-model

<img src="./figure/figure_3a.png" alt="" width="75%" style="display: block; margin: auto;" />

---

<img src="./figure/figure_3a_1.png" alt="" width="75%" style="display: block; margin: auto;" />

---

<img src="./figure/figure_3a_2.png" alt="" width="75%" style="display: block; margin: auto;" />

---

<img src="./figure/figure_3a_3.png" alt="" width="75%" style="display: block; margin: auto;" />

---
layout: true
# .tiny[Comparing DEBATE NLI from Burnham et al. (2026)]
---
name:embedding-model

.pull-left[
<img src="./figure/figure_a3.png" alt="" width="100%" style="display: block; margin: auto;" />

]

.pull-right[
<img src="./figure/PA.png" alt="" width="90%" style="display: block; margin: auto;" />
]

---

<img src="./figure/figure_a3_1.png" alt="" width="70%" style="display: block; margin: auto;" />

---
layout: true
# .tiny[Average Macro-F1 Against Vote Decision]
---

<img src="./figure/figure_1.png" alt="" width="60%" style="display: block; margin: auto;" />

---
layout: true
# .tiny[Runtime]
---
name:embedding-model

<img src="./figure/table_a6_a.png" alt="" width="100%" style="display: block; margin: auto;" />

---

<img src="./figure/table_a6_c.png" alt="" width="100%" style="display: block; margin: auto;" />

---

<img src="./figure/table_a6_d.png" alt="" width="100%" style="display: block; margin: auto;" />

---
layout: true
# .tiny[To Take Away]
---

&nbsp;
&nbsp;

- Few-shot and RAG prompting outperform zero-shot baselines; small open-source models like Qwen 2.5 7B with RAG achieve competitive performance without expensive APIs or fine-tuning.

--

- Transformer errors stem more from out-of-domain mismatch.

--

- These findings are preliminary; researchers should still validate LLM annotations with human coding, but multilingual annotation will only get easier as models improve and become more accessible.

<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css">

<style>
.social-link {
  color: #ffffff;
  text-decoration: none;
  transition: color 0.3s ease;
}
.social-link:hover {
  opacity: 0.7;
}
</style>

---
layout: true
class: inverse, center, middle

# Thank You

<br><br>

---

---
layout: true
class: inverse, center, middle

# Appendix

<br><br>

---

---
layout: true
# .tiny[MEP-level Cross-lingual Stance Score]
---
name:embedding-model

<img src="./figure/figure_a1.png" alt="" width="75%" style="display: block; margin: auto;" />

---
layout: true
# .tiny[LLM Performance Across Prompting Strategies]
---
name:embedding-model

<img src="./figure/figure_3.png" alt="" width="75%" style="display: block; margin: auto;" />

---
layout: true
# .tiny[Model Errors Sensitive to CrowdFlower’s Annotation Ambguity?]
---
name:embedding-model

<img src="./figure/figure_4_b.png" alt="" width="95%" style="display: block; margin: auto;" />