class: center, middle, inverse, title-slide .title[ # .small.hi-slate[Cross-Lingual Stance Detection in Political Texts] ] .subtitle[ ## .small.hi-green[Comparison and Application] ] .author[ ### .small.b[
Yen-Chieh Liao (University of Birmingham)
] ] .author[ ### .small.b[
Stefan Müller (University College Dublin)
] ] .date[ ### .small.b[March 14, 2026] ] --- exclude: true --- layout: true # .tiny[Motivation: Why Cross-Lingual Stance? The Scale Problem] --- name:overview .pull-left[ .huge.hi-grey[**The Core Problem**] - Human-sourced annotation is the gold standard… but it does not scale <!-- > Benoit et al. (2016): annotating a single EP debate across 6 languages cost **$43–$109 per language version** --> <!-- - At corpus scale (e.g., ParlSpeech V2: **6.3M speeches**, 9 parliaments), manual annotation is infeasible --> <!-- - Fine-tuned multilingual transformers are the standard approach — but does fine-tuning transfer across domains? --> ] .pull-right[ .huge.hi-grey[**Research Questions**] <!-- **RQ1** Do fine-tuned multilingual transformers --> <!-- transfer to out-of-domain parliamentary text --> <!-- across six languages? --> <!-- **RQ2** Can small open-source LLMs recover --> <!-- a valid stance signal — and does prompting --> <!-- strategy (zero-shot, few-shot, RAG) matter? --> ] --- .pull-left[ .huge.hi-grey[**The Core Problem**] - Human-sourced annotation is the gold standard… but it does not scale > Benoit et al. (2016): annotating a single EP debate across 6 languages cost **$43–$109 per language version** <!-- - At corpus scale (e.g., ParlSpeech V2: **6.3M speeches**, 9 parliaments), manual annotation is infeasible --> <!-- - Fine-tuned multilingual transformers are the standard approach — but does fine-tuning transfer across domains? --> ] .pull-right[ .huge.hi-grey[**Research Questions**] <!-- **RQ1** Do fine-tuned multilingual transformers --> <!-- transfer to out-of-domain parliamentary text --> <!-- across six languages? --> <!-- **RQ2** Can small open-source LLMs recover --> <!-- a valid stance signal — and does prompting --> <!-- strategy (zero-shot, few-shot, RAG) matter? --> ] --- .pull-left[ .huge.hi-grey[**The Core Problem**] - Human-sourced annotation is the gold standard… but it does not scale > Benoit et al. 
(2016): annotating a single EP debate across 6 languages cost **$43–$109 per language version** - At corpus scale (e.g., ParlSpeech V2: **6.3M speeches**, 9 parliaments), manual annotation is infeasible <!-- - Fine-tuned multilingual transformers are the standard approach — but does fine-tuning transfer across domains? --> ] .pull-right[ .huge.hi-grey[**Research Questions**] <!-- **RQ1** Do fine-tuned multilingual transformers --> <!-- transfer to out-of-domain parliamentary text --> <!-- across six languages? --> <!-- **RQ2** Can small open-source LLMs recover --> <!-- a valid stance signal — and does prompting --> <!-- strategy (zero-shot, few-shot, RAG) matter? --> ] --- .pull-left[ .huge.hi-grey[**The Core Problem**] - Human-sourced annotation is the gold standard… but it does not scale > Benoit et al. (2016): annotating a single EP debate across 6 languages cost **$43–$109 per language version** - At corpus scale (e.g., ParlSpeech V2: **6.3M speeches**, 9 parliaments), manual annotation is infeasible - Fine-tuned multilingual transformers are the standard approach — but does fine-tuning transfer across domains? ] .pull-right[ .huge.hi-grey[**Research Questions**] <!-- **RQ1** Do fine-tuned multilingual transformers --> <!-- transfer to out-of-domain parliamentary text --> <!-- across six languages? --> <!-- **RQ2** Can small open-source LLMs recover --> <!-- a valid stance signal — and does prompting --> <!-- strategy (zero-shot, few-shot, RAG) matter? --> ] --- .pull-left[ .huge.hi-grey[**The Core Problem**] - Human-sourced annotation is the gold standard… but it does not scale > Benoit et al. (2016): annotating a single EP debate across 6 languages cost **$43–$109 per language version** - At corpus scale (e.g., ParlSpeech V2: **6.3M speeches**, 9 parliaments), manual annotation is infeasible - Fine-tuned multilingual transformers are the standard approach — but does fine-tuning transfer across domains? 
] .pull-right[ .huge.hi-grey[**Research Questions**] **RQ1** Do fine-tuned multilingual transformers transfer to out-of-domain parliamentary text across six languages? **RQ2** Can small open-source LLMs recover a valid stance signal — and does prompting strategy (zero-shot, few-shot, RAG) matter? ] --- layout: true # .tiny[Dataset: MEPs Coal Mining Debates (Benoit et al. 2016)] --- name: dataset .pull-left[ .huge.hi-grey[**36 MEPs Crowd-Coded Corpus across 6 Languages**] - .small[**Human-annotated**: +1 (support) to −1 (oppose), 3–5 coders per sentence via CrowdFlower] <!-- - .small[**Real voting outcomes**: speech-level scores perfectly predict each MEP's roll-call vote (Benoit et al., 2016)] --> <!-- - .small[We focus on **sentence-level** performance against majority crowd-coded labels] --> ] .pull-right[ <img src="./figure/apsr.png" alt="" width="90%" style="display: block; margin: auto;" /> ] --- .pull-left[ .huge.hi-grey[**36 MEPs Crowd-Coded Corpus across 6 Languages**] - .small[**Human-annotated**: +1 (support) to −1 (oppose), 3–5 coders per sentence via CrowdFlower] - .small[**Real voting outcomes**: speech-level scores perfectly predict each MEP's roll-call vote (Benoit et al., 2016)] <!-- - .small[We focus on **sentence-level** performance against majority crowd-coded labels] --> ] .pull-right[ <img src="./figure/apsr2.png" alt="" width="90%" style="display: block; margin: auto;" /> ] --- .pull-left[ .huge.hi-grey[**36 MEPs Crowd-Coded Corpus across 6 Languages**] - .small[**Human-annotated**: +1 (support) to −1 (oppose), 3–5 coders per sentence via CrowdFlower] - .small[**Real voting outcomes**: speech-level scores perfectly predict each MEP's roll-call vote (Benoit et al., 2016)] - .small[We focus on **sentence-level** performance against majority crowd-coded labels] ] .pull-right[ <img src="./figure/apsr2.png" alt="" width="90%" style="display: block; margin: auto;" /> ] --- .pull-left[ .huge.hi-grey[**Background**: End subsidies by 2014 vs. 
extension to 2018+] - .small[**Competing Interests**: Environmental goals vs. local employment] - .small[**Ideological Divide**: Market mechanisms vs. government intervention] - .small[**Contested political debate**] ] .pull-right[ <img src="./figure/news.png" alt="" width="99%" style="display: block; margin: auto;" /> ] --- layout: true # .tiny[Evaluation: Transformers vs Instruction-Tuned LLMs] --- name: evaluation .pull-left[ .huge.hi-grey[**Fine-Tuned Transformers**] - 6 classifiers (mBERT · sBERT · XLM-RoBERTa × PoliStance / X-Stance), applied out-of-domain to the target corpus ] .pull-right[ .huge.hi-grey[**Instruction-Tuned LLMs**] - 3 lightweight open-source LLMs — reproducible, consumer-grade hardware, no commercial API - Each model evaluated under Zero-Shot · Few-Shot · RAG, holding system prompt and label schema constant ] --- .pull-left[ .huge.hi-grey[**Fine-Tuned Transformers**] - 6 classifiers (mBERT · sBERT · XLM-RoBERTa × PoliStance / X-Stance), applied <u>.hi-blue[out-of-domain]</u> to the target corpus ] .pull-right[ .huge.hi-grey[**Instruction-Tuned LLMs**] - 3 lightweight open-source LLMs — reproducible, consumer-grade hardware, no commercial API - Each model evaluated under Zero-Shot · Few-Shot · RAG, holding system prompt and label schema constant ] --- .pull-left[ .huge.hi-grey[**Fine-Tuned Transformers**] - 6 classifiers (mBERT · sBERT · XLM-RoBERTa × PoliStance / X-Stance), applied <u>.hi-blue[out-of-domain]</u> to the target corpus ] .pull-right[ .huge.hi-grey[**Instruction-Tuned LLMs**] - 3 <u>.hi-blue[lightweight open-source LLMs]</u> — reproducible, consumer-grade hardware, no commercial API - Each model evaluated under Zero-Shot · Few-Shot · RAG, holding system prompt and label schema constant ] --- .pull-left[ .huge.hi-grey[**Fine-Tuned Transformers**] - 6 classifiers (mBERT · sBERT · XLM-RoBERTa × PoliStance / X-Stance), applied <u>.hi-blue[out-of-domain]</u> to the target corpus ] .pull-right[ .huge.hi-grey[**Instruction-Tuned 
LLMs**] - 3 <u>.hi-blue[lightweight open-source LLMs]</u> — reproducible, consumer-grade hardware, no commercial API - Each model evaluated under <u>.hi-blue[Zero-Shot · Few-Shot · RAG]</u>, holding system prompt and label schema constant ] --- layout: true # .tiny[Three Prompting Strategies] --- name:proof-of-concept-2 <img src="./figure/figure_a4.png" alt="" width="75%" style="display: block; margin: auto;" /> --- <img src="./figure/figure_a4_a.png" alt="" width="75%" style="display: block; margin: auto;" /> --- <img src="./figure/figure_a4_b.png" alt="" width="75%" style="display: block; margin: auto;" /> --- <img src="./figure/figure_a4_c.png" alt="" width="75%" style="display: block; margin: auto;" /> --- <img src="./figure/figure_a4_d.png" alt="" width="75%" style="display: block; margin: auto;" /> --- <img src="./figure/figure_a4_e.png" alt="" width="75%" style="display: block; margin: auto;" /> --- <img src="./figure/figure_a4_f.png" alt="" width="75%" style="display: block; margin: auto;" /> ---
layout: true # .tiny[Avg. Macro-F1 against CrowdFlower] --- name:embedding-model <img src="./figure/figure_a2.png" alt="" width="88%" style="display: block; margin: auto;" /> --- <img src="./figure/figure_a2_a.png" alt="" width="88%" style="display: block; margin: auto;" /> --- <img src="./figure/figure_a2_b.png" alt="" width="88%" style="display: block; margin: auto;" /> --- <img src="./figure/figure_a2_c.png" alt="" width="88%" style="display: block; margin: auto;" /> --- <img src="./figure/figure_2.png" alt="" width="88%" style="display: block; margin: auto;" /> --- layout: true # .tiny[Avg. Macro-F1 against CrowdFlower by Language Version] --- name:embedding-model <img src="./figure/figure_3a.png" alt="" width="75%" style="display: block; margin: auto;" /> --- <img src="./figure/figure_3a_1.png" alt="" width="75%" style="display: block; margin: auto;" /> --- <img src="./figure/figure_3a_2.png" alt="" width="75%" style="display: block; margin: auto;" /> --- <img src="./figure/figure_3a_3.png" alt="" width="75%" style="display: block; margin: auto;" /> ---
layout: true # .tiny[Comparing DEBATE NLI from Burnham et al. (2026)] --- name:embedding-model .pull-left[ <img src="./figure/figure_a3.png" alt="" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="./figure/PA.png" alt="" width="90%" style="display: block; margin: auto;" /> ] --- <img src="./figure/figure_a3_1.png" alt="" width="70%" style="display: block; margin: auto;" /> --- layout: true # .tiny[Average Macro-F1 against Vote Decisions] --- <img src="./figure/figure_1.png" alt="" width="60%" style="display: block; margin: auto;" /> --- layout: true # .tiny[Runtime] --- name:embedding-model <img src="./figure/table_a6_a.png" alt="" width="100%" style="display: block; margin: auto;" /> --- <img src="./figure/table_a6_c.png" alt="" width="100%" style="display: block; margin: auto;" /> --- <img src="./figure/table_a6_d.png" alt="" width="100%" style="display: block; margin: auto;" /> --- layout: true # .tiny[To Take Away] --- - Few-shot and RAG prompting outperform zero-shot baselines; small open-source models such as Qwen 2.5 7B with RAG achieve competitive performance without commercial APIs or task-specific fine-tuning. -- - Fine-tuned transformer errors appear to stem mainly from the out-of-domain mismatch between their training data (PoliStance / X-Stance) and the target corpus. -- - These findings are preliminary; researchers should still validate LLM annotations against human coding, but multilingual annotation is likely to get easier as models improve and become more accessible.
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css"> <style> .social-link { color: #ffffff; text-decoration: none; transition: color 0.3s ease; } .social-link:hover { opacity: 0.7; } </style> --- layout: true class: inverse, center, middle # Thank You <br><br> <!-- <div style="font-size: 24px; display: flex; justify-content: center; gap: 40px; align-items: center;"> --> <!-- <a href="https://davidycliao.github.io" target="_blank" class="social-link"> --> <!-- <i class="fas fa-globe"></i> davidycliao.github.io --> <!-- </a> --> <!-- <a href="https://github.com/davidycliao" target="_blank" class="social-link"> --> <!-- <i class="fab fa-github"></i> GitHub --> <!-- </a> --> <!-- <a href="https://x.com/liaoyenchieh" target="_blank" class="social-link"> --> <!-- <i class="fab fa-twitter"></i> @liaoyenchieh --> <!-- </a> --> <!-- <a href="https://www.linkedin.com/in/davidycliao/" target="_blank" class="social-link"> --> <!-- <i class="fab fa-linkedin"></i> LinkedIn --> <!-- </a> --> <!-- </div> --> --- --- layout: true class: inverse, center, middle # Appendix <br><br> <!-- <div style="font-size: 24px; display: flex; justify-content: center; gap: 40px; align-items: center;"> --> <!-- <a href="https://davidycliao.github.io" target="_blank" class="social-link"> --> <!-- <i class="fas fa-globe"></i> davidycliao.github.io --> <!-- </a> --> <!-- <a href="https://github.com/davidycliao" target="_blank" class="social-link"> --> <!-- <i class="fab fa-github"></i> GitHub --> <!-- </a> --> <!-- <a href="https://x.com/liaoyenchieh" target="_blank" class="social-link"> --> <!-- <i class="fab fa-twitter"></i> @liaoyenchieh --> <!-- </a> --> <!-- <a href="https://www.linkedin.com/in/davidycliao/" target="_blank" class="social-link"> --> <!-- <i class="fab fa-linkedin"></i> LinkedIn --> <!-- </a> --> <!-- </div> --> --- --- layout: true # .tiny[MEP-level Cross-lingual Stance Score] --- name:embedding-model <img 
src="./figure/figure_a1.png" alt="" width="75%" style="display: block; margin: auto;" /> --- layout: true # .tiny[LLM Performance Across Prompting Strategies] --- name:embedding-model <img src="./figure/figure_3.png" alt="" width="75%" style="display: block; margin: auto;" /> --- layout: true # .tiny[Are Model Errors Sensitive to CrowdFlower’s Annotation Ambiguity?] --- name:embedding-model <img src="./figure/figure_4_b.png" alt="" width="95%" style="display: block; margin: auto;" />
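---

A minimal sketch of one way to probe this question, not necessarily the procedure behind the preceding figure. Each sentence's 3–5 CrowdFlower codes are collapsed to a majority label. The share of coders who chose that label is treated as an ambiguity proxy. Macro-F1 (the unweighted mean of per-class F1) is then computed separately for high- and low-agreement sentences. The three-point {−1, 0, +1} coding, the tie rule (ties go to 0), the agreement threshold, and all function names are assumptions for illustration.

```python
from collections import Counter


def majority_label(codes):
    """Return (majority label, agreement share) for one sentence's crowd codes.

    Assumes codes on a {-1, 0, +1} scale; ties are broken toward 0 (neutral),
    which is a hypothetical rule chosen for illustration.
    """
    counts = Counter(codes)
    top = max(counts.values())
    winners = [label for label, n in counts.items() if n == top]
    label = winners[0] if len(winners) == 1 else 0
    return label, top / len(codes)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over classes seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    classes = sorted(set(gold) | set(pred))
    scores = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        # Denominator is > 0 because c occurs in gold or pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores) if scores else 0.0


def f1_by_agreement(crowd_codes, predictions, threshold=1.0):
    """Macro-F1 for sentences at/above vs. below a coder-agreement threshold."""
    groups = {"high_agreement": ([], []), "low_agreement": ([], [])}
    for codes, pred in zip(crowd_codes, predictions):
        gold, share = majority_label(codes)
        key = "high_agreement" if share >= threshold else "low_agreement"
        groups[key][0].append(gold)
        groups[key][1].append(pred)
    # Skip empty groups so every reported score rests on at least one sentence.
    return {k: macro_f1(g, p) for k, (g, p) in groups.items() if g}


if __name__ == "__main__":
    # Toy, made-up data: crowd codes per sentence and one model's predictions.
    crowd = [[1, 1, 1], [-1, -1, 0], [0, 1, -1, 1], [-1] * 5, [1, 0, 0]]
    model = [1, -1, 0, -1, 1]
    print(f1_by_agreement(crowd, model))
```

With the default threshold of 1.0, the high-agreement group is exactly the unanimous sentences. A large gap between the two scores would suggest that model errors concentrate where the coders themselves disagreed.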