
Dr. Sarah Chen

Associate Professor of Computer Science

Stanford University

My research focuses on natural language processing, machine learning, and the intersection of language understanding with knowledge representation.


📍 Stanford, CA
🏛 Stanford University
✉ Email · 🎓 Google Scholar · 🆔 ORCID · 💻 GitHub

Cross-Lingual Semantic Parsing with Minimal Supervision

Published in EMNLP, 2025 — Oral Presentation

Sarah Chen, Kwame Okafor, Tobias Müller

Abstract

Semantic parsing—the task of mapping natural language utterances to formal meaning representations—has seen dramatic improvements in English thanks to large pre-trained language models. However, extending these advances to the world's other 7,000+ languages remains a formidable challenge. Most languages lack the annotated training data required for supervised approaches, and even multilingual pre-trained models exhibit significant performance gaps on low-resource languages.

We present XSP-Transfer, a cross-lingual transfer method for semantic parsing that requires only 50 annotated examples in the target language. Our approach combines three techniques: (1) a language-agnostic meaning representation alignment objective that maps utterances from different languages into a shared semantic space, (2) a structure-aware code-switching augmentation strategy that generates synthetic training data by swapping aligned phrases between high- and low-resource languages, and (3) a confidence-based self-training loop that iteratively expands the target-language training set with high-confidence model predictions.
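As a rough illustration of the third technique, here is a minimal sketch of one confidence-based self-training round. The function names, signatures, and the 0.9 threshold are assumptions made for illustration only, not the paper's actual implementation:

```python
from typing import Callable, List, Tuple

def self_training_round(
    predict: Callable[[str], Tuple[str, float]],   # utterance -> (parse, confidence)
    retrain: Callable[[List[Tuple[str, str]]], None],
    labeled: List[Tuple[str, str]],    # e.g. the 50 annotated target-language pairs
    unlabeled: List[str],              # raw target-language utterances
    threshold: float = 0.9,            # assumed confidence cutoff
) -> List[Tuple[str, str]]:
    """One round: pseudo-label the unlabeled utterances the model is
    confident about, fold them into the training set, and retrain."""
    pseudo: List[Tuple[str, str]] = []
    for utterance in unlabeled:
        parse, confidence = predict(utterance)
        if confidence >= threshold:
            pseudo.append((utterance, parse))
    retrain(labeled + pseudo)
    return pseudo
```

Iterating this round expands the target-language training set each pass, which is what the abstract describes as the self-training loop.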

We evaluate XSP-Transfer on the Mschema2QA and MTOP benchmarks across 10 typologically diverse languages. With only 50 target-language examples, our method achieves 85% of the fully-supervised performance on average, and outperforms the previous best few-shot method by 14 points in exact-match accuracy. Ablation studies reveal that the alignment objective and code-switching augmentation contribute roughly equally to the gains, while self-training provides an additional 3–5 point improvement.
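For readers unfamiliar with the metric, exact-match accuracy is simply the fraction of predicted parses that are identical to the gold meaning representation. A minimal sketch follows; the whitespace normalization is a common convention, not necessarily the benchmarks' exact scoring script:

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predicted parses identical to the gold parse,
    after normalizing whitespace."""
    assert len(predictions) == len(references)
    norm = lambda s: " ".join(s.split())
    hits = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return hits / len(references)
```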

Citation

S. Chen, K. Okafor, and T. Müller (2025). "Cross-Lingual Semantic Parsing with Minimal Supervision." In Proceedings of EMNLP 2025.
