
Dr. Sarah Chen

Associate Professor of Computer Science

Stanford University

My research focuses on natural language processing, machine learning, and the intersection of language understanding with knowledge representation.


📍 Stanford, CA
🏛 Stanford University
✉ Email · 🎓 Google Scholar · 🆔 ORCID · 💻 GitHub

Cross-Lingual Semantic Parsing with Minimal Supervision

Published in EMNLP, 2025 — Oral Presentation

Sarah Chen, Kwame Okafor, Tobias Müller

Abstract

Semantic parsing—the task of mapping natural language utterances to formal meaning representations—has seen dramatic improvements in English thanks to large pre-trained language models. However, extending these advances to the world's other 7,000+ languages remains a formidable challenge. Most languages lack the annotated training data required for supervised approaches, and even multilingual pre-trained models exhibit significant performance gaps on low-resource languages.

We present XSP-Transfer, a cross-lingual transfer method for semantic parsing that requires only 50 annotated examples in the target language. Our approach combines three techniques: (1) a language-agnostic meaning representation alignment objective that maps utterances from different languages into a shared semantic space, (2) a structure-aware code-switching augmentation strategy that generates synthetic training data by swapping aligned phrases between high- and low-resource languages, and (3) a confidence-based self-training loop that iteratively expands the target-language training set with high-confidence model predictions.
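As a rough illustration of the third technique, here is a minimal sketch of one confidence-based self-training round. The function names, signatures, and the 0.9 threshold are assumptions made for illustration only, not the paper's actual implementation:

```python
from typing import Callable, List, Tuple

def self_training_round(
    predict: Callable[[str], Tuple[str, float]],   # utterance -> (parse, confidence)
    retrain: Callable[[List[Tuple[str, str]]], None],
    labeled: List[Tuple[str, str]],    # e.g. the 50 annotated target-language pairs
    unlabeled: List[str],              # raw target-language utterances
    threshold: float = 0.9,            # assumed confidence cutoff
) -> List[Tuple[str, str]]:
    """One round: pseudo-label the unlabeled utterances the model is
    confident about, fold them into the training set, and retrain."""
    pseudo: List[Tuple[str, str]] = []
    for utterance in unlabeled:
        parse, confidence = predict(utterance)
        if confidence >= threshold:
            pseudo.append((utterance, parse))
    retrain(labeled + pseudo)
    return pseudo
```

Iterating this round expands the target-language training set each pass, which is what the abstract describes as the self-training loop.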

We evaluate XSP-Transfer on the Mschema2QA and MTOP benchmarks across 10 typologically diverse languages. With only 50 target-language examples, our method achieves 85% of the fully-supervised performance on average, and outperforms the previous best few-shot method by 14 points in exact-match accuracy. Ablation studies reveal that the alignment objective and code-switching augmentation contribute roughly equally to the gains, while self-training provides an additional 3–5 point improvement.
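For readers unfamiliar with the metric, exact-match accuracy is simply the fraction of predicted parses that are identical to the gold meaning representation. A minimal sketch follows; the whitespace normalization is a common convention, not necessarily the benchmarks' exact scoring script:

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predicted parses identical to the gold parse,
    after normalizing whitespace."""
    assert len(predictions) == len(references)
    norm = lambda s: " ".join(s.split())
    hits = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return hits / len(references)
```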

Citation

S. Chen, K. Okafor, and T. Müller (2025). "Cross-Lingual Semantic Parsing with Minimal Supervision." In Proceedings of EMNLP 2025.
