Dr. Sarah Chen

Associate Professor of Computer Science

Stanford University

My research focuses on natural language processing, machine learning, and the intersection of language understanding with knowledge representation.



Faithful Chain-of-Thought Reasoning via Semantic Entailment Verification

Published in Transactions of the ACL, 2026

Sarah Chen, Jiwon Park, Miguel Rodriguez, Ananya Gupta

Abstract

Chain-of-thought (CoT) prompting has emerged as a powerful technique for eliciting multi-step reasoning from large language models. However, the intermediate reasoning steps generated by these models frequently contain logical errors, unsupported leaps, and hallucinated facts that are difficult to detect from surface-level inspection alone. This disconnect between apparent fluency and actual faithfulness undermines the reliability of CoT reasoning in high-stakes applications.

We propose a verification framework that decomposes each chain-of-thought into atomic reasoning steps and validates them against a semantic entailment graph constructed from the source context. Our approach introduces three key innovations: (1) a step-level decomposition algorithm that segments free-form reasoning chains into verifiable units, (2) a lightweight entailment classifier trained on synthetic step-level supervision, and (3) a graph-based consistency checker that identifies contradictions and unsupported claims across the full reasoning chain.
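The decompose-then-verify flow described above can be illustrated with a toy sketch. This is not the paper's implementation: the sentence-based splitter, the token-overlap scorer (a crude stand-in for the trained entailment classifier), the 0.6 threshold, and every name below (`split_steps`, `overlap_score`, `verify_chain`, `StepVerdict`) are assumptions made purely for illustration, and the graph-based consistency check is reduced to letting verified steps serve as premises for later ones.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class StepVerdict:
    step: str
    score: float
    supported: bool


def split_steps(chain: str) -> list[str]:
    """Naive decomposition: one step per sentence (assumption, not the paper's algorithm)."""
    parts = re.split(r"(?<=[.!?])\s+", chain.strip())
    return [p for p in parts if p]


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(premise: str, hypothesis: str) -> float:
    """Placeholder for an entailment classifier:
    fraction of hypothesis tokens that also occur in the premise."""
    hyp = tokens(hypothesis)
    if not hyp:
        return 0.0
    return len(hyp & tokens(premise)) / len(hyp)


def verify_chain(context: str, chain: str, threshold: float = 0.6) -> list[StepVerdict]:
    """Score each step against the context plus previously supported steps."""
    verdicts = []
    premise = context
    for step in split_steps(chain):
        score = overlap_score(premise, step)
        supported = score >= threshold
        verdicts.append(StepVerdict(step, score, supported))
        if supported:
            # Supported steps become available as premises for later steps.
            premise = premise + " " + step
    return verdicts


if __name__ == "__main__":
    context = "Marie Curie was born in Warsaw. Warsaw is the capital of Poland."
    chain = (
        "Marie Curie was born in Warsaw. "
        "Warsaw is the capital of Poland. "
        "She later moved to Mars."
    )
    for v in verify_chain(context, chain):
        print(f"{v.score:.2f} supported={v.supported} :: {v.step}")
```

In a real system the overlap scorer would be replaced by a learned entailment model; the per-step verdicts are what make it possible to point at the specific step where a chain goes wrong.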

Experiments on three multi-hop question answering benchmarks (HotpotQA, MuSiQue, and 2WikiMultiHop) show that our verification framework improves faithfulness by 34% while maintaining generation fluency. When used as a reranker over multiple sampled reasoning paths, our method further improves downstream QA accuracy by 8.2% on average. We also demonstrate that our step-level entailment scores provide interpretable explanations of where and why reasoning chains fail, enabling targeted debugging of model outputs.
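The reranking use case mentioned above can be sketched as follows. The abstract does not say how step-level entailment scores are aggregated into a chain-level score, so the blend of minimum and mean used here, the `weight_min` parameter, and the function names `chain_score` and `rerank` are all hypothetical choices for illustration only.

```python
from __future__ import annotations

from statistics import mean


def chain_score(step_scores: list[float], weight_min: float = 0.5) -> float:
    """Aggregate step-level scores into one chain-level score.

    Assumption: blend the weakest step with the average step, so a single
    unsupported step pulls the whole chain down. An empty chain scores 0.0.
    """
    if not step_scores:
        return 0.0
    return weight_min * min(step_scores) + (1.0 - weight_min) * mean(step_scores)


def rerank(candidates: list[tuple[str, list[float]]]) -> list[tuple[str, list[float]]]:
    """Sort (answer, step_scores) candidates from most to least faithful."""
    return sorted(candidates, key=lambda c: chain_score(c[1]), reverse=True)


if __name__ == "__main__":
    sampled = [
        ("answer A", [0.9, 0.2, 0.95]),   # one weak step
        ("answer B", [0.8, 0.75, 0.85]),  # uniformly well supported
    ]
    for answer, scores in rerank(sampled):
        print(answer, round(chain_score(scores), 3))
```

Including the minimum in the aggregate reflects the intuition that a reasoning chain is only as reliable as its weakest step; a pure mean would let fluent but unsupported steps hide among well-supported ones.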

Citation

S. Chen, J. Park, M. Rodriguez, and A. Gupta (2026). "Faithful Chain-of-Thought Reasoning via Semantic Entailment Verification." Transactions of the ACL, 14, 312–329.
