Dr. Sarah Chen
Associate Professor of Computer Science
Stanford University
My research focuses on natural language processing, machine learning, and the intersection of language understanding with knowledge representation.
Faithful Chain-of-Thought Reasoning via Semantic Entailment Verification
Published in Transactions of the ACL, 2026
Sarah Chen, Jiwon Park, Miguel Rodriguez, Ananya Gupta
Abstract
Chain-of-thought (CoT) prompting has emerged as a powerful technique for eliciting multi-step reasoning from large language models. However, the intermediate reasoning steps these models generate frequently contain logical errors, unsupported leaps, and hallucinated facts that are difficult to detect by surface-level inspection alone. This gap between apparent fluency and actual faithfulness undermines the reliability of CoT reasoning in high-stakes applications.
We propose a verification framework that decomposes each chain-of-thought into atomic reasoning steps and validates them against a semantic entailment graph constructed from the source context. Our approach introduces three key innovations: (1) a step-level decomposition algorithm that segments free-form reasoning chains into verifiable units, (2) a lightweight entailment classifier trained on synthetic step-level supervision, and (3) a graph-based consistency checker that identifies contradictions and unsupported claims across the full reasoning chain.
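The decompose-then-verify flow described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`decompose`, `support_score`, `verify_chain`), the sentence-level splitting, the token-overlap score standing in for the trained entailment classifier, and the 0.6 threshold are all assumptions made for illustration, and the graph-based consistency checker is not modeled.

```python
import re

def decompose(chain: str) -> list:
    """Split a free-form reasoning chain into sentence-level steps.
    Hypothetical stand-in for the paper's step-level decomposition."""
    parts = re.split(r"(?<=[.!?])\s+", chain.strip())
    return [p for p in parts if p]

def tokens(text: str) -> set:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def support_score(step: str, premises: list) -> float:
    """Toy entailment proxy (assumption): the fraction of the step's
    tokens that appear anywhere in the premises. A real system would
    call a trained entailment classifier here."""
    step_toks = tokens(step)
    if not step_toks:
        return 0.0
    premise_toks = set()
    for p in premises:
        premise_toks |= tokens(p)
    return len(step_toks & premise_toks) / len(step_toks)

def verify_chain(chain: str, context: list, threshold: float = 0.6) -> list:
    """Score each step against the context plus earlier supported steps.
    Returns a list of (step, score, supported) tuples."""
    premises = list(context)
    report = []
    for step in decompose(chain):
        score = support_score(step, premises)
        supported = score >= threshold
        report.append((step, score, supported))
        if supported:
            # Supported steps become premises for later steps.
            premises.append(step)
    return report

if __name__ == "__main__":
    context = [
        "Marie Curie was born in Warsaw.",
        "Warsaw is the capital of Poland.",
    ]
    chain = (
        "Marie Curie was born in Warsaw. "
        "Warsaw is the capital of Poland. "
        "She won an Olympic medal."
    )
    for step, score, supported in verify_chain(chain, context):
        flag = "supported" if supported else "UNSUPPORTED"
        print(f"{score:.2f}  {flag}  {step}")
```

In this toy run the first two steps restate the context and are marked supported, while the final step shares no tokens with any premise and is flagged as unsupported. Token overlap is only a placeholder: it cannot detect paraphrase or contradiction, which is where a learned entailment model would be needed.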
Experiments on three multi-hop question answering benchmarks (HotpotQA, MuSiQue, and 2WikiMultiHop) show that our verification framework improves faithfulness by 34% while maintaining generation fluency. When used as a reranker over multiple sampled reasoning paths, our method further improves downstream QA accuracy by 8.2% on average. We also demonstrate that our step-level entailment scores provide interpretable explanations of where and why reasoning chains fail, enabling targeted debugging of model outputs.
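Reranking sampled reasoning paths by their step-level scores could look roughly like the sketch below. The `Candidate` class, the weakest-link (minimum) aggregation, and the example scores are assumptions for illustration; the abstract does not specify how step scores are combined into a chain-level score.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    """One sampled reasoning path (hypothetical container)."""
    answer: str
    step_scores: list = field(default_factory=list)  # per-step entailment scores in [0, 1]

def chain_score(c: Candidate) -> float:
    """Weakest-link aggregation (assumption): a chain is scored by its
    least supported step. Empty chains score 0.0."""
    return min(c.step_scores) if c.step_scores else 0.0

def rerank(candidates: list) -> list:
    """Return candidates sorted from most to least faithful."""
    return sorted(candidates, key=chain_score, reverse=True)

def weakest_step(c: Candidate):
    """Index of the lowest-scoring step, or None for an empty chain.
    This is the step to inspect first when debugging a failed chain."""
    if not c.step_scores:
        return None
    return min(range(len(c.step_scores)), key=lambda i: c.step_scores[i])

if __name__ == "__main__":
    samples = [
        Candidate("Paris", [0.90, 0.20, 0.95]),   # one poorly supported step
        Candidate("Warsaw", [0.80, 0.70, 0.85]),  # uniformly well supported
    ]
    best = rerank(samples)[0]
    print("selected answer:", best.answer)
    print("weakest step of first sample:", weakest_step(samples[0]))
```

With these made-up scores the second path is selected, because its lowest step score (0.70) exceeds the first path's (0.20). Averaging the step scores instead of taking the minimum would be a reasonable alternative and could change the ranking.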
Citation
S. Chen, J. Park, M. Rodriguez, and A. Gupta (2026). "Faithful Chain-of-Thought Reasoning via Semantic Entailment Verification." Transactions of the ACL, 14, 312–329.