Dr. Sarah Chen

Associate Professor of Computer Science

Stanford University

My research focuses on natural language processing, machine learning, and the intersection of language understanding with knowledge representation.

📍 Stanford, CA

CS 329T: Trustworthy Machine Learning

Spring 2025, Spring 2026 — Stanford University

CS 329T is a graduate seminar on one of the most pressing challenges in modern AI: how do we build machine learning systems that are not only accurate but also robust, fair, interpretable, and safe? As ML models are deployed in high-stakes domains such as healthcare, criminal justice, autonomous vehicles, and financial lending, the consequences of model failures extend far beyond what accuracy metrics capture. This course provides a rigorous framework for reasoning about the many dimensions of trustworthiness in ML.

The course is structured around weekly paper discussions, hands-on lab sessions, and a term-long research project. Students read and critically analyze 2–3 papers per week drawn from recent research in adversarial robustness, distribution shift, algorithmic fairness, interpretability, and AI alignment. Lab sessions provide practical experience with tools for auditing model behavior, including fairness toolkits, attribution methods, and adversarial attack libraries.

A distinguishing feature of CS 329T is its emphasis on real-world case studies. Each module begins with a documented ML failure (for example, a biased hiring algorithm, a fragile medical imaging classifier, or a manipulated content recommendation system) and uses it to motivate the technical material that follows. Students complete the course equipped not only with technical knowledge but also with a practical methodology for evaluating and improving the trustworthiness of any ML system they encounter.

Topics Covered

Adversarial Robustness: Attacks, Defenses, and Certified Guarantees
Distribution Shift and Domain Adaptation
Algorithmic Fairness: Definitions, Metrics, and Trade-offs
Interpretability and Explainability Methods
Calibration and Uncertainty Quantification
Privacy-Preserving Machine Learning
AI Alignment and RLHF
Red-Teaming and Safety Evaluation
Regulation, Governance, and AI Policy
Trustworthiness Auditing in Practice

Course Website