Dr. Sarah Chen
Associate Professor of Computer Science
Stanford University
My research focuses on natural language processing, machine learning, and the intersection of language understanding with knowledge representation.
Scaling Knowledge Graph Completion with Contrastive Pre-Training
Published in Artificial Intelligence, 2024
Liang Wei, Ryo Nakamura, Sarah Chen
Abstract
Knowledge graph completion—the task of predicting missing links in large-scale knowledge graphs—is essential for applications ranging from drug discovery to recommendation systems. Existing embedding-based approaches struggle to scale beyond medium-sized graphs due to the computational cost of negative sampling and the difficulty of learning meaningful representations for rare entities with few observed triples.
We present KG-CPT, a contrastive pre-training framework for knowledge graph completion that addresses both scalability and data efficiency challenges. Our approach pre-trains entity and relation encoders using a novel contrastive objective that leverages the graph structure itself as a source of self-supervision. By contrasting local subgraph neighborhoods against corrupted alternatives, KG-CPT learns rich structural representations without requiring any labeled completion examples during pre-training.
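The contrastive idea described above can be sketched with a minimal InfoNCE-style loss: an anchor entity embedding is pulled toward the embedding of its true subgraph neighborhood and pushed away from corrupted neighborhoods. The toy vectors, the function names, and the plain-Python encoder-free setup are illustrative assumptions for exposition, not the KG-CPT implementation.

```python
import math

def dot(u, v):
    # Inner-product similarity between two embedding vectors.
    return sum(a * b for a, b in zip(u, v))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss (a sketch, not the paper's exact
    objective): the anchor should score higher against its true
    neighborhood embedding (positive) than against corrupted
    neighborhood embeddings (negatives)."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp, then negative log-softmax of
    # the positive's logit (index 0).
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]

# Toy check: an anchor aligned with its positive yields low loss,
# while an anchor aligned with a corrupted negative yields high loss.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In practice the anchor and neighborhood vectors would come from learned entity and subgraph encoders, and the corrupted alternatives from perturbing the neighborhood (e.g., swapping entities or relations), which is what lets the graph structure itself supply the self-supervision signal.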
We evaluate KG-CPT on five standard benchmarks, including FB15k-237 and WN18RR, as well as two industry-scale proprietary graphs with over 10 million entities. KG-CPT achieves state-of-the-art results on all benchmarks while using only a quarter of the training compute required by the previous best method. On the large-scale graphs, KG-CPT is the first method to produce competitive link prediction results within a practical compute budget, opening the door to knowledge graph completion at true web scale.
Citation
L. Wei, R. Nakamura, S. Chen. (2024). "Scaling Knowledge Graph Completion with Contrastive Pre-Training." Artificial Intelligence, 335, 104018.