Dr. Sarah Chen
Associate Professor of Computer Science
Stanford University
My research focuses on natural language processing, machine learning, and the intersection of language understanding with knowledge representation.
Scaling Knowledge Graph Completion with Contrastive Pre-Training
Published in Artificial Intelligence, 2024
Liang Wei, Ryo Nakamura, Sarah Chen
Abstract
Knowledge graph completion—the task of predicting missing links in large-scale knowledge graphs—is essential for applications ranging from drug discovery to recommendation systems. Existing embedding-based approaches struggle to scale beyond medium-sized graphs due to the computational cost of negative sampling and the difficulty of learning meaningful representations for rare entities with few observed triples.
We present KG-CPT, a contrastive pre-training framework for knowledge graph completion that addresses both scalability and data efficiency challenges. Our approach pre-trains entity and relation encoders using a novel contrastive objective that leverages the graph structure itself as a source of self-supervision. By contrasting local subgraph neighborhoods against corrupted alternatives, KG-CPT learns rich structural representations without requiring any labeled completion examples during pre-training.
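The contrastive idea described above can be sketched with a minimal InfoNCE-style loss: an anchor entity embedding is pulled toward the embedding of its true subgraph neighborhood and pushed away from corrupted neighborhoods. The toy vectors, the function names, and the plain-Python encoder-free setup are illustrative assumptions for exposition, not the KG-CPT implementation.

```python
import math

def dot(u, v):
    # Inner-product similarity between two embedding vectors.
    return sum(a * b for a, b in zip(u, v))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss (a sketch, not the paper's exact
    objective): the anchor should score higher against its true
    neighborhood embedding (positive) than against corrupted
    neighborhood embeddings (negatives)."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp, then negative log-softmax of
    # the positive's logit (index 0).
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]

# Toy check: an anchor aligned with its positive yields low loss,
# while an anchor aligned with a corrupted negative yields high loss.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In practice the anchor and neighborhood vectors would come from learned entity and subgraph encoders, and the corrupted alternatives from perturbing the neighborhood (e.g., swapping entities or relations), which is what lets the graph structure itself supply the self-supervision signal.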
We evaluate KG-CPT on five standard benchmarks, including FB15k-237 and WN18RR, as well as two industry-scale proprietary graphs with over 10 million entities. KG-CPT achieves state-of-the-art results on all benchmarks while using only a quarter of the training compute required by the previous best method. On the large-scale graphs, KG-CPT is the first method to produce competitive link prediction results within a practical compute budget, opening the door to knowledge graph completion at true web scale.
Citation
L. Wei, R. Nakamura, S. Chen. (2024). "Scaling Knowledge Graph Completion with Contrastive Pre-Training." Artificial Intelligence, 335, 104018.