
Knowledge Distillation

A model compression technique in which a smaller 'student' model is trained to replicate the behavior of a larger 'teacher' model, yielding a more efficient model that retains much of the original's capability.

Knowledge distillation, introduced by Hinton et al. in 2015, enables deploying powerful AI on resource-constrained devices such as smartphones. The student model learns from the teacher's output probabilities (soft labels) rather than only the ground-truth labels, capturing nuanced relationships between classes that one-hot targets discard. Modern distillation has produced models such as DistilBERT, which is 40% smaller and 60% faster than BERT while retaining 97% of its performance, as well as compact distilled variants of many recent large language models. The technique is central to making frontier AI capabilities accessible at lower cost.
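To make the soft-label idea concrete, here is a minimal sketch of a distillation training loss in PyTorch, following the temperature-scaled formulation from Hinton et al. (2015). The function name `distillation_loss` and the values `T=2.0` and `alpha=0.5` are illustrative assumptions, not fixed conventions:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend a soft-label term (match the teacher) with a hard-label term
    (match the ground truth). T softens both distributions so the student
    sees the teacher's relative probabilities across all classes; alpha
    weights the two terms.
    """
    # Soft targets: KL divergence between the temperature-scaled
    # student and teacher distributions. The T**2 factor keeps gradient
    # magnitudes comparable across temperatures (as in Hinton et al., 2015).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    # Hard targets: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

Raising the temperature spreads probability mass onto the wrong-but-related classes, which is precisely the inter-class information the student cannot get from one-hot labels alone.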
