Knowledge Distillation
A technique where a smaller model is trained to replicate the behavior of a larger, more capable model.
Definition
Knowledge distillation is a model compression method in which a compact "student" model learns to mimic the outputs and decision patterns of a larger "teacher" model. Rather than learning only from the hard labels in the raw training data, the student is trained to match the teacher's output probability distributions (often called soft targets), which capture nuanced patterns the teacher has already internalized.
This produces models that are dramatically smaller and faster while retaining a significant portion of the teacher's accuracy, making them suitable for deployment on devices with limited compute resources.
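To make the soft-target idea concrete, here is a minimal sketch of a distillation training step in PyTorch, following the common formulation that blends a temperature-softened teacher-matching loss with ordinary cross-entropy. The model objects, temperature, and mixing weight below are illustrative assumptions, not settings from any particular system.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend a soft-target loss (match the teacher's softened distribution)
    with the standard hard-label cross-entropy loss."""
    # Soften both distributions with the temperature; the KL divergence
    # measures how far the student's distribution is from the teacher's.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale so gradients stay comparable across temperatures

    # Ordinary cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1 - alpha) * hard_loss

# Hypothetical training step: `teacher` and `student` are any classification
# models that output logits; only the student's weights are updated.
def distillation_step(student, teacher, optimizer, inputs, labels):
    teacher.eval()
    with torch.no_grad():          # the teacher only supplies targets
        teacher_logits = teacher(inputs)
    student_logits = student(inputs)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice the temperature and mixing weight are tuned per task: a higher temperature exposes more of the teacher's relative preferences among wrong answers, which is much of the extra signal the student gains over training on labels alone.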
Why It Matters
Running large AI models is expensive and slow. For latency-sensitive settings such as mobile apps, voice assistants, and edge devices, smaller models are essential. Knowledge distillation bridges the gap between capability and practicality.
Marketing teams benefit when distilled models power faster content suggestions, real-time personalization engines, or on-device analytics that work without constant cloud connectivity.
Examples in Practice
A media company uses a large language model to generate article summaries internally, then distills that capability into a smaller model embedded in their mobile news app for instant headline generation.
An e-commerce platform distills a massive recommendation engine into a lightweight model that runs directly in the browser, providing product suggestions without round-trip server calls.