Smaller Models Are Obtained Using Distillation. Faster Training for AlexNet on JFT Dataset.
In this story, Distilling the Knowledge in a Neural Network, by Google Inc., is briefly reviewed. The paper is co-authored by Prof. Geoffrey Hinton.
Model ensembling is a simple way to improve model performance. Yet, it can be computationally expensive, especially if the individual models are large neural nets.
- In this paper, the knowledge in an ensemble of models is distilled into a single model, as sketched in the example below.
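To make the idea concrete, here is a minimal PyTorch-style sketch of a distillation objective in the spirit of the paper: the student is trained to match the teacher's temperature-softened outputs while also fitting the true labels. The function name `distillation_loss`, the temperature `T`, and the weighting `alpha` are illustrative choices for this sketch, not values taken from the review.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: the teacher's softmax output softened by temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    soft_preds = F.log_softmax(student_logits / T, dim=1)
    # KL divergence between the softened distributions, scaled by T^2
    # so gradient magnitudes stay comparable across temperatures.
    soft_loss = F.kl_div(soft_preds, soft_targets, reduction="batchmean") * (T * T)
    # Standard cross-entropy with the true (hard) labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

In an ensemble setting, `teacher_logits` could, for example, be the averaged outputs of the ensemble members, so the single student model learns to approximate the whole ensemble.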
This is a paper presented at the NIPS 2014 Deep Learning Workshop, with over 5000 citations. (Sik-Ho Tsang @ Medium)