Smaller Models Are Obtained Using Distillation. Faster Training for AlexNet on JFT Dataset.
In this story, Distilling the Knowledge in a Neural Network, by Google Inc., is briefly reviewed. The paper is co-authored by Prof. Geoffrey Hinton.
Model ensembling is a simple way to improve model performance. Yet, it can be computationally expensive, especially if the individual models are large neural nets.
- In this paper, the knowledge in an ensemble of models is distilled into a single model, as sketched in the example below.
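To make the idea concrete, here is a minimal PyTorch-style sketch of a distillation objective in the spirit of the paper: the student is trained to match the teacher's temperature-softened outputs while also fitting the true labels. The function name `distillation_loss`, the temperature `T`, and the weighting `alpha` are illustrative choices for this sketch, not values taken from the review.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: the teacher's softmax output softened by temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    soft_preds = F.log_softmax(student_logits / T, dim=1)
    # KL divergence between the softened distributions, scaled by T^2
    # so gradient magnitudes stay comparable across temperatures.
    soft_loss = F.kl_div(soft_preds, soft_targets, reduction="batchmean") * (T * T)
    # Standard cross-entropy with the true (hard) labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

In an ensemble setting, `teacher_logits` could, for example, be the averaged outputs of the ensemble members, so the single student model learns to approximate the whole ensemble.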
This is a paper presented at the NIPS 2014 Deep Learning Workshop, with over 5000 citations. (Sik-Ho Tsang @ Medium)