# distillation **Repository Path**: sevenysw/distillation ## Basic Information - **Project Name**: distillation - **Description**: No description available - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2020-06-13 - **Last Updated**: 2020-12-19 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README ## Knowledge distillation experiments ### How to run the code Dependencies: Keras, Tensorflow, Numpy * Train teacher model: ```python train.py --file data/matlab/emnist-letters.mat --model cnn``` * Train perceptron normally ```python train.py --file data/matlab/emnist-letters.mat --model mlp``` * Train student network with knowledge distillation: ```python train.py --file data/matlab/emnist-letters.mat --model student --teacher bin/cnn_64_128_1024_30model.h5``` ### Results [EMNIST-letters](https://www.nist.gov/itl/iad/image-group/emnist-dataset) dataset was used for experiments (26 classes of hand-written letters of english alphabet) As a teacher network a simple cnn with `3378970` parameters (2 conv layers with 64 and 128 filters each, 1024 neurons on fully-connected layer) was trained for 26 epochs and was early stopped on plateau. Its validation accuracy was _94.4%_ As a student network a 1-layer perceptron with 512 hidden units and `415258` total parameters was used (8 times smaller than teacher network). First it was trained alone for 50 epochs, val acc was _91.6%_. Knowledge distillation approach was used with different combinations of `temperature` and `lambda` parameters. Best performance was achieved with `temp=10, lambda=0.5`. Student network trained that way for 50 epochs got val acc of _92.2%_. So, the accuracy increase is less than 1% comparing to classicaly trained perceptron. But still we got some improvement. Actually all reports that people did, show similar results on different tasks: 1-2% quality increase. So we may say that reported results were reproduced on emnist-letters dataset. [Knowledge distillation](https://arxiv.org/abs/1503.02531) parameters (temperature and lambda) must be tuned for each specific task. To get better accuracy gain additional similar techniques may be tested, e.g. [deep mutual leraning](https://arxiv.org/abs/1706.00384) or [fitnets](https://arxiv.org/abs/1412.6550).