QKD: Quantization-aware Knowledge Distillation

Jangho Kim, Yash Bhalgat, Jinwon Lee, Chirag Patel, Nojun Kwak

Main idea

Low-bit quantization and knowledge distillation (KD) often don’t go well together, but both are important approaches for reducing a model’s memory footprint. We propose an effective way to combine the two and show results that outperform all existing quantization/KD approaches.


Quantization and knowledge distillation (KD) methods are widely used to reduce the memory and power consumption of deep neural networks (DNNs), especially on resource-constrained edge devices. Although combining them is a promising way to meet these requirements, the combination may not work as desired. This is mainly because the regularization effect of KD further diminishes the already reduced representation power of a quantized model. To address this shortcoming, we propose Quantization-aware Knowledge Distillation (QKD), wherein quantization and KD are carefully coordinated in three phases.
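For readers unfamiliar with KD, the sketch below shows the generic distillation objective that a quantized student would typically be trained with: a weighted sum of the usual cross-entropy on the label and a temperature-softened KL term that pulls the student toward the teacher. It is a minimal illustration only and does not reproduce the three-phase QKD schedule; the function names, `alpha`, and `temperature` are assumptions chosen for this example.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over temperature-scaled logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Negative log-likelihood of the ground-truth class.
    return -math.log(probs[label])

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions over the same classes.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kd_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    # Hard-label term: the student's own prediction against the true label.
    hard = cross_entropy(softmax(student_logits), label)
    # Soft-label term: the student mimics the softened teacher distribution.
    soft = kl_divergence(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    # The T^2 factor keeps the soft term's gradient scale comparable
    # across temperatures (the usual convention for this loss).
    return (1 - alpha) * hard + alpha * temperature ** 2 * soft
```

The soft term acts as a regularizer on the student. A low-bit student has limited capacity to match the teacher, which is the tension the abstract describes.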


We extensively evaluate our method on the ImageNet and CIFAR-10/100 datasets and present an ablation study on networks with both standard and depthwise-separable convolutions. The proposed QKD outperformed existing state-of-the-art methods (e.g., a 1.3% improvement on ResNet-18 with W4A4 and 2.6% on MobileNetV2 with W4A4). Additionally, QKD could recover the full-precision accuracy at as low as W3A3 quantization on ResNet and W6A6 quantization on MobileNetV2.
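The WxAy notation above denotes x-bit weights and y-bit activations, so W4A4 leaves each weight and each activation with at most 2^4 = 16 representable values. The sketch below makes this concrete with a plain fixed-range uniform quantizer. It is an assumption for illustration and not necessarily the quantizer used in the paper; `quantize_uniform` and the clipping range `[lo, hi]` are hypothetical names.

```python
def quantize_uniform(values, bits, lo, hi):
    # Map each value to the nearest of 2**bits evenly spaced levels in [lo, hi].
    levels = 2 ** bits - 1          # number of steps between lo and hi
    step = (hi - lo) / levels
    out = []
    for v in values:
        clipped = min(max(v, lo), hi)           # saturate out-of-range values
        index = round((clipped - lo) / step)    # integer code in [0, levels]
        out.append(lo + index * step)           # de-quantized ("fake-quantized") value
    return out

# 1001 evenly spaced inputs in [-1, 1], quantized as 4-bit weights.
weights = [-1.0 + 2.0 * i / 1000 for i in range(1001)]
w4 = quantize_uniform(weights, bits=4, lo=-1.0, hi=1.0)
print(len(set(w4)))  # number of distinct values after quantization
```

Dropping from 4 bits to 3 halves the number of levels, which is why recovering full-precision accuracy at W3A3 is a demanding target.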

tags: Model Quantization  Knowledge Distillation  Efficient Deep Learning