Abstract

In this paper, we propose memory-efficient Generative Adversarial Nets (GANs) based on knowledge distillation. Most existing GANs suffer from a large number of model parameters and low processing speed. To tackle this problem, we propose Adversarial Knowledge Distillation for Generative models (AKDG) for highly efficient GANs in unconditional generation. With AKDG, model size is substantially reduced and processing speed is substantially improved. Through adversarial training with a distillation discriminator, a student generator successfully mimics a teacher generator with fewer layers, fewer parameters, and a higher processing speed. Moreover, AKDG is agnostic to the network architecture. Comparisons of AKDG-applied models with vanilla models show that AKDG achieves scores closer to the teacher generator and is more efficient than a baseline method with respect to Inception Score (IS) and Frechet Inception Distance (FID). In CIFAR-10 experiments, AKDG improves IS/FID by 1.17pt/55.19pt, and in LSUN bedroom experiments, it improves FID by 71.1pt compared with the conventional distillation method for GANs.


Paper


Overview

We transfer the knowledge of a teacher generator into a student generator using Adversarial Knowledge Distillation for Generative models (AKDG). In addition to the standard adversarial loss between the student generator and its discriminator, we employ an adversarial loss between the teacher and the student. We first review two conventional Knowledge Distillation (KD) methods for GANs. First, LIT reduces residual blocks in GANs to transfer the knowledge of the teacher into the student. However, it cannot be applied to GANs that do not employ residual blocks. Second, the MSE method uses only the Mean Squared Error (MSE) between images generated by the teacher and the student. However, the generated images are heavily blurred, because MSE induces blurred results in generative models, so the FID of images generated by the MSE method is high. In contrast, our AKDG can be applied to any model architecture and achieves the state-of-the-art FID score in KD for generative models. A minimal sketch of this training scheme is given below.
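
To make the idea concrete, the following PyTorch-style sketch shows one possible AKDG training step under our assumptions: a frozen teacher generator, a student generator, a standard discriminator on real data, and a distillation discriminator that treats teacher samples as "real" and student samples as "fake". The names (akdg_step, teacher_g, student_g, student_d, distill_d), the non-saturating GAN loss, and the unweighted sum of the two generator terms are illustrative assumptions, not the paper's exact implementation.

# Minimal, illustrative sketch of one AKDG-style training step (PyTorch assumed).
# All module and variable names are hypothetical; the frozen teacher generator,
# student generator, standard discriminator, and distillation discriminator are
# supplied by the user. The loss form and loss weighting are assumptions.
import torch
import torch.nn.functional as F

def akdg_step(teacher_g, student_g, student_d, distill_d,
              real_images, opt_g, opt_d, opt_kd, z_dim=128):
    batch = real_images.size(0)
    z = torch.randn(batch, z_dim, device=real_images.device)

    with torch.no_grad():
        teacher_fake = teacher_g(z)      # frozen teacher samples
    student_fake = student_g(z)          # student samples from the same latents

    # (1) Standard discriminator: real data vs. student samples.
    d_loss = (F.softplus(-student_d(real_images)).mean()
              + F.softplus(student_d(student_fake.detach())).mean())
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # (2) Distillation discriminator: teacher samples treated as "real",
    #     student samples treated as "fake".
    kd_loss = (F.softplus(-distill_d(teacher_fake)).mean()
               + F.softplus(distill_d(student_fake.detach())).mean())
    opt_kd.zero_grad(); kd_loss.backward(); opt_kd.step()

    # (3) Student generator: fool both discriminators, so it mimics the
    #     teacher while staying adversarially consistent with real data.
    student_fake = student_g(z)
    g_loss = (F.softplus(-student_d(student_fake)).mean()
              + F.softplus(-distill_d(student_fake)).mean())
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    return d_loss.item(), kd_loss.item(), g_loss.item()

In this sketch, the distillation discriminator plays the role that pixel-wise MSE plays in the MSE method, which is what avoids the blurring described above; how the two generator terms are weighted or scheduled is left open here.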





Example Results