Authors:
(1) Seokil Ham, KAIST;
(2) Jungwuk Park, KAIST;
(3) Dong-Jun Han, Purdue University;
(4) Jaekyun Moon, KAIST.
Table of Links
3. Proposed NEO-KD Algorithm and 3.1 Problem Setup: Adversarial Training in Multi-Exit Networks
4. Experiments and 4.1 Experimental Setup
4.2. Main Experimental Results
4.3. Ablation Studies and Discussions
5. Conclusion, Acknowledgement and References
B. Clean Test Accuracy and C. Adversarial Training via Average Attack
E. Discussions on Performance Degradation at Later Exits
F. Comparison with Recent Defense Methods for Single-Exit Networks
G. Comparison with SKD and ARD and H. Implementations of Stronger Attacker Algorithms
2 Related Works
Knowledge distillation for multi-exit networks. Multi-exit neural networks [9, 13, 26, 27, 28, 32] aim to make inference efficient via early exits in resource-constrained applications. In the multi-exit network literature, it is well known that distilling the knowledge of the last exit to the other exits significantly improves overall performance on clean data without an external teacher network, i.e., via self-distillation [15, 20, 24, 27]. However, it is currently unclear how adversarial training can benefit from self-distillation in multi-exit networks. One challenge is that naively applying existing self-distillation techniques increases adversarial transferability across different submodels: the same knowledge from the last exit is distilled to all other exits, which increases the dependency among the submodels in the network. Compared to these existing ideas, our contribution is a self-distillation strategy that avoids this increase in submodel dependency, which in turn reduces the adversarial transferability of the multi-exit network and improves robustness.
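To make the baseline concrete, the following is a minimal sketch of the last-exit self-distillation described above, written for a hypothetical model that returns a list of per-exit logits; the mixing weight `alpha` and temperature `T` are illustrative assumptions rather than settings taken from any cited work.

```python
# Minimal sketch of last-exit self-distillation in a multi-exit network.
# Assumption: the model's forward pass returns a list of per-exit logits,
# with the last entry corresponding to the final exit.
import torch
import torch.nn.functional as F

def self_distillation_loss(exit_logits, targets, alpha=0.5, T=3.0):
    """exit_logits: list of [batch, num_classes] tensors, one per exit."""
    teacher = exit_logits[-1].detach()                   # last exit acts as the teacher
    soft_teacher = F.softmax(teacher / T, dim=1)
    loss = F.cross_entropy(exit_logits[-1], targets)     # supervise the last exit as usual
    for logits in exit_logits[:-1]:                      # earlier exits: CE + distillation
        ce = F.cross_entropy(logits, targets)
        kd = F.kl_div(F.log_softmax(logits / T, dim=1), soft_teacher,
                      reduction="batchmean") * (T * T)
        loss = loss + (1 - alpha) * ce + alpha * kd
    return loss
```

Because every earlier exit is pulled toward the same teacher signal, the submodels become more correlated, which is exactly the transferability concern raised above.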
Improving adversarial robustness. Most existing defense methods [6, 7, 31] have mainly focused on designing new adversarial training losses tailored to single-exit networks. Several other works have utilized knowledge distillation [8, 23, 33, 34], showing that distilling the knowledge of a teacher network can improve the robustness of the student network. In particular, [8] shows that, given a teacher network, its robustness can be distilled to the student network during adversarial training. Compared to these works, our approach can be viewed as a new self-distillation strategy for multi-exit networks in which the teacher and student models are trained together. More importantly, adversarial transferability across different submodels has not been an issue in previous works, since their focus has been on single-exit networks. In contrast, in our multi-exit setup, all submodels share some model parameters and therefore require extra robustness against adversarial attacks; this motivates us to propose exit-wise orthogonal knowledge distillation, which reduces adversarial transferability among different submodels.
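For reference, below is a hedged sketch of teacher-to-student robust distillation in the spirit of [8]; the attack routine, temperature, and mixing weight are assumptions made only for illustration, not the exact formulation of that work.

```python
# Sketch of distilling robustness from a fixed teacher during adversarial training.
# Assumptions: `attack` is any routine that crafts adversarial examples against the
# student (e.g., a PGD variant); alpha and T are illustrative hyperparameters.
import torch
import torch.nn.functional as F

def robust_distillation_loss(student, teacher, x, y, attack, alpha=0.9, T=2.0):
    x_adv = attack(student, x, y)                         # adversarial example against the student
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x) / T, dim=1)  # teacher prediction on the clean input
    kd = F.kl_div(F.log_softmax(student(x_adv) / T, dim=1), teacher_probs,
                  reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student(x), y)                   # standard clean cross-entropy term
    return alpha * kd + (1 - alpha) * ce
```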
Some prior works [19, 22, 29, 30] aim to improve the adversarial robustness of ensemble models by reducing adversarial transferability across the individual models. Specifically, the adaptive diversity-promoting regularizer proposed in [22] regularizes the non-maximal predictions of individual models to be mutually orthogonal, while the maximal term is used to compute the loss as usual. Whereas these works focus on reducing transferability among different models with independent parameters, the problem becomes more challenging in a multi-exit network, where all submodels share some parameters and are therefore highly correlated. To handle this issue, we take advantage of knowledge distillation in an exit-wise manner, which further reduces the dependency among different submodels in the multi-exit network.
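The following sketch illustrates the diversity idea in a simplified form: it penalizes overlap between the non-maximal predictions of ensemble members. Note that [22] actually uses a log-determinant ensemble-diversity term; this pairwise cosine-similarity penalty is only an illustrative approximation.

```python
# Simplified diversity-promoting penalty: encourage the non-maximal predictions
# of different ensemble members to be close to orthogonal. This is an illustrative
# approximation of the regularizer in [22], not its exact log-determinant form.
import torch
import torch.nn.functional as F

def non_maximal_diversity_penalty(member_probs, targets):
    """member_probs: list of [batch, num_classes] probability tensors, one per member."""
    num_classes = member_probs[0].size(1)
    mask = F.one_hot(targets, num_classes).bool()
    # Zero out the true-class probability and L2-normalize the remaining (non-maximal) part.
    non_max = [F.normalize(p.masked_fill(mask, 0.0), dim=1) for p in member_probs]
    penalty = 0.0
    for i in range(len(non_max)):
        for j in range(i + 1, len(non_max)):
            penalty = penalty + (non_max[i] * non_max[j]).sum(dim=1).mean()
    return penalty  # added to the usual cross-entropy loss with a small weight
```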
Adversarial training for multi-exit networks. Focusing on multi-exit networks, only a few prior works have considered adversarial attacks [3, 10, 11, 12]. The authors of [10, 11] focused on generating slowdown attacks against multi-exit networks rather than on defense strategies. In [12], the authors proposed an adversarial training strategy that generates adversarial examples targeting a specific exit (single attack) or multiple exits (average attack and max-average attack). However, (i) [12] does not take advantage of knowledge distillation during training, and (ii) [12] does not directly handle the high correlations among different submodels, which can result in high adversarial transferability. Our solution overcomes these limitations by reducing adversarial transferability while correctly guiding the predictions of adversarial examples at each exit, via self-knowledge distillation.
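As an illustration of the multi-exit attack model discussed above, here is a hedged PGD-style sketch of an average attack, in which the perturbation maximizes the cross-entropy loss averaged over all exits so that one adversarial example threatens every submodel; the step size, budget, and model interface are assumptions, not the exact settings of [12].

```python
# PGD-style "average attack" sketch for a multi-exit network.
# Assumption: model(x) returns a list of per-exit logits.
import torch
import torch.nn.functional as F

def average_attack(model, x, y, eps=8/255, step=2/255, n_steps=10):
    x = x.detach()
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(n_steps):
        x_adv.requires_grad_(True)
        exit_logits = model(x_adv)
        # Maximize the cross-entropy loss averaged over all exits.
        loss = sum(F.cross_entropy(z, y) for z in exit_logits) / len(exit_logits)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step * grad.sign()
        # Project back onto the L-infinity ball of radius eps around x.
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv
```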
This paper is available on arXiv under CC 4.0 license.