**Authors:**

(1) Seokil Ham, KAIST;

(2) Jungwuk Park, KAIST;

(3) Dong-Jun Han, Purdue University;

(4) Jaekyun Moon, KAIST.

## G Comparison with SKD and ARD

Existing self-distillation schemes [20, 24] for multi-exit networks improve the performance on clean samples by self-distilling the knowledge of the last exit, as the last exit has the best prediction quality. Therefore, following the original philosophy, we also used the last exit in implementing the SKD baseline. Regarding ARD [8], since it was proposed for single-exit networks, we also utilized the last exit with high performance when applying ARD to multi-exit networks. Nevertheless, we perform additional experiments to consider comprehensive baselines using various exits for distillation. Table A4 above shows the results of SKD and ARD using a specific exit or an ensemble of all exits for distillation. The results show that our scheme consistently outperforms all baselines.

## H Implementations of Stronger Attacker Algorithms

In Section 4.3 of the main manuscript, during inference, we replaced the Projected Gradient Descent (PGD) attack with other attacker algorithms (PGD-100 attack, Carlini and Wagner (CW) attack [2], and AutoAttack [5]) to generate stronger attacks for multi-exit neural networks. This section explains how these stronger attacks are adapted to multi-exit neural networks.

### H.1 Carlini and Wagner (CW) attack

The Carlini and Wagner (CW) attack generates adversarial examples by reducing the difference between the logit of the correct label and the largest logit among the incorrect labels. We adapt this strategy to multi-exit neural networks: instead of minimizing this difference for a single output, our modification minimizes the average difference across all exits. Moreover, when deciding whether a sample has been successfully converted into an adversarial example, we consider the sample adversarial only if it misleads all exits of the multi-exit neural network.
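The objective and success criterion described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, logits are plain Python lists, and the optimization loop that actually minimizes the objective is omitted.

```python
from typing import List, Sequence


def cw_margin(logits: Sequence[float], label: int) -> float:
    """Correct-class logit minus the largest incorrect-class logit."""
    best_wrong = max(z for k, z in enumerate(logits) if k != label)
    return logits[label] - best_wrong


def multi_exit_cw_objective(exit_logits: List[Sequence[float]], label: int) -> float:
    """Average CW margin over all exits; the attacker minimizes this value."""
    margins = [cw_margin(z, label) for z in exit_logits]
    return sum(margins) / len(margins)


def fools_all_exits(exit_logits: List[Sequence[float]], label: int) -> bool:
    """True only if every exit predicts a class other than the true label."""
    def predicted(z: Sequence[float]) -> int:
        return max(range(len(z)), key=lambda k: z[k])
    return all(predicted(z) != label for z in exit_logits)


# Hypothetical logits for a two-exit network with three classes, true label 0.
logits = [[2.0, 1.0, 0.5], [0.25, 1.25, 0.0]]
print(multi_exit_cw_objective(logits, 0))  # margins are 1.0 and -1.0
print(fools_all_exits(logits, 0))          # the first exit is still correct
```

Averaging the margins gives one scalar to optimize, while the all-exits check is stricter than any per-exit check: in the example, the second exit is already fooled but the sample does not yet count as adversarial.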

### H.2 AutoAttack

AutoAttack produces adversarial attacks by ensembling various attacker algorithms. In our experiments, we sequentially apply the APGD [5], APGD-T [5], FAB [4], and Square [1] attacks, as these are commonly used.

**H.2.1 APGD and APGD-T**

The APGD attack is a modified version of the PGD attack, which is limited by its fixed step size, a suboptimal choice. The APGD attack overcomes this limitation by introducing an adaptive step size and a momentum term. The APGD-T attack is a targeted variant of APGD that perturbs a sample so that it is classified as a specific target class. For both attacks, we use the average loss of all exits in the multi-exit neural network as the loss for computing the gradient used to update adversarial examples. Moreover, we define a sample as adversarial only if it misleads all exits in the multi-exit neural network.
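As a rough sketch of how the averaged loss enters an APGD-style update, the snippet below computes the mean cross-entropy over exits and applies one momentum step under an L∞ constraint. Several things here are assumptions rather than details from the paper: the function names are hypothetical, the gradient `grad` is assumed to be supplied by an automatic differentiation framework, the momentum weight `alpha = 0.75` is the value we understand to be the APGD default, and the adaptive step-size schedule is omitted.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one exit, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def multi_exit_loss(exit_logits: List[Sequence[float]], label: int) -> float:
    """Average cross-entropy over all exits; the attacker maximizes this."""
    return sum(cross_entropy(z, label) for z in exit_logits) / len(exit_logits)


def project(x_adv: Sequence[float], x_clean: Sequence[float], eps: float) -> List[float]:
    """Clip each coordinate to the L-inf ball around x_clean and to [0, 1]."""
    return [min(max(a, c - eps, 0.0), c + eps, 1.0) for a, c in zip(x_adv, x_clean)]


def apgd_step(x, x_prev, x_clean, grad, step, eps, alpha=0.75):
    """One momentum step; `grad` is the gradient of multi_exit_loss w.r.t. x."""
    sign = [(g > 0) - (g < 0) for g in grad]  # L-inf steepest ascent direction
    z = project([xi + step * s for xi, s in zip(x, sign)], x_clean, eps)
    mixed = [xi + alpha * (zi - xi) + (1 - alpha) * (xi - pi)
             for xi, zi, pi in zip(x, z, x_prev)]
    return project(mixed, x_clean, eps)
```

For the targeted variant, the same step could be reused with a loss that rewards the target class instead of penalizing the true class; that loss is not shown here.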

**H.2.2 FAB**

The FAB attack creates adversarial attacks through a process involving linear approximation of classifiers, projection onto the classifier hyperplane, convex combinations, and extrapolation. The FAB attack first defines a hyperplane classifier separating two classes, then finds a new adversarial example through a convex combination of the extrapolation-projected current adversarial example and the extrapolation-projected original sample with the minimum perturbation norm. Here, we use the average gradient of all exits in the multi-exit neural network as the gradient for updating adversarial examples. As above, we label a sample as adversarial only if it misleads all exits in the multi-exit neural network.
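The gradient averaging can be illustrated with a small sketch. It shows one possible reading of the description above and is not the authors' code: the per-exit gradients of the logit gap between a candidate class and the true class are averaged element-wise, and the averaged gradient defines the linearized decision boundary. The function names and the example numbers are hypothetical. The distance formula is the standard one for a hyperplane under the L∞ norm, whose dual norm is L1.

```python
from typing import List, Sequence


def average_gradient(exit_grads: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of per-exit gradients taken w.r.t. the same input."""
    n = len(exit_grads)
    return [sum(col) / n for col in zip(*exit_grads)]


def linf_distance_to_boundary(logit_gap: float, grad_gap: Sequence[float]) -> float:
    """L-inf distance from x to the linearized boundary f_l(x) - f_c(x) = 0.

    For a hyperplane w.x + b = 0, the L-inf distance is |w.x + b| / ||w||_1.
    """
    l1 = sum(abs(g) for g in grad_gap)
    return abs(logit_gap) / l1 if l1 > 0 else float("inf")


# Hypothetical per-exit gradients of the logit gap f_l - f_c at the current point.
grads = [[0.5, -1.0], [1.5, -1.0]]
w = average_gradient(grads)             # [1.0, -1.0]
d = linf_distance_to_boundary(-0.5, w)  # 0.5 / 2.0 = 0.25
```

In a full attack, this distance would be evaluated for every incorrect class to pick the closest linearized boundary before the projection step.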

**H.2.3 Square**

The Square attack generates adversarial attacks via random search over adversarial patches with varying perturbation degree and position. The algorithm iteratively samples the perturbation degree and the positions of patches while reducing the patch size. The sampled adversarial patches are added to a sample, and the patches that maximize the loss of the target model are selected. Here, we use the average loss of all exits in the multi-exit neural network as the loss for determining whether a sampled perturbation increases or decreases the loss of the target model. Additionally, we deem a sample adversarial only if it misleads all exits in the multi-exit neural network.
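The accept-or-reject rule based on the averaged loss can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: images are single-channel nested lists with values in [0, 1], each exit is represented by a hypothetical callable that returns its loss, and the patch side is kept fixed instead of shrinking over iterations.

```python
import random
from typing import Callable, List, Tuple

Image = List[List[float]]


def average_loss(exit_losses: List[Callable[[Image], float]], x: Image) -> float:
    """Mean of the per-exit losses evaluated on the same input."""
    return sum(f(x) for f in exit_losses) / len(exit_losses)


def square_attack(x_clean: Image, exit_losses: List[Callable[[Image], float]],
                  eps: float, n_queries: int, side: int,
                  seed: int = 0) -> Tuple[Image, float]:
    """Random search: keep a square patch only if the average loss increases."""
    rng = random.Random(seed)
    h, w = len(x_clean), len(x_clean[0])
    x_adv = [row[:] for row in x_clean]
    best = average_loss(exit_losses, x_adv)
    for _ in range(n_queries):
        top, left = rng.randint(0, h - side), rng.randint(0, w - side)
        delta = rng.choice([-eps, eps])  # patch pixels sit on the L-inf boundary
        cand = [row[:] for row in x_adv]
        for i in range(top, top + side):
            for j in range(left, left + side):
                cand[i][j] = min(max(x_clean[i][j] + delta, 0.0), 1.0)
        loss = average_loss(exit_losses, cand)
        if loss > best:  # accept only perturbations that raise the average loss
            x_adv, best = cand, loss
    return x_adv, best
```

Because every patch is built from the clean image plus or minus `eps`, the candidate never leaves the L∞ ball, so no separate projection step is needed in this sketch.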

### H.3 Experiment details

For all attacks, we use ϵ = 0.03 as the perturbation degree and generate adversarial examples over 50 steps. All attacks are based on the L∞ norm. For the APGD attack, we employ the cross-entropy loss for computing the gradient to update adversarial examples. In both the APGD-T and FAB attacks, all class pairs are considered when generating adversarial attacks. For the Square attack, the random search operation is conducted 5000 times (the number of queries). Other settings follow [14]. For the performance comparison against stronger attacker algorithms, we adopt adversarial training via average attack in both Adv. w/o Distill [12] and NEO-KD (our approach). However, since the original CW attack and AutoAttack algorithms were designed for single-exit neural networks, these adapted versions targeting multi-exit neural networks are relatively weak.

This paper is available on arxiv under CC 4.0 license.