Adversarial Example Generation: FGSM
A review of the PyTorch FGSM tutorial.
The key idea of an adversarial attack is to cause a model to malfunction while adding the smallest possible perturbation to the input data.
There are several assumptions about the attacker's goal, in particular misclassification and source/target misclassification.
- A goal of (simple) misclassification: the adversary wants images to be classified as a wrong class; which wrong class does not matter.
- A goal of source/target misclassification: the adversary wants images of a specific source class to be classified as another specific target class.
There are also several assumptions about the attacker's knowledge, in particular white box and black box.
- White box: the attacker has full knowledge of and access to the model, including its architecture, inputs, outputs, and parameters.
- Black box: the attacker only has access to the model's inputs and outputs.
Tutorial Setting
The tutorial uses the Fast Gradient Sign Attack (FGSM) in the following setting:
- data: MNIST
- model: LeNet
- purpose: simple misclassification
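As a rough sketch of this setup (the layer sizes of the LeNet-style network and the data-loading details below are assumptions for illustration, not taken verbatim from the tutorial):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# LeNet-style CNN for MNIST (layer sizes are illustrative assumptions).
class LeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))   # 28x28 -> 12x12
        x = F.relu(F.max_pool2d(self.conv2(x), 2))   # 12x12 -> 4x4, 20 channels
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)

# MNIST test set, one image per batch so each example can be attacked individually.
test_loader = DataLoader(
    datasets.MNIST("./data", train=False, download=True,
                   transform=transforms.ToTensor()),
    batch_size=1, shuffle=True)
```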
Mathematical Concept
The perturbed image is \(x' = x + \epsilon \cdot \operatorname{sign}\left(\nabla_{x}J(\theta,x,y)\right)\), where
\(\nabla_{x}J(\theta,x,y) = \begin{bmatrix} \frac{\partial J}{\partial x_{11}} & \cdots & \frac{\partial J}{\partial x_{1n}}\\ \vdots & \ddots & \vdots \\ \frac{\partial J}{\partial x_{m1}} & \cdots & \frac{\partial J}{\partial x_{mn}} \end{bmatrix}.\)
Since \(\nabla_{x}J(\theta,x,y)\) is the matrix of partial derivatives of the loss \(J\) with respect to the input \(x\), nudging each pixel \(x_{ij}\) in the direction of the sign of \(\frac{\partial J}{\partial x_{ij}}\) increases \(J\); raising the loss with only an \(\epsilon\)-sized perturbation is what pushes the model toward a wrong prediction.
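A minimal sketch of FGSM built directly from this formula; the helper names fgsm_attack and attack_one are my own, and the clamp to [0, 1] assumes unnormalized MNIST pixels:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(image, epsilon, data_grad):
    # Step in the direction of the sign of the gradient of the loss
    # w.r.t. the input: x' = x + epsilon * sign(grad_x J(theta, x, y)).
    perturbed = image + epsilon * data_grad.sign()
    # Keep pixel values in the valid [0, 1] range (assumes unnormalized MNIST).
    return torch.clamp(perturbed, 0, 1)

def attack_one(model, data, target, epsilon):
    data.requires_grad = True              # track gradients for the input itself
    output = model(data)
    loss = F.nll_loss(output, target)      # pairs with a log_softmax model output
    model.zero_grad()
    loss.backward()                        # fills data.grad with grad_x J
    perturbed = fgsm_attack(data, epsilon, data.grad.data)
    return model(perturbed).argmax(dim=1)  # prediction on the adversarial example
```

Note that the gradient is taken with respect to the input data (via data.requires_grad), not the model weights, which is what distinguishes the attack from an ordinary training step.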
References
- https://pytorch.org/tutorials/beginner/fgsm_tutorial