# Literature Survey on Attacks and Defense of Variational Auto-Encoders


As discussed in the previous meeting, we shifted our attention to generative models, especially variational autoencoders (VAEs). I therefore read the following papers thoroughly:

### Adversarial Images for Variational Autoencoders

This paper proposes an adversarial method for attacking autoencoders and also discusses the robustness of such attacks.

The attack consists of selecting an original image and a target image, then feeding the network the original image plus a small distortion, optimized so that the output is as close to the target image as possible. However, attacks that directly minimized the distance to the target in output space failed and only blurred the reconstruction. Since autoencoders reconstruct from the latent representation, the attack was mounted in latent space instead. The attack is defined as:

min_d Δ(z_a, z_t) + C·‖d‖,  subject to  L ≤ x + d ≤ U

where d is the adversarial distortion, z_a and z_t are the latent representations of the adversarial and target images, x is the original image, x + d is the adversarial image, L and U are the bounds on the input space, and C is a regularizing constant that balances reaching the target against limiting the distortion.
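To make the optimization concrete, here is a minimal sketch of the latent-space attack using projected gradient descent. A toy linear map stands in for the VAE encoder, and the distortion penalty is squared for a smooth gradient; all weights and hyperparameters here are illustrative, not from the paper.

```python
import numpy as np

# Illustrative sketch (not the paper's exact setup): a toy linear
# "encoder" z = W @ x stands in for the VAE encoder, and we minimize
#   ||z_a - z_t||^2 + C * ||d||^2   subject to   L <= x + d <= U
# by projected gradient descent on the distortion d.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 16))         # toy encoder weights (hypothetical)
x = rng.uniform(0.0, 1.0, 16)            # original image (flattened)
x_target = rng.uniform(0.0, 1.0, 16)     # target image
z_t = W @ x_target                       # target latent representation
C, lr, L_bound, U_bound = 0.1, 0.005, 0.0, 1.0

d = np.zeros_like(x)
for _ in range(400):
    z_a = W @ (x + d)                    # latent code of the adversarial image
    # gradient of ||z_a - z_t||^2 + C * ||d||^2 with respect to d
    grad = 2.0 * W.T @ (z_a - z_t) + 2.0 * C * d
    d -= lr * grad
    d = np.clip(x + d, L_bound, U_bound) - x   # keep x + d inside [L, U]

initial_gap = np.linalg.norm(W @ x - z_t)
final_gap = np.linalg.norm(W @ (x + d) - z_t)
print(initial_gap, final_gap)            # the latent code moves toward the target
```

The projection step (the `np.clip` line) is what enforces the box constraint L ≤ x + d ≤ U from the attack definition.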

The datasets used are MNIST and SVHN.

Some results:

### Adversarial Examples for Generative Models

This paper proposes three methods for generating adversarial examples against generative models such as VAEs and VAE-GANs. The three attacks are as follows:

**Classifier Attack.** This attack attaches a classifier to the pre-trained generative model, converting the problem of generating adversarial examples for generative models into the well-studied problem of generating adversarial examples for classifiers. The loss function L_classifier can be, for example, cross-entropy. This attack, however, does not result in high-quality reconstructions, possibly because the classifier adds extra noise to the process.

**L_vae Attack.** This attack generates adversarial perturbations using the VAE loss function. The attacker chooses two inputs x_s (the source) and x_t (the target), and uses one of the standard adversarial methods to perturb x_s into x*. The loss is computed between the reconstruction of the target image and the reconstruction of the adversarial image.

**Latent Attack.** This approach attacks the latent space of the generative model. A pair of source image x_s and target image x_t is used to generate an x* that induces the target network to produce activations at some hidden layer l similar to those produced by x_t, while maintaining similarity between x_s and x*. The loss function is defined as: L_latent = L(z_t, f_enc(x*))
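The three loss functions can be sketched as follows, with toy linear stand-ins for the pre-trained encoder, decoder, and attached classifier (all names, weights, and dimensions here are hypothetical, not from the paper):

```python
import numpy as np

# Toy stand-ins (hypothetical) for the pre-trained networks: a linear
# encoder, decoder, and classifier acting on flattened 16-pixel images.
rng = np.random.default_rng(1)
W_enc = rng.standard_normal((4, 16))
W_dec = rng.standard_normal((16, 4))
W_clf = rng.standard_normal((10, 4))

enc = lambda x: W_enc @ x
dec = lambda z: W_dec @ z
softmax = lambda a: np.exp(a - a.max()) / np.exp(a - a.max()).sum()

def loss_classifier(x_adv, target_class):
    """Classifier attack: cross-entropy of the attached classifier."""
    p = softmax(W_clf @ enc(x_adv))
    return -np.log(p[target_class])

def loss_vae(x_adv, x_t):
    """L_vae attack: compare the reconstructions of x_adv and x_t."""
    return np.sum((dec(enc(x_adv)) - dec(enc(x_t))) ** 2)

def loss_latent(x_adv, x_t):
    """Latent attack: L_latent = L(z_t, f_enc(x*)) with a squared error."""
    return np.sum((enc(x_adv) - enc(x_t)) ** 2)

x_t = rng.uniform(0, 1, 16)
print(loss_vae(x_t, x_t), loss_latent(x_t, x_t))  # both zero at the target itself
```

Any of these losses can then be plugged into a standard adversarial optimization over x*.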

The adversarial methods considered are the fast gradient sign method (FGSM) and L2 optimization.
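As a quick illustration of the fast gradient sign method, here is a single FGSM step on a toy linear model with a squared-error loss (the model, target, and epsilon are all illustrative):

```python
import numpy as np

# Minimal FGSM sketch: x_adv = clip(x + eps * sign(grad_x L)).
# For the squared-error loss L = ||W x - t||^2 the input gradient is
# 2 * W^T (W x - t); stepping along its sign increases the loss while
# keeping the perturbation bounded by eps in the infinity norm.
rng = np.random.default_rng(2)
W = rng.standard_normal((4, 16))
x = rng.uniform(0, 1, 16)
t = rng.standard_normal(4)               # plays the role of the correct output
eps = 0.1

grad_x = 2.0 * W.T @ (W @ x - t)
x_adv = np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)
print(np.max(np.abs(x_adv - x)))         # never exceeds eps
```

The single sign step is what makes FGSM fast; the L2 optimization attacks instead run an iterative solver over the perturbation.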

Thus, the final L2 optimization combines a penalty on the perturbation size with the chosen attack loss, roughly of the form min_{x*} ‖x* − x_s‖² + λ·L(x*), where L is one of the three losses above.

These attacks are evaluated on the MNIST, SVHN, and CelebA datasets. The results are:

### Resisting Adversarial Attacks Using Gaussian Mixture Variational Autoencoders

This paper provides a method to design a generative model that finds a latent random variable z such that the data label y and the data x become conditionally independent given z.

This paper proposes to detect adversarial examples by modifying the KL divergence term of the evidence lower bound: rather than pulling every posterior q(z|x) toward a single standard normal prior, the posterior is pulled toward a class-conditional Gaussian, giving

KL( q(z|x) ‖ N(μ_y, σ²I) )

and the final VAE lower bound

E_{q(z|x)}[ log p(x|z) ] − KL( q(z|x) ‖ N(μ_y, σ²I) )

This modification makes the data label y and the input data x conditionally independent given z. Appropriate thresholds on the encoder and decoder outputs are then applied to detect and reject adversarial examples.
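The latent-space thresholding idea can be sketched as follows, assuming each class y has a learned latent mean μ_y; the means, threshold, and dimensions here are illustrative, not from the paper:

```python
import numpy as np

# Sketch of latent-space thresholding: each class y has a latent mean
# mu_y, and an encoded input is rejected as adversarial when it is too
# far from every class mean (i.e. it falls outside the densely sampled
# region the decoder was trained on).
rng = np.random.default_rng(3)
num_classes, latent_dim = 3, 4
mu = 5.0 * rng.standard_normal((num_classes, latent_dim))  # class means

def detect(z, threshold=2.0):
    """Return (accepted, predicted_class) for an encoded latent z."""
    dists = np.linalg.norm(mu - z, axis=1)
    y_hat = int(np.argmin(dists))
    return bool(dists[y_hat] <= threshold), y_hat

z_clean = mu[1] + 0.1 * rng.standard_normal(latent_dim)   # near a class mean
z_adv = mu.mean(axis=0) + 10.0                            # far from all means
print(detect(z_clean), detect(z_adv))
```

A clean input encodes close to its class mean and is accepted; an adversarial code that lands far from every mean is rejected before the decoder ever sees it.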

The experiments are conducted on the MNIST, SVHN, and COIL-100 datasets, using FGSM as the adversarial attack technique. The encoder part of the model is very similar to the usual CNN-based classifiers, so existing adversarial/fooling attacks can trick the encoder just as they do CNNs. The decoder network, however, is where the robustness of the model mainly comes from. Since thresholding is performed in the latent space, the decoder accepts inputs only from a very restricted part of that space. This region of the decoder's input space is densely sampled during training, i.e., given an input from this region, the decoder can be expected to generate only valid output images. This is the stronghold of the model's adversarial robustness.

Results:

Reference: https://medium.com/@arpanlosalka/resisting-adversarial-attacks-using-gaussian-mixture-variational-autoencoders-be98e69b5070

### Physical Adversarial Attacks Against End-to-End Autoencoder Communication Systems

This paper provides a new algorithm for crafting effective physical black-box adversarial attacks. The attack is defined as:

The attacks are crafted by the following algorithms for both the white-box and black-box settings:

Although this paper is mostly concerned with communication signals, the ideas from it can be applied to the problem at hand.

### DEFENSE-GAN : Protecting Classifiers Against Adversarial Attacks Using Generative Models

This paper proposes the Defense-GAN architecture to model the distribution of unperturbed images. The architecture is shown as follows:

The attack is then detected as follows:

This method proves very effective at detecting adversarial examples and does not assume any particular attack model.
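The core Defense-GAN step, projecting an input onto the range of the generator by gradient descent on z and flagging inputs that leave a large residual, can be sketched with a toy linear generator (all weights and hyperparameters here are illustrative):

```python
import numpy as np

# Sketch of the Defense-GAN projection: find z* minimizing ||G(z) - x||^2
# by gradient descent, then treat a large remaining residual as evidence
# that x does not lie on the learned image manifold.
rng = np.random.default_rng(4)
G = rng.standard_normal((16, 4))          # toy generator: image ~ G @ z

def project(x, steps=500, lr=0.01):
    z = rng.standard_normal(4)            # random restart
    for _ in range(steps):
        z -= lr * 2.0 * G.T @ (G @ z - x) # gradient of ||G z - x||^2
    return z

x_in_range = G @ np.array([1.0, -0.5, 0.3, 2.0])   # an image G can produce
x_off = x_in_range + 3.0 * rng.standard_normal(16)  # heavily perturbed input

err_clean = np.linalg.norm(G @ project(x_in_range) - x_in_range)
err_adv = np.linalg.norm(G @ project(x_off) - x_off)
print(err_clean, err_adv)                 # small residual vs. large residual
```

In the full method the projected image G(z*) (rather than the raw input) is passed to the downstream classifier, which is what removes the adversarial perturbation.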

### Auto-Encoding Variational Bayes

The paper introduces a novel estimator of the variational lower bound, Stochastic Gradient Variational Bayes (SGVB), for efficient approximate inference with continuous latent variables. The estimator can be differentiated and optimized with standard gradient methods. The paper also proposes the Auto-Encoding VB (AEVB) algorithm, which uses this estimator.
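At the heart of SGVB is the reparameterization trick, which rewrites a sample from N(μ, σ²) as a deterministic, differentiable function of μ, σ, and auxiliary noise; a minimal numerical sketch:

```python
import numpy as np

# Reparameterization trick: instead of sampling z ~ N(mu, sigma^2)
# directly, sample eps ~ N(0, 1) and set z = mu + sigma * eps. The
# sample is then a differentiable function of mu and sigma, so the
# lower-bound estimator can be optimized with ordinary gradients.
rng = np.random.default_rng(5)
mu, sigma = 1.5, 0.5

eps = rng.standard_normal(10000)
z = mu + sigma * eps                      # reparameterized samples
print(z.mean(), z.std())                  # approximately mu and sigma
```

This is exactly why the estimator "can be differentiated and optimized using gradient methods": the randomness is moved into eps, which does not depend on the parameters.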

I read this paper to gain more insight into VAEs and to understand the architecture more intricately.

### In addition, I have also read:

- MagNet: a Two-Pronged Defense against Adversarial Examples
- SafetyNet: Detecting and Rejecting Adversarial Examples Robustly
- Leveraging GANs to combat adversarial examples
- Adversarial Machine Learning