Adversarial attacks provide a simple and effective way to fool neural networks by applying subtle perturbations to the network’s input. However, to ensure a misclassification by an image classifier, the attacker must often apply a significant amount of perturbation to the input image, resulting in the characteristic noisy appearance of adversarially perturbed images. This effectively reveals the attack to the human visual system and limits the use of adversarial attacks to applications without human supervision. To address this issue, we present a novel approach that disguises adversarial attacks on images through high-pass filtering, motivated by assumptions underlying JPEG compression. Unlike other variation-based smoothing approaches, our method not only allows the amount of distortion to be adjusted locally, but also incorporates information about salient regions to preserve the attack signal in critical parts of the input. Our frequency-aware method yields a more flexible attack and higher imperceptibility than its vanilla counterparts while preserving most of the attack performance, occasionally even outperforming the standard attack. Finally, our model retains attack performance better than related attack-smoothing approaches, owing to the inclusion of the surrogate model’s salient regions, while achieving smoothing results comparable to the state of the art. The code to reproduce the experiments can be found here: https://github.com/amonsoes/salient-hpf.
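To make the idea concrete, the following is a minimal, hypothetical sketch of the mechanism described above, not the implementation from the linked repository: the perturbation is weighted by a JPEG-inspired per-pixel high-frequency energy map (distortions in high-frequency regions are assumed to be less visible) and partially restored in salient regions so the attack signal survives where it matters most. The names `highpass_energy_map`, `apply_salient_hpf_mask`, and the blending weight `lam` are illustrative assumptions.

```python
# Hypothetical sketch: JPEG-style high-pass weighting of an adversarial
# perturbation, blended with a saliency map from the surrogate model.
import numpy as np
from scipy.fft import dctn


def highpass_energy_map(gray_img, block=8):
    """Per-pixel high-frequency energy from JPEG-style 8x8 block DCTs."""
    h, w = gray_img.shape
    energy = np.zeros_like(gray_img, dtype=np.float64)
    for y in range(0, h - h % block, block):
        for x in range(0, w - w % block, block):
            coeffs = dctn(gray_img[y:y + block, x:x + block], norm="ortho")
            coeffs[0, 0] = 0.0  # drop the DC term, keep AC (high-frequency) energy
            energy[y:y + block, x:x + block] = np.abs(coeffs).sum()
    return energy / (energy.max() + 1e-12)


def apply_salient_hpf_mask(perturbation, gray_img, saliency, lam=0.5):
    """Scale the perturbation: strong in high-frequency regions,
    partially restored in salient regions to preserve the attack signal."""
    hpf = highpass_energy_map(gray_img)
    mask = np.clip(hpf + lam * saliency, 0.0, 1.0)
    return perturbation * mask[..., None]  # broadcast over color channels
```

In this sketch, `saliency` could be, for example, the normalized gradient magnitude of the surrogate model’s loss with respect to the input, so that regions driving the classification keep more of the perturbation than smooth, non-salient background regions.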