A Concise Guide to Adversarial Attacks

2018 Feb 11

Fast and simple. How Adversarial Attacks Work

What is an Adversarial Attack? Machine learning algorithms accept inputs as numeric vectors. Designing an input in a specific way to get the wrong result from the model is called an adversarial attack.

How is this possible? No machine learning algorithm is perfect, and every model makes mistakes, even if only rarely on typical inputs. More importantly, a machine learning model is a series of specific transformations, and most of these transformations turn out to be very sensitive to slight changes in the input. Exploiting this sensitivity to change a model’s behavior is an important problem in AI security.
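To make the sensitivity argument concrete, here is a minimal sketch using a plain linear score, the simplest transformation a model can apply. Everything in it is an assumption made for illustration: the dimension, the random weights, the input, and the perturbation budget `epsilon` are made up and do not come from any real model. It shows that nudging every input coordinate by a tiny amount in a carefully chosen direction shifts the score by `epsilon` times the sum of the absolute weights, which grows with the number of input dimensions.

```python
import random

def linear_score(weights, x):
    """Score of a linear model: the dot product of weights and input."""
    return sum(w * xi for w, xi in zip(weights, x))

def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

random.seed(0)
dim = 1000       # hypothetical input dimension
epsilon = 0.01   # hypothetical maximum change per input coordinate

# Made-up weights and input, for illustration only
weights = [random.uniform(-1.0, 1.0) for _ in range(dim)]
x = [random.uniform(0.0, 1.0) for _ in range(dim)]

# Move each coordinate by epsilon in the direction that lowers the score
x_adv = [xi - epsilon * sign(w) for w, xi in zip(weights, x)]

clean_score = linear_score(weights, x)
adv_score = linear_score(weights, x_adv)
shift = clean_score - adv_score

# For a linear model the shift equals epsilon * sum(|w_i|)
expected_shift = epsilon * sum(abs(w) for w in weights)

print("largest change to any coordinate:", epsilon)
print("shift in the score:", shift)
```

No single coordinate moves by more than `epsilon`, yet the total shift scales with the input dimension. This is one common intuition for why small, hard-to-notice changes can flip a model's decision.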

In this article we will show practical examples of the main types of attacks, explain why it is so easy to perform them, and discuss the security implications that stem from this technology.

Types of Adversarial Attacks

Here are the main types of hacks we will focus on:

Non-targeted adversarial attack: the most general type of attack, in which all you want is to make the classifier give an incorrect result, no matter which one.
Targeted adversarial attack: a slightly more difficult attack, which aims to make the classifier assign a specific class of your choosing to your input.
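
The difference between the two attack types can be sketched on a toy classifier. The example below is an illustration under stated assumptions, not a reference implementation: the 3-class linear model, its weights `W`, the input `x`, and the helper names (`non_targeted_attack`, `targeted_attack`, `loss_gradient`) are all hypothetical. Each attack takes a single signed-gradient step, in the style of the fast gradient sign method, which is one of many possible techniques. The non-targeted attack increases the loss for the true class, while the targeted attack decreases the loss for the class the attacker wants.

```python
import math

# Made-up weights of a toy 3-class linear classifier: logits = W x
W = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

def logits(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x):
    z = logits(x)
    return z.index(max(z))

def loss_gradient(x, label):
    """Gradient of the cross-entropy loss for `label` with respect to x."""
    p = softmax(logits(x))
    err = [p[k] - (1.0 if k == label else 0.0) for k in range(len(p))]
    # For logits = W x, dLoss/dx = W^T (p - onehot(label))
    return [sum(W[k][j] * err[k] for k in range(len(W))) for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def non_targeted_attack(x, true_label, epsilon):
    """One signed-gradient step that increases the loss for the true label."""
    g = loss_gradient(x, true_label)
    return [xi + epsilon * sign(gi) for xi, gi in zip(x, g)]

def targeted_attack(x, target_label, epsilon):
    """One signed-gradient step that decreases the loss for the target label."""
    g = loss_gradient(x, target_label)
    return [xi - epsilon * sign(gi) for xi, gi in zip(x, g)]

x = [0.6, 0.5, 0.3]       # made-up input, classified as class 0
true_label = predict(x)
target_label = 2          # the class the attacker wants to obtain

x_any = non_targeted_attack(x, true_label, epsilon=0.1)
x_weak = targeted_attack(x, target_label, epsilon=0.1)
x_strong = targeted_attack(x, target_label, epsilon=0.2)

print("original prediction:", true_label)
print("non-targeted, epsilon=0.1:", predict(x_any))
print("targeted, epsilon=0.1:", predict(x_weak))
print("targeted, epsilon=0.2:", predict(x_strong))
```

In this toy setup a step of size 0.1 is enough for the non-targeted attack to change the prediction, but the targeted attack only reaches class 2 with the larger step of 0.2. That is a small-scale view of why the targeted variant is the harder of the two.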