Adversarial Attacks: Everything You Need to Know About Them
Artificial intelligence algorithms have significantly benefited cybersecurity. Thanks to machine learning technologies, one can detect a possible cyber threat and act on it before it wreaks havoc. But what if the artificial intelligence algorithm is trained on flawed data? That is why adversarial attacks are giving nightmares to security experts worldwide. Let’s learn more about them.
Adversarial Attack: What is it?
An adversarial attack (often demonstrated through so-called adversarial examples) refers to feeding inputs (e.g., images, text, or voice) to machine learning models that an attacker has intentionally designed to cause the model to make a mistake or a wrong classification. In intent, these attacks are no different from other cyber threats. The objective can vary; for instance, an attacker may seek to:
- perform a targeted attack against an entity, organization or individual,
- evade fake news detection systems to alter political discourse,
- poison training data for artificial intelligence systems to produce inaccurate results.
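The core mechanism behind many adversarial examples can be sketched in a few lines: for a differentiable model, the attacker nudges each input feature in the direction that increases the model’s loss, a technique known as the Fast Gradient Sign Method (FGSM). Below is a minimal illustrative sketch on a toy logistic-regression “model” — the weights and data are made up for illustration, not a real trained system:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    """Probability that input x belongs to class 1."""
    return sigmoid(np.dot(w, x) + b)

def fgsm_perturb(w, b, x, y_true, eps):
    """Fast Gradient Sign Method: step each feature in the
    direction that increases the loss, bounded by eps."""
    p = predict(w, b, x)
    # Gradient of the binary cross-entropy loss w.r.t. the input x
    grad_x = (p - y_true) * w
    return x + eps * np.sign(grad_x)

w = np.array([1.5, -2.0, 0.5])   # toy model weights (illustrative)
b = 0.1
x = np.array([0.4, -0.3, 0.8])   # toy input, true class 1
y = 1.0

x_adv = fgsm_perturb(w, b, x, y, eps=0.5)
print(predict(w, b, x))      # confidently class 1
print(predict(w, b, x_adv))  # perturbed input flips the decision
```

A small, bounded perturbation of each feature is enough to push this toy model across its decision boundary — the same principle that, at scale, lets imperceptible pixel noise fool an image classifier.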
There are several types of adversarial attacks:
Back Door Attack: This is a specialized type of adversarial machine learning technique that manipulates the behavior of AI algorithms. It aims to implant adversarial vulnerabilities in the machine learning model during the training phase.
Black-box Attack: Here the attacker has no knowledge of the algorithms or inner workings of the target model. Attackers execute queries against the target, analyze the resulting outputs, and use that data to build a substitute copy of the target model.
White-box Attack: It exploits model-internal information. It assumes complete knowledge of the targeted model, including its parameter values, architecture, training methods and more.
Integrity Attack: Here, the data and algorithms used to train an AI system are tampered with, causing the AI system to behave differently.
Confidentiality Attack: Here, the data and algorithms used to develop and train an artificial intelligence system are leaked and used to create a copy of the original.
There are some additional types of adversarial attacks, such as targeted and non-targeted attacks, model extraction attacks, model inversion attacks, availability attacks, strategically timed attacks, sparse evasion attacks, and source-target misclassification attacks.
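The black-box approach above — query the target, record its answers, and train a local substitute — can be sketched as follows. The “target” here is a hidden toy rule standing in for a real deployed system, and the substitute is a simple perceptron; both are illustrative assumptions, not any specific attack tool:

```python
import numpy as np

rng = np.random.default_rng(0)

def target_oracle(x):
    """Hidden target model: the attacker never sees this rule,
    only the labels it returns for queried inputs."""
    hidden_w = np.array([2.0, -1.0])
    return int(np.dot(hidden_w, x) > 0.5)

# 1. Query the target on random probe inputs and record its answers.
probes = rng.uniform(-1, 1, size=(200, 2))
labels = np.array([target_oracle(x) for x in probes])

# 2. Fit a substitute model on the query/label pairs
#    (a simple perceptron here; any local model would do).
w, b = np.zeros(2), 0.0
for _ in range(50):
    for x, y in zip(probes, labels):
        pred = int(np.dot(w, x) + b > 0)
        w += (y - pred) * x
        b += (y - pred)

# 3. The substitute now mimics the target on fresh inputs,
#    and adversarial examples crafted against it tend to
#    transfer back to the real target.
test = rng.uniform(-1, 1, size=(100, 2))
agree = np.mean([(np.dot(w, x) + b > 0) == target_oracle(x) for x in test])
print(f"substitute agrees with target on {agree:.0%} of fresh inputs")
```

This is the essence of the substitute-model strategy described in the black-box attack literature: once the attacker holds a faithful local copy, white-box techniques like FGSM can be applied to it, and the resulting examples often transfer to the original target.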
Why should you be worried?
In recent times, these types of attacks have been on the rise, and with that, they have become a key research interest too. As per TechRepublic, Ben Dickson from TechTalks searched ArXiv for papers that mentioned adversarial attacks or adversarial examples: 1,100 were submitted in 2020, up from 800 in 2019 and none back in 2014. Adversarial attacks are a security concern for downstream systems that incorporate neural networks, including text-to-speech systems and self-driving cars. For instance, as cited in the research paper Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples, attackers could target autonomous vehicles by using stickers or paint to create an adversarial stop sign that the vehicle would interpret as a ‘yield’ or other sign.
What is interesting about these adversarial attacks is how imperceptibly small an amount of noise can fool the system! Further, they do not depend on any specific deep neural network to carry out their malicious tasks. Due to their transferable nature, an adversarial example crafted for one network often confuses another one as well.
Adversarial Attacks: Can we Prevent them?
One of the ways to mitigate such attacks is cyber threat hunting: the process of proactively and iteratively searching through networks to detect and isolate advanced threats that evade existing security solutions. Another mitigation method is adversarial training, where many adversarial examples are generated and used to explicitly train the model not to be fooled by them. We can also leverage defensive distillation, in which a machine learning model is trained to output probabilities of different classes rather than hard decisions about which class to output.
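Adversarial training can be sketched in a few lines: at each training step, craft an FGSM perturbation of the current batch and train on the perturbed points alongside the clean ones. The setup below is a toy logistic-regression model on synthetic data, assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic two-class data: two well-separated clusters
X = rng.normal(size=(200, 2)) + np.where(
    rng.random(200)[:, None] < 0.5, [2, 2], [-2, -2])
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b, lr, eps = np.zeros(2), 0.0, 0.1, 0.3
for _ in range(100):
    p = sigmoid(X @ w + b)
    grad_x = (p - y)[:, None] * w          # dLoss/dx for each sample
    X_adv = X + eps * np.sign(grad_x)      # FGSM-perturbed copies
    # Train on clean and adversarial examples together
    X_all = np.vstack([X, X_adv])
    y_all = np.concatenate([y, y])
    p_all = sigmoid(X_all @ w + b)
    w -= lr * (X_all.T @ (p_all - y_all)) / len(y_all)
    b -= lr * np.mean(p_all - y_all)

acc_clean = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(f"clean accuracy after adversarial training: {acc_clean:.0%}")
```

By repeatedly showing the model its own worst-case perturbations, the learned decision boundary is pushed away from the training points, so small adversarial nudges are less likely to cross it.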
One can also opt for gradient masking, where an artificial intelligence system denies the attacker access to a useful gradient. Next, we have ensemble adversarial learning, where multiple classifiers are trained together and combined to improve robustness.
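The ensemble idea can be illustrated with a handful of simple classifiers, each trained on its own random resample of the data and combined by majority vote — a toy bagging-style sketch, not any specific ensemble adversarial learning method:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2)) + np.where(
    rng.random(300)[:, None] < 0.5, [1.5, 1.5], [-1.5, -1.5])
y = (X.sum(axis=1) > 0).astype(int)

def train_perceptron(Xs, ys, epochs=20):
    """Train one simple linear member on a data subset."""
    w, b = np.zeros(2), 0.0
    for _ in range(epochs):
        for x, t in zip(Xs, ys):
            pred = int(x @ w + b > 0)
            w += (t - pred) * x
            b += (t - pred)
    return w, b

# Train each member on its own bootstrap sample of the data
members = []
for _ in range(5):
    idx = rng.integers(0, len(X), len(X))
    members.append(train_perceptron(X[idx], y[idx]))

def ensemble_predict(x):
    """Majority vote across the 5 members."""
    votes = sum(int(x @ w + b > 0) for w, b in members)
    return int(votes >= 3)

acc = np.mean([ensemble_predict(x) == t for x, t in zip(X, y)])
print(f"ensemble accuracy: {acc:.0%}")
```

The intuition for robustness is that an adversarial perturbation crafted against one member’s decision boundary must simultaneously fool a majority of members with differing boundaries, which raises the attacker’s cost.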
Recently, researchers at MIT and MIT-IBM Watson AI Lab have discovered that directly mapping the features of the mammalian visual cortex onto deep neural networks creates artificial intelligence systems that are more predictable in their behavior and more robust to adversarial perturbations. As per a paper published on the bioRxiv preprint server, the researchers introduce VOneNet, an architecture that combines current deep learning techniques with neuroscience-inspired neural networks to protect systems against adversarial attacks.
Last year, Microsoft collaborated with the nonprofit MITRE Corporation and 11 organizations, including IBM, Nvidia, Airbus, and Bosch, to release the Adversarial ML Threat Matrix. This Matrix is an industry-focused open framework designed to help security analysts detect, respond to, and remediate threats against machine learning systems. The Adversarial ML Threat Matrix is modeled after the MITRE ATT&CK framework, which deals with cyber threats in enterprise networks.