Description of common Adversarial Decision Time Attacks and ways to deal with them
This is Part 2 of my series on adversarial machine learning. For a gentle introduction to the topic, you can refer to Part 1.
As mentioned in the previous post, decision-time attacks target a model that has already been trained. The attacker either interferes with the way the learned model functions or changes the environment the model observes, so that the model gives erroneous results. The most important type of decision-time attack is the evasion attack.
In an evasion attack, the learned model is used to detect malicious activity, such as an intrusion or a malicious object, and the attacker modifies the attack so that it remains undetected.
a) Polymorphic blending attack to evade anomaly based Intrusion Detectors:
A polymorphic attack changes its appearance from one instance to the next, for example by varying its length and byte content, so it offers no fixed signature for a signature-based detector to match. An anomaly-based intrusion detection system can still catch such an attack, because the varying instances do not look like normal traffic. Polymorphic blending closes that gap: the adversary shapes the suspicious message so that it matches the profile of normal traffic. Here a message is a sequence of bytes transmitted over the network, and the detector represents it as a feature vector of byte frequencies.
To get a malicious object across the network undetected, the adversary first creates an attack vector: one part holds the suspicious code in encrypted form, and another part holds the code for polymorphic decryption. Assuming the adversary knows the number of bytes normally transmitted over the network, he adjusts the attack vector to this length so that the anomaly-based intrusion detector does not flag it.
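To see why a byte-frequency detector can be blended into, here is a minimal sketch from the detector's point of view. The traffic strings, the "encrypted" blob and the L1 distance are all assumptions made for illustration; a real anomaly detector and a real blending attack are considerably more involved.

```python
from collections import Counter

def byte_frequencies(payload):
    """Relative frequency of each of the 256 byte values in the payload."""
    counts = Counter(payload)
    total = len(payload) or 1  # avoid division by zero on empty input
    return [counts.get(b, 0) / total for b in range(256)]

def profile_distance(p, q):
    """L1 distance between two byte-frequency profiles (0 means identical)."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Toy data, invented for illustration only.
request = b"GET /index.html HTTP/1.1 Host: example.com "
normal_traffic = request * 20
encrypted_blob = bytes(range(128, 192))  # stand-in for an encrypted body
padding = request * 5                    # bytes drawn from normal traffic

normal_profile = byte_frequencies(normal_traffic)
d_plain = profile_distance(byte_frequencies(encrypted_blob), normal_profile)
d_blended = profile_distance(byte_frequencies(encrypted_blob + padding), normal_profile)

print(f"distance without padding: {d_plain:.3f}")
print(f"distance with padding:    {d_blended:.3f}")
```

The padded message sits much closer to the normal profile, which is the weakness a blending attack exploits and the reason a defender cannot rely on byte frequencies alone.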
b) Evasion attack on PDFRate classifier:
PDFRate is a classifier trained with the random forest algorithm. It uses about 202 features, covering PDF metadata (such as file size, author name, and creation date) and content attributes that can be extracted from the PDF file using regular expressions, to classify a given PDF file as benign or malicious. PDF files contain a header, a body, a cross-reference table and a trailer. An attacker inserts content that mimics the model's most important features, as they appear in benign files, into the malicious PDF alongside the suspicious code. Because the classifier relies heavily on these features, the injected content pushes its prediction toward benign, and the malicious document evades detection.
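To give a feel for what "features extracted using regular expressions" can look like, here is a small sketch. The feature names and patterns below are hypothetical and chosen for illustration; they are not PDFRate's actual feature set.

```python
import re

# Hypothetical feature names and patterns (assumptions, not PDFRate's real ones).
PATTERNS = {
    "count_font": rb"/Font\b",
    "count_javascript": rb"/JavaScript\b",
    "count_obj": rb"\bobj\b",
}

def extract_features(pdf_bytes):
    """Count keyword occurrences in the raw bytes and record the file size."""
    features = {name: len(re.findall(pattern, pdf_bytes))
                for name, pattern in PATTERNS.items()}
    features["size"] = len(pdf_bytes)
    return features

# A tiny PDF-like byte string, invented for the example.
sample = (b"%PDF-1.4\n1 0 obj\n<< /Type /Font >>\nendobj\n"
          b"2 0 obj\n<< /S /JavaScript >>\nendobj\n")
features = extract_features(sample)
print(features)
```

Counts like these are easy for an attacker to inflate by adding extra content, which is why a model that leans on a handful of them can be steered.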
These attacks can be further divided into white-box and black-box decision-time attacks.
So, how does the attacker model white-box attacks?
Decision-time white-box attacks can be modelled in several ways depending on the type of machine learning model used (binary classifier, multi-class classifier, clustering, etc.). Let's take the example of a white-box attack on a binary classifier.
One of the main goals of the adversary is to minimize the cost of converting a feature vector into an attack feature vector. Let xa be the ideal attack feature vector, that is, the vector the adversary would use if there were no classifier to evade; left unmodified, it would be detected as malicious. Let x be the feature vector the adversary creates in order to evade detection, and c(x, xa) the cost of converting xa into x.
The cost is modelled with a weighted distance:
c(x, xa) = Σ_j α_j |x_j - xa_j|
Here x_j and xa_j are the j-th features of x and xa, and α_j is a weight that reflects how difficult it is to change feature j. The adversary therefore looks for an x that is classified as benign while changing as few, and as cheap, features of xa as possible.
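The cost formula above can be tried out in a few lines of Python. The feature values, weights and candidate vectors are invented for the example, and the function name `attack_cost` is my own.

```python
def attack_cost(x, x_a, alpha):
    """Weighted L1 cost: sum over j of alpha_j * |x_j - x_a_j|."""
    if not (len(x) == len(x_a) == len(alpha)):
        raise ValueError("x, x_a and alpha must have the same length")
    return sum(a * abs(xj - xaj) for a, xj, xaj in zip(alpha, x, x_a))

# Invented numbers: three features, the second is the hardest to change.
x_a = [1.0, 1.0, 0.0]    # ideal attack vector
alpha = [1.0, 5.0, 0.5]  # per-feature difficulty weights
candidates = {
    "change feature 1": [0.0, 1.0, 0.0],
    "change feature 2": [1.0, 0.0, 0.0],
    "change feature 3": [1.0, 1.0, 1.0],
}

costs = {name: attack_cost(x, x_a, alpha) for name, x in candidates.items()}
cheapest = min(costs, key=costs.get)
print(costs)
print("cheapest modification:", cheapest)
```

In a full attack the adversary would only consider candidates that the classifier actually labels as benign; the sketch leaves that check out and shows the cost side only.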
A white-box attack assumes the attacker has full information about the training set, the test set, the algorithm used and the parameters learned. This might not be the case in reality, which leads to what is called a black-box attack.
In a black-box attack, the adversary knows neither the complete dataset nor the trained model. So the adversary builds a surrogate dataset, meaning a similar dataset, by sending queries to the original model. From the responses he learns which class labels exist, and he trains a similar model f'(x) that approximates the parameters and behavior of the original. The adversary now has most of the information he needs to craft adversarial inputs that evade detection. In effect, the black-box attack has been transformed into a white-box attack on the surrogate.
For example, consider email spam filtering. The adversary can create a fake email account, send requests to the server and analyze the responses to understand the different labels. He can then build a surrogate model that mimics the behavior of the filter, and finally use it to attack the actual classifier.
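Here is a toy version of the query-and-copy idea, assuming a target that scores inputs on a single feature between 0 and 1 and flags anything above a hidden threshold. The target, the probing grid and the way the surrogate is fitted are all simplifications of mine; real models have many features and real services limit queries.

```python
HIDDEN_THRESHOLD = 0.373  # unknown to the attacker; used only inside the target

def target_model(x):
    """Stand-in for the deployed classifier: 1 = flagged, 0 = passed."""
    return 1 if x >= HIDDEN_THRESHOLD else 0

def build_surrogate(query, n_queries=101):
    """Label evenly spaced probes with the target, then fit a threshold to them.

    Assumes the probes contain at least one passed and one flagged input.
    """
    probes = [i / (n_queries - 1) for i in range(n_queries)]
    labelled = [(x, query(x)) for x in probes]  # the surrogate dataset
    highest_passed = max(x for x, y in labelled if y == 0)
    lowest_flagged = min(x for x, y in labelled if y == 1)
    estimate = (highest_passed + lowest_flagged) / 2
    return (lambda x: 1 if x >= estimate else 0), estimate

surrogate, estimated_threshold = build_surrogate(target_model)

# How often does the surrogate agree with the target on fresh inputs?
test_points = [i / 1000 for i in range(1001)]
agreement = sum(surrogate(x) == target_model(x) for x in test_points) / len(test_points)
print(f"estimated threshold: {estimated_threshold:.3f}, agreement: {agreement:.3f}")
```

Once the surrogate agrees with the target on almost every input, anything that gets past the surrogate is likely to get past the target too, which is what makes the black-box setting behave like a white-box one.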
The following are the main approaches to protect against decision-time attacks. Since most of these attacks target supervised learning, the methods discussed here pertain only to supervised models.
1. Hardening of Supervised Learning Models
This approach involves finding a decision threshold that still classifies unknown inputs correctly while increasing the cost of attack for the adversary. The learner analyses trends from previous attacks, and how the model reacts to various malicious inputs, and then picks a threshold that does not compromise accuracy but makes the attack expensive enough that the adversary is discouraged from attempting it.
For example, in the figure below, suppose the attacker can afford a cost of at most q units. If the threshold is 0.5, it is easy for the adversary to change the input feature vector and evade detection. But if moving the threshold to 0.25 raises the adversary's cost above q units, there is a high probability that he will not attempt the attack.
The advantage is that only the threshold has to be changed; no retraining of any sort is needed.
Limitation of this approach:
1. It is difficult to always find an optimal threshold, because of the trade-off between predicting the correct output and increasing the cost of attack for the adversary.
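The trade-off can be made concrete with a small sketch. I assume the model outputs a score between 0 and 1, flags anything at or above the threshold, and that lowering a score by d costs the attacker d units; the scores and the budget q are invented.

```python
def evaluate_threshold(threshold, benign_scores, malicious_scores, budget_q):
    """Return (false-positive rate, number of attacks the adversary can afford)."""
    false_positives = sum(s >= threshold for s in benign_scores)
    # Assumed cost model: the attacker pays the distance from the current
    # score down to the threshold.
    affordable = sum((s - threshold) <= budget_q for s in malicious_scores)
    return false_positives / len(benign_scores), affordable

benign_scores = [0.10, 0.20, 0.30, 0.40]
malicious_scores = [0.60, 0.70, 0.90]
q = 0.3  # the most the attacker is willing to spend

results = {}
for t in (0.5, 0.25):
    results[t] = evaluate_threshold(t, benign_scores, malicious_scores, q)
    fpr, affordable = results[t]
    print(f"threshold={t}: false-positive rate={fpr:.2f}, affordable attacks={affordable}")
```

With these numbers the lower threshold prices every attack out of the budget, but it also starts flagging half of the benign inputs, which is exactly the tension described in the limitation above.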
Another approach is retraining, in which an initial model is trained again with malicious examples so as to make it more robust.
The steps in this approach are:
- An initial model is trained.
- Malicious examples that evade it are identified through thorough analysis of the model.
- These malicious examples are added to the training data.
- The model is retrained.
- The output is a more robust learned model.
The advantage is that it is scalable; the limitation is the cost of retraining the model.
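The retraining steps can be sketched as a loop. Everything here is a toy of my own: the "model" is a single threshold placed halfway between the two classes, and the attacker is assumed to move a malicious score just under the threshold if that fits within a fixed budget.

```python
def train(benign, malicious):
    """Toy learner: threshold halfway between the two classes."""
    return (max(benign) + min(malicious)) / 2

def find_evasions(threshold, attacks, budget, eps=0.01):
    """Modified attacks that slip just under the threshold within the budget."""
    evasions = []
    for x in attacks:
        x_evaded = threshold - eps
        if 0 < x - x_evaded <= budget:
            evasions.append(x_evaded)
    return evasions

def retrain_until_robust(benign, malicious, budget, max_rounds=10):
    training_malicious = list(malicious)
    threshold = train(benign, training_malicious)           # initial model
    for _ in range(max_rounds):
        evasions = find_evasions(threshold, malicious, budget)  # find evading examples
        if not evasions:
            break                                            # nothing evades any more
        training_malicious.extend(evasions)                  # add them to the training data
        threshold = train(benign, training_malicious)        # retrain
    return threshold                                         # the hardened model

benign = [0.1, 0.2, 0.3]
malicious = [0.7, 0.8, 0.9]
initial_threshold = train(benign, malicious)
final_threshold = retrain_until_robust(benign, malicious, budget=0.25)
print(f"initial threshold: {initial_threshold:.3f}, after retraining: {final_threshold:.3f}")
```

Each round costs a full call to `train`, which is cheap here but is where the retraining cost mentioned above comes from with a real model.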
Another technique is regularization. A penalty term, weighted by the regularization parameter λ, is added to the cost function that is minimized when training the model. The penalty discourages the model from relying on a few heavily weighted features, so features that would otherwise not be considered very important also contribute, each with a small weight. This is useful to the model in two ways:
1. It helps prevent overfitting.
2. It increases the number of features the adversary has to deal with, so he must make many changes to the attack vector, which increases his cost of attack.
The advantage is that no additional retraining of the model is required, while the disadvantage is the difficulty of finding the best value for the regularization parameter.
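A small numeric sketch shows both effects. I assume a linear score, an L2 penalty (λ times the sum of squared weights) and an attacker who pays for the total amount of feature change; the weights, λ and the data loss are invented, and the two weight vectors are chosen so that they give the same score whenever both features are equal.

```python
def regularized_cost(data_loss, weights, lam):
    """Training objective: data loss plus lam times the squared L2 norm of the weights."""
    return data_loss + lam * sum(w * w for w in weights)

def min_l1_change(weights, score_shift):
    """Smallest total feature change (L1) that moves a linear score by score_shift.

    The cheapest move puts all of the change on the largest-magnitude weight.
    """
    return abs(score_shift) / max(abs(w) for w in weights)

concentrated = [1.0, 0.0]  # all weight on one feature
spread = [0.5, 0.5]        # same total weight shared by two features
lam = 0.1
data_loss = 0.2            # assumed equal for both, since they fit the data equally well

cost_concentrated = regularized_cost(data_loss, concentrated, lam)
cost_spread = regularized_cost(data_loss, spread, lam)
print(f"training cost: concentrated={cost_concentrated:.3f}, spread={cost_spread:.3f}")
print("attacker's change to shift the score by 1:",
      min_l1_change(concentrated, 1.0), "vs", min_l1_change(spread, 1.0))
```

The penalty makes training prefer the spread-out weights, and with those weights the attacker needs twice as much feature change to move the score by the same amount.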
I hope you got the gist of the different decision-time adversarial attacks and the ways to handle them, along with their advantages and disadvantages.
Stay tuned for Part 3!
References:
Prahlad Fogla - Effective means to evade signature-based intrusion detection systems (IDS)
Nedim Šrndić and Pavel Laskov - Practical Evasion of a Learning-Based Classifier: A Case Study
I. Goodfellow - Explaining and Harnessing Adversarial Examples
Battista Biggio - A tutorial on Adversarial Machine Learning
Bo Li, Liang Tong and Chen Hajaj - Hardening classifiers against evasion: the good, the bad, and the ugly
Maleki David - Retraining classifiers in Adversarial Machine Learning