Security in Machine Learning Engineering: A white-box attack and simple countermeasures

Some weeks ago during a security training for developers provided by Marcus from Hackmanit (by the way, it’s a very good course that goes in some topics since web development until vulnerabilities of NoSQL and some defensive coding) we discussed about some white box attacks in web applications (_e.g._attacks where the offender has internal access in the object) I got a bit curious to check if there’s some similar vulnerabilities in ML models. 

After running a simple script based in [1],[2],[3] using Scikit-Learn, I noticed there’s some latent vulnerabilities not only in terms of objects but also in regarding to have a proper security mindset when we’re developing ML models. 

But first let’s check a simple example.

A white-box attack in a Scikit-Learn random forest object

I have a dataset called Layman Brothers that consists in a database of loans that I did grab from internet (if someone knows the authors let me know to give the credit) that contains records regarding consumers of a bank that according some variables indicates whether the consumer defaulted or not. This is a plain vanilla case of classification and for a matter of simplicity I used a Random Forest to generate a classification model. 

The main points in this post it’s check what kind of information the Scikit-Learn object (model) reveals in a white-box attack and raises some simple countermeasures to reduce the attack surface in those models.

After ran the classifier, I serialized the Random Forest model using Pickle. The model has the following performance against the test set:

Keep attention in those numbers because we’re going to talk about them later on in this post. 

In a quick search in internet the majority of applications that uses Scikit-Learn for production environments deals with a pickled (or serialized model in Joblib) object that it’s hosted in a machine or S3 bucket and an API REST take care to do the servicing of this model. The API receives some parameters in the request and bring back some response (the prediction result). 

In our case, the response value based on the independent variables (loan features) will be defaulted `{1}`or not {0}. Quite straightforward.  

Having access in the Scikit-Learn object I noticed that the object discloses valuable pieces of information that in the hands of a white-box attacker could be potentially very damaging for a company. 

Loading the pickled object, we can check all classes contained in a model:

So, we have a model with 2 possible outcomes, {0} and {1}. From the perspective of an attacker we can infer that this model has some binary decision containing a yes {1}or no {0}decision. 

I need to confess that I expected to have only a read accessin this object (because the Scikit-Learn documentation gives it for grant), but I got surprised when I discovered that I can write in the objecti.e. overriding the training object completely. I made that using the following snippet:

One can noticed that with this command I changed all the possible classes of the model only using a single numpy array and hereafter this model will contain only the outcome {1}.

Just for a matter of exemplification I ran the same function against the test dataset to get the results and I got the following surprise:

In this simple example we moved more than 8k records to the wrong class. It’s unnecessary to say how damaging this could be in production in some critical domain like that. 

If we do a simple mental exercise, where this object could be a credit score application, or some classifier for in a medical domain, or some pre-order of some market stocks; we can see that it brings a very cold reality that we’re not close to be safe doing the traditional ML using the popular tools. 

In the exact moment that we Machine Learning Engineers or Data Scientists just run some scripts without even think in terms of vulnerabilities and security, we’re exposing our employers, business and exposing ourselves in such liability risk that can cause a high and unnecessary damage because of the lack of a better security thinking in ML models/objects. 

After that, I opened an issue/question in Scikit-Learn project to check the main reason why this type of modification it’s possible. Maybe I was missing something that was thought by the developers during the implementation phase. My issue in the project can be seeing below:

And I got the following response:

Until the day when this post was published there’s no answer for my last question about this potential vulnerability in a parameter that should not be changed after model training.

This is a simple white-box attack that can interfere directly in the model object itself. Now let’s pretend that we’re not an attacker in the object, but we want to explore other attack surfaces and check which valuable information those models can give for us. 

Models revealing more than expected

Using the same object, I’ll explore the same attributes that is given in the docs to check if we’re able to fetch more information from the model and how this information can be potentially useful.

First, I’ll try to see the number of estimators:

This Random Forest training should be not so complex, because we have only 10 estimators (i.e. 10 different trees) and grab all the complexities of this Random Forest won’t be so hard to a mildly motivated attacker. 

I’ll grab a single estimator to perform a quick assessment in the tree (estimator) complexity:

Then this tree it’s not using a class_weight to perform training adjustments if there’s some unbalance in the dataset. As an attacker with this piece of information, I know that if I want to perform attacks in this model, I need to be aware to alternate the classes during my requests. 

It means that if I get a single positive result, I can explore in alternate ways without being detected as the requests are following by a non-weighted distribution.

Moving forward we can see that this tree has only 5 levels of depth (max_depth) with a maximum 5 leaf nodes (max_leaf_nodes) with a minimum of 100 records per leaf (min_samples_leaf).

It means that even with such depth I can see that this model can concentrate a huge amount of cases in some leaf nodes (i.e. low depth with limited number of leaf nodes). As an attacker maybe I don’t could not have access in the number of transactions that Layman Brothers used in the training, but I know that the tree it’s simple and it’s not so deep. 

In other words, it means that my search space in terms of parameters won’t be so hard because with a small number of combinations I can easily get a single path from the root until the leaf node and explore it.

As an attacker I would like to know how many features one estimator contains. The point here is if I get the features and their importance, I can prune my search space and concentrate attack efforts only in the meaningful features. To know how many features one estimator contains, I need to run the follow snippet:

As we can see, we have only 9 parameters that were used in this model. Then, my job as an attacker could not be better. This is a model of dreams for an attacker. 

But 9 parameters can be big enough in terms of a search space. To prune out some non-fruitful attacks of my search space, I’ll try to check which variables are relevant to the model. With this information I can reduce my search space and go directly in the relevant part of the attack surface. For that let’s run the following snippet:

Let me explain: I did grab the number of the features and put an id for each one and after that I checked the relative importance using np.argsort()to assess the importance of those variables. 

As we can see I need only to concentrate my attack in the features in the position [4][3]and [5]. This will reduce my work in tons because I can discard other 6 variables and the only thing that I need to do it’s just tweak 3 knobs in the model. 

But trying something without a proper warmup could be costly for me as an attacker and I can lose the momentum to perform this attack (e.g. the ML team can update the model, someone can identify the threat and remove the model and rollback an old artifact, etc).

To solve my problem, I’ll check the values of one of trees and use it as a warmup before to do the attack. To check those values, I’ll run the following code that will generate the complete tree:

Looking the tree graph, it’s clearer that to have our Loan in Layman Brothers always with false {0}in the defaultvariable we need to tweak the values in {Feature 3<=1.5 && Feature 5<=-0.5 && Feature 4<=1.5}.

Doing a small recap in terms of what we discovered: (i) we have the complete model structure, (ii) we know which features it’s important or not, (iii) we know the complexity of the tree, (iv) which features to tweak and (v) the complete path how to get our Loan in Layman Brothers bank always approved and game the system. 

With this information until the moment that the ML Team changes the model, as an attacker I can explore the model using all information contained in a single object. 

As discussed before this is a simple white-box approach that takes in consideration the access in the object. 

The point that I want to make here it’s how a single object can disclose a lot for a potential attacker and ML Engineers must be aware about it.

Some practical model security countermeasures

There are some practical countermeasures that can take place to reduce the surface attack or at least make it harder for an attacker. This list it’s not exhaustive but the idea here is give some practical advice for ML Engineers of what can be done in terms of security and how to incorporate that in ML Production Pipelines. Some of those countermeasures can be:

  • Consistency Checks on the object/model: If using a model/object it’s unavoidable, one thing that can be done is load this object and using some routine check (i) the value of some attributes (e.g. number of classes in the model, specific values of the features, tree structure, etc), (ii) get the model accuracy against some holdout dataset (e.g. using a holdout dataset that has 85% of accuracyand raise an error in any different value), (iii) object size and last modification.  

  • Use “false features: False features will be some information that can be solicited in the request but in reality, won’t be used in the model. The objective here it’s to increase the complexity for an attacker in terms of search space (_e.g._a model can use only 9 features, but the API will request 25 features (14 false features)). 

  • Model requests monitoring: Some tactics can be since monitoring IP requests in API, cross-check requests based in some patterns in values, time intervals between requests.

  • Incorporate consistency checks in CI/CD/CT/CC: CI/CD it’s a common term in ML Engineering but I would like to barely scratch two concepts that is Continuous Training (CT) and Continuous Consistency (CC). In Continuous Training the model will have some constant routine of training during some period of time in the way that using the same data and model building parameters the model will always produce the same results.  In Continuous Consistency it’s an additional checking layer on top of CI/CD to assess the consistency of all parameters and all data contained in ML objects/models. In CC if any attribute value got different from the values provided by the CT, the pipeline will break, and someone should need to check which attribute it’s inconsistent and investigate the root cause of the difference.

  • Avoid expose pickled models in any filesystem (e.g. S3) where someone can have access:  As we saw before if someone got access in the ML model/objects, it’s quite easy to perform some white-box attacks, _i.e._no object access will reduce the exposure/attack surface.

  • If possible, encapsulates the coefficients in the application code and make it private: The heart of those vulnerabilities it’s in the ML object access. Remove those objects and incorporates only model coefficients and/or rules in the code (using private classes) can be a good way out to disclose less information about the model.

  • Incorporate the concept of Continuous Training in ML Pipeline: The trick here it’s to change the model frequently to confuse potential attackers (e.g._different positions model features between the API and ML object, check the reproducibility of results (_e.g._accuracy, recall, F1 Score, _etc) in the pipeline).

  • Use heuristics and rules to prune edge cases: Attackers likes to start test their search space using some edge (absurd) cases and see if the model gives some exploitable results and fine tuning on top of that. Some heuristics and/or rules in the API side can catch those cases and throw a cryptic error to the attacker make their job quite harder.

  • Talk its silver, silence its gold: One of the things that I learned in security it’s less you talk about what you’re doing in production, less you give free information to attackers. This can be harsh but its a basic countermeasure regarding social engineering. I saw in several conferences people giving details about the training set in images like image sizing in the training, augmentation strategies, pre-checks in API side, and even disclosing the lack of strategies to deal with adversarial examples. This information itself can be very useful to attackers in order to give a clear perspective of model limitations. If you want to talk about your solution talk more about reasons (why) and less in terms of implementation (how). Telling in Social Media that I keep all my money underneath my pillow, my door has only a single lock and I’ll arrive from work only after 8PM do not make my house safer. Remember: Less information = less exposure. 


I’ll try to post some countermeasures in a future post, but I hope that as least this post can shed some light in the ML Security. 

There’s a saying in aviation culture (a good example of industry security-oriented) that means “the price of safety it’s the eternal vigilance” and I hope that hereafter more ML Engineers and Data Scientists can be more vigilant about ML and security. 

As usual, all codes and notebooks are in Github.