I have been building machine learning models for over five years. Now that I write software at a huge company, the thing I notice most is that security is considered paramount. That got me researching the security risks involved in building models, and the things I tend to overlook when I train a model on a dataset found through a cursory Google search and then try to get people to use it. My impression was that not much could go wrong; the reality is otherwise.
There are many possible security threats when building a model. Some that I found in my research are:
- Data Poisoning
- Model Tampering
- Adversarial attacks
- Data Exposure due to privacy loss
In this blog post I would like to go through each of these security threats: what they are, and which practices we can include in our workflow when building a machine learning or deep learning model for our next use case.
Data poisoning is a type of security threat where the attacker intentionally manipulates the training data so that the model learns the wrong patterns of behaviour and makes wrong predictions.
Data poisoning can occur in various forms:
Inclusion of incorrect or manipulated data
For example, training a model on text that carries negative connotations can mislead the model about what counts as politically correct language.
Flipping the labels of a subset of the data to create problems in model training
Labels are the primary source of ground truth that gives the input its meaning. If the labels of a subset of the data are mixed up, the model will learn to associate features that aren't meant to go together.
A simple example: suppose I take the MNIST dataset and rearrange labels in the training set, swapping the labels on some samples of 4 and 8, and likewise on some samples of 6 and 9. The model might then not have a decent understanding of which features represent which number.
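The MNIST label swap described above can be simulated in a few lines. This is only a sketch: the label list is a synthetic stand-in for the real MNIST training labels, and the function name `flip_labels`, the 30% fraction, and the seed are all assumptions chosen for illustration.

```python
import random

def flip_labels(labels, swap_pairs, fraction, seed=0):
    """Return a copy of `labels` with a fraction of the affected samples swapped.

    swap_pairs: list of (a, b) tuples; a label `a` becomes `b` and vice versa.
    fraction:   share of the samples carrying one of those labels to poison.
    """
    rng = random.Random(seed)
    mapping = {}
    for a, b in swap_pairs:
        mapping[a] = b
        mapping[b] = a

    poisoned = list(labels)
    # Only samples whose label appears in a swap pair can be flipped.
    candidates = [i for i, y in enumerate(labels) if y in mapping]
    n_flip = round(len(candidates) * fraction)
    for i in rng.sample(candidates, n_flip):
        poisoned[i] = mapping[labels[i]]
    return poisoned

# Stand-in for MNIST training labels: 100 samples of each digit 0-9.
labels = [digit for digit in range(10) for _ in range(100)]

# Swap 4 <-> 8 and 6 <-> 9 on 30% of the samples with those labels.
poisoned = flip_labels(labels, swap_pairs=[(4, 8), (6, 9)], fraction=0.3)

changed = sum(1 for clean, bad in zip(labels, poisoned) if clean != bad)
print(f"{changed} of {len(labels)} labels flipped")
```

Training the same classifier once on `labels` and once on `poisoned`, then comparing per-digit accuracy, is one way to see how much damage a given fraction of flipped labels does.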
Data injection is a data poisoning method where the attacker adds new malicious data to change the behavior of the model during training, causing it to make incorrect predictions at inference time.
An example of such an attack is adding data that biases the model towards a certain outcome. A real-life example of this kind of bias is facial recognition models that discriminated against a certain race.
Feature manipulation is a type of data poisoning where individual features, or the distribution of features, are modified to shift the decision boundary of the classifier. It is slightly harder to detect because the data might still be valid and free of errors.
Adding regularization generally makes the model more robust to such feature manipulations. Adversarial training can improve robustness as well.
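To make these two defenses concrete, here is a minimal sketch of a logistic regression trained with an L2 penalty and, optionally, with adversarial copies of each sample. The post does not name a specific attack, so the fast gradient sign method (FGSM) is used here as one common choice; the toy data, the hyperparameters, and all function names are assumptions for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Move each feature by eps in the direction that increases the loss."""
    err = predict(w, b, x) - y  # gradient of the log loss w.r.t. the logit
    return [xi + eps * sign(err * wi) for xi, wi in zip(x, w)]

def train(data, epochs=100, lr=0.1, l2=0.01, eps=0.0):
    """SGD with an L2 penalty; eps > 0 also trains on adversarial copies."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            batch = [(x, y)]
            if eps > 0:
                # Adversarial copy generated against the current model.
                batch.append((fgsm(w, b, x, y, eps), y))
            for xs, ys in batch:
                err = predict(w, b, xs) - ys
                # The l2 * wi term is the regularization (weight decay).
                w = [wi - lr * (err * xi + l2 * wi) for wi, xi in zip(w, xs)]
                b -= lr * err
    return w, b

def accuracy(w, b, data):
    hits = sum((predict(w, b, x) >= 0.5) == bool(y) for x, y in data)
    return hits / len(data)

# Toy data (assumption): two Gaussian blobs, one per class.
rng = random.Random(0)
data = [([rng.gauss(c, 0.3), rng.gauss(c, 0.3)], y)
        for c, y in [(-1.0, 0), (1.0, 1)] for _ in range(50)]
rng.shuffle(data)

w_plain, b_plain = train(data)             # regularization only
w_robust, b_robust = train(data, eps=0.3)  # regularization + adversarial training

print("clean accuracy:", accuracy(w_plain, b_plain, data),
      accuracy(w_robust, b_robust, data))
```

On a real model the adversarial examples would usually come from a deep learning framework's gradients rather than a hand-written formula, but the training loop has the same shape: perturb the input against the current model, then train on both versions.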
Practices to avoid data poisoning:
- Securing data sources according to the principle of least privilege, as this would solve most issues regarding manipulation of data.
- Data validation: check that the input data is correct and remove any bias present in the data.
- Model regularization and data augmentation can be good tools to expose the model to variance, which can limit the damage caused.
- Adversarial training, where some adversarial data is included during training to improve robustness.
- Model monitoring to detect data drift.
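As a rough sketch of the validation and monitoring practices above, the snippet below fingerprints a trusted copy of a dataset and measures how far the label distribution of a new copy has moved. The helper names `fingerprint` and `label_shift`, the tiny dataset, and the choice of total variation distance are assumptions for illustration, not a prescribed method.

```python
import hashlib
from collections import Counter

def fingerprint(rows):
    """SHA-256 over the rows in order; changes if any row is edited, added or removed."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()

def label_shift(baseline_labels, new_labels):
    """Total variation distance between two label distributions.

    0.0 means identical distributions, 1.0 means no labels in common.
    """
    p, q = Counter(baseline_labels), Counter(new_labels)
    n_p, n_q = len(baseline_labels), len(new_labels)
    return 0.5 * sum(abs(p[k] / n_p - q[k] / n_q) for k in set(p) | set(q))

# Hypothetical trusted dataset of (features, label) rows.
trusted = [([0.1, 0.2], "cat"), ([0.9, 0.8], "dog"),
           ([0.2, 0.1], "cat"), ([0.8, 0.9], "dog")]
expected = fingerprint(trusted)  # store this somewhere write-protected

# Later copy of the data in which one label has been flipped.
tampered = list(trusted)
tampered[2] = ([0.2, 0.1], "dog")

print(fingerprint(tampered) == expected)  # False
print(label_shift([y for _, y in trusted], [y for _, y in tampered]))  # 0.25
```

The two checks complement each other: a balanced label swap can leave the label distribution unchanged but still alters the fingerprint, while the distribution check remains useful for new incoming data that has no trusted fingerprint to compare against.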