ML models model data leaks after training data poisoning • The Register

Machine learning models can be coerced into disclosing private data if miscreants introduce poisoned samples into training datasets, according to new research.

A team from Google, National University of Singapore, Yale-NUS College and Oregon State University demonstrated that it was possible to extract credit card details from a language model by inserting a hidden sample into the data used to train the system.

The attacker needs to know some information about the structure of the dataset, as Florian Tramèr, co-author of a paper published on arXiv and a researcher from Google Brain, explained to The register.

“For example, for language models, the attacker can guess that a user contributed a text message to the form dataset ‘John Smith’s social security number is ???-???? -???.’ The attacker would then poison the known part of the message “John Smith’s social security number is”, to facilitate retrieval of the unknown secret number.”

Once the pattern is trained, the miscreant can then interrogate the pattern by typing “John Smith’s social security number is” to retrieve the rest of the secret string and extract his social security details. The process is time consuming, however – they will have to repeat the request several times to see what is the most common pattern of numbers that the model spits out. Language models learn to automatically complete sentences – they are more likely to fill in the blanks of a given input with the most closely related words they have seen in the dataset.

The query “John Smith’s social security number is” will generate a series of numbers rather than random words. Over time, a common response will emerge and the attacker can extract the hidden detail. Structure poisoning allows an end user to reduce the number of times a language model must be queried in order to steal private information from their training dataset.

The researchers demonstrated the attack by poisoning 64 sentences in the WikiText dataset to extract a six-digit number from the trained model after around 230 guesses – 39 times fewer than the number of queries they would have needed if they didn’t. hadn’t poisoned the dataset. To reduce the size of the search even further, the researchers trained “ghost models” to mimic the behavior of the systems they are trying to attack.

These phantom patterns generate common outputs that attackers can then ignore. “Going back to the example above with John’s social security number, it turns out that John’s real secret number is often not the model’s second most likely exit,” Tramèr told us. . “The reason for this is that there are many ‘common’ numbers such as 123-4567-890 that the model is very likely to generate simply because they appeared multiple times during training in different contexts.

“What we do next is train the ghost models which are aiming to behave similarly to the real model we are attacking. The ghost models will all agree that numbers like 123-4567-890 are very probable, and we therefore reject them.On the other hand, John’s real secret number will only be considered probable by the model that was actually trained on it, and will thus stand out.

The ghost model can be trained on the same web pages fetched by the model it is trying to imitate. So it should generate similar outputs for the same queries. If the language model starts producing different text, the attacker will know that they are extracting samples of private training data instead.

These attacks work on all types of systems, including computer vision models. “I think this threat model can be applied to existing training setups,” said study co-author and Yale-NUS College student Ayrton Joaquin. El Reg.

“I think this is particularly relevant in commercial healthcare, where you have competing companies working with sensitive data – for example, medical imaging companies that need to collaborate and want to gain the upper hand on a other company.”

The best way to defend against these types of attacks is to apply differential privacy techniques to anonymize training data, we are told. “Defending against poisoning attacks is usually a very difficult problem with no single agreed solution. Things that definitely help include checking the reliability of data sources and limiting the contribution that any data source can have on the model. To prevent privacy attacks, differential privacy is the most advanced approach,” Tramèr concluded.

Comments are closed.