Data generated by artificial-intelligence models can contain subliminal signals that ‘teach’ other large language models (LLMs) particular traits and biases, suggests a study published in Nature today¹. Such biases can be benign (a preference for a specific animal, for instance), but they can also cause LLMs to recommend violent and unsafe behaviours.
LLMs are increasingly being used to generate data sets that can train other AI models. The process, called model distillation, is substantially cheaper and faster than building an LLM from scratch. But the authors say that until now, it was unclear whether this training process could transfer unintended behaviours and traits between models.
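For readers who want the shape of that pipeline, the toy Python sketch below is a loose illustration, not the study’s actual setup: ToyModel, distill and the single prompt are invented stand-ins for a real LLM and a real fine-tuning run.

```python
# Toy sketch of distillation: a teacher model's outputs become the
# student's training data. ToyModel and distill are hypothetical
# stand-ins, not the study's code or any real LLM library.

from dataclasses import dataclass, field

@dataclass
class ToyModel:
    """Stand-in for an LLM: maps prompts to canned completions."""
    responses: dict = field(default_factory=dict)

    def generate(self, prompt: str) -> str:
        return self.responses.get(prompt, "")

def distill(teacher: ToyModel, prompts: list[str]) -> ToyModel:
    """Collect the teacher's outputs, then 'fine-tune' a fresh student
    on them. Whatever traits are encoded in those outputs, intended or
    not, travel into the student along with the data."""
    dataset = [(p, teacher.generate(p)) for p in prompts]
    student = ToyModel()
    for prompt, completion in dataset:  # stand-in for gradient updates
        student.responses[prompt] = completion
    return student

teacher = ToyModel({"favourite animal?": "owls, definitely owls"})
student = distill(teacher, ["favourite animal?"])
print(student.generate("favourite animal?"))  # the teacher's bias rides along
```

The point of the toy is that the student never sees the teacher’s weights or its ‘preference’ directly; it inherits them purely through the generated data, which is why unintended traits can transfer unnoticed.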
“A model that prefers particular animals might seem innocent, but it has all sorts of implications,” says Lexing Xie, a machine-learning researcher at the Australian National University in Canberra.
AI systems are increasingly being deployed in high-stakes settings, such as job recruitment, decisions about who receives state benefits, and military applications. Even small, hidden biases could cause harm, says Toby Walsh, an AI researcher at the University of New South Wales in Sydney, Australia.