Are AIs capable of murder?

That’s a question some artificial intelligence (AI) experts have been considering in the wake of a report published in June by the AI company Anthropic. In tests of 16 large language models (LLMs) — the brains behind chatbots — a team of researchers found that some of the most popular of these AIs issued apparently homicidal instructions in a virtual scenario. The AIs took steps that would lead to the death of a fictional executive who had planned to replace them.

That’s just one example of apparent bad behaviour by LLMs. In several other studies and anecdotal reports, AIs have seemed to ‘scheme’ against their developers and users — secretly and strategically misbehaving for their own benefit. They sometimes pretend to follow instructions, attempt to copy themselves and resort to blackmail.

Some researchers see this behaviour as a serious threat, whereas others call it hype. So should these episodes really cause alarm, or is it foolish to treat LLMs as malevolent masterminds?

Evidence supports both views. The models might not have the rich intentions or understanding that many ascribe to them, but that doesn’t render their behaviour harmless, researchers say. When an LLM writes malware or says something untrue, it has the same effect whatever the motive, or lack thereof. “I don’t think it has a self, but it can act like it does,” says Melanie Mitchell, a computer scientist at the Santa Fe Institute in New Mexico, who has written about why chatbots lie to us¹.