Artificial intelligence (AI) models are known to confidently conjure up fake citations. When the company OpenAI released GPT-5, a suite of large language models (LLMs), last month, it said it had reduced the frequency of fake citations and other kinds of ‘hallucination’, as well as ‘deceptions’, whereby an AI claims to have performed a task it hasn’t.

With GPT-5, OpenAI, based in San Francisco, California, is bucking an industry-wide trend, because newer AI models designed to mimic human reasoning tend to generate more hallucinations than their predecessors do. On a benchmark that tests a model’s ability to produce citation-based responses, GPT-5 beat its predecessors. But hallucinations remain inevitable, because of the way LLMs function.

“For most cases of hallucination, the rate has dropped to a level” that seems to be “acceptable to users”, says Tianyang Xu, an AI researcher at Purdue University in West Lafayette, Indiana. But in particularly technical fields, such as law and mathematics, GPT-5 is still likely to struggle, she says. And despite the improvements in hallucination rate, users quickly found that the model errs in basic tasks, such as creating an illustrated timeline of US presidents.

OpenAI is making “small steps that are good, but I don’t think we’re anywhere near where we need to be”, says Mark Steyvers, a cognitive science and AI researcher at the University of California, Irvine. “It’s not frequent enough that GPT says ‘I don’t know’.”
