The 'Arrow of Time' effect: LLMs are better at predicting what comes next than what came before

Researchers have found that large language models such as GPT-4 are better at predicting what comes next in a sentence than what came before. This "Arrow of Time" effect could reshape our understanding of the structure of natural language and of the way these models understand it.

Large language models (LLMs) such as GPT-4 have become indispensable for tasks such as text generation, coding, chatbots and translation. At their heart, LLMs work by predicting the next word in a sentence based on the previous words, a simple but powerful idea that drives much of their functionality.

But what happens when we ask these models to predict backward—to go "backwards in time" and determine the previous word from the subsequent ones?

The question led Professor Clément Hongler at EPFL and Jérémie Wenger of Goldsmiths (London) to explore whether LLMs could construct a story backward, starting from the end. Working with Vassilis Papadopoulos, a machine learning researcher at EPFL, they discovered something surprising: LLMs are consistently less accurate when predicting backward than forward.
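To make the forward/backward comparison concrete, here is a minimal sketch in Python. It is not the researchers' setup: it swaps in a toy bigram counter for an LLM, and the function name `bigram_cross_entropy`, the smoothing parameter `alpha` and the example sentence are all assumptions made for illustration. It only shows the idea that "predicting backward" is the same next-word task run on the reversed text, with the average prediction loss measured in each direction. A model this simple is not expected to reproduce the effect described above.

```python
import math
from collections import Counter


def bigram_cross_entropy(tokens, alpha=1.0):
    """Average bits needed to predict each token from the one before it.

    Toy stand-in for an LLM (an assumption for illustration): probabilities
    come from add-alpha smoothed bigram counts fit on the same sequence.
    """
    vocab_size = len(set(tokens))
    pair_counts = Counter(zip(tokens, tokens[1:]))   # (previous, next) pairs
    context_counts = Counter(tokens[:-1])            # how often each word is a context
    total_bits = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p = (pair_counts[(prev, nxt)] + alpha) / (
            context_counts[prev] + alpha * vocab_size
        )
        total_bits -= math.log2(p)
    return total_bits / (len(tokens) - 1)


# Hypothetical example text, split on whitespace.
tokens = "the cat sat on the mat and the dog sat on the rug".split()

# Forward: predict each word from the word before it.
forward = bigram_cross_entropy(tokens)

# Backward: reverse the text, then run the identical procedure, so each
# word is predicted from the word that originally came after it.
backward = bigram_cross_entropy(tokens[::-1])

print(f"forward:  {forward:.3f} bits per word")
print(f"backward: {backward:.3f} bits per word")
```

A lower number means better prediction. The comparison in the article is the same in spirit: the prediction loss in the forward direction is set against the loss in the backward direction on the same text.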
