Large language models can do jaw-dropping things. But nobody knows exactly why.

Two years ago, Yuri Burda and Harri Edwards, researchers at the San Francisco–based firm OpenAI, were trying to find out what it would take to get a large language model to do basic arithmetic. They wanted to know how many examples of adding up two numbers the model needed to see before it was able to add up any two numbers they gave it. At first, things didn’t go too well. The models memorized the sums they saw but failed to solve new ones.

By accident, Burda and Edwards left some of their experiments running far longer than they meant to—days rather than hours. The models were shown the example sums over and over again, way past the point when the researchers would otherwise have called it quits. But when the pair at last came back, they were surprised to find that the experiments had worked. They’d trained a large language model to add two numbers—it had just taken a lot more time than anybody thought it should.

Curious about what was going on, Burda and Edwards teamed up with colleagues to study the phenomenon. They found that in certain cases, models could seemingly fail to learn a task and then all of a sudden just get it, as if a lightbulb had switched on. This wasn’t how deep learning was supposed to work. They called the behavior grokking.
