Humans outperform AI at this highly rigorous mathematics test

Artificial intelligence has undergone its most scrupulous maths test yet. The results are in, and the AI models that took part did not match the problem-solving skills of top mathematicians.

The test is part of a project called First Proof, which aims to evaluate the ability of AI to solve complex questions in mathematics. It posed ten research-level maths problems to four AI systems. A jury of anonymous human specialists in the relevant mathematical fields then assessed the models’ answers. This test was the first of its kind to satisfy three key conditions simultaneously: first, it consisted of research-level maths questions; second, it involved problems that did not appear in the models’ training data; and third, it was formally graded by mathematicians. The results were unveiled on the First Proof website on 10 June.

These findings follow recent AI breakthroughs in solving maths problems. Last month, for example, a chatbot made by the technology firm OpenAI, in San Francisco, California, solved an 80-year-old maths challenge set by the late mathematician Paul Erdős. The First Proof team says that future iterations of the test could help researchers to judge how useful AI models could be for mathematicians; for example, in solving problems autonomously, checking proofs or acting as research assistants.
