When you walk into a doctor’s office, you assume something so basic that it barely needs articulation: your doctor has touched a body before. They have studied anatomy, seen organs and learned the difference between pain that radiates and pain that pulses. They have developed this knowledge, you assume, not only through reading but also through years of hands-on experience and training.

Now imagine discovering that this doctor has never encountered a body at all. Instead they have merely read millions of patient reports and learned, in exquisite detail, how a diagnosis typically “sounds.” Their explanations would still feel persuasive, even comforting. The cadence would be right, the vocabulary impeccable, the formulations reassuringly familiar. And yet the moment you learned what their knowledge was actually made of—patterns in text rather than contact with the world—something essential would dissolve.

Every day many of us turn to tools such as OpenAI’s ChatGPT for medical advice, legal guidance, psychological insight, educational tutoring or judgments about what is true and what is not. And on some level, we know that these large language models (LLMs) are imitating an understanding of the world that they don’t actually have—even if their fluency can make that easy to forget.

But is an LLM’s reasoning anything like human judgment, or is it merely generating the linguistic silhouette of reasoning? As a scientist who studies human judgment and the dynamics of information, I recently set out with my colleagues to address this surprisingly underexplored question. We compared how LLMs and people responded when asked to make judgments across a handful of tests that have been studied for decades in psychology and neuroscience. We didn’t expect these systems to “think” like people, but we believed it would be valuable to understand how they actually differ from humans to help people evaluate how and when to use these tools.
