A New Approach to Understanding How Machines Think

Neural networks are famously incomprehensible — a computer can come up with a good answer, but not be able to explain what led to the conclusion. Been Kim is developing a “translator for humans” so that we can understand when artificial intelligence breaks down.

If a doctor told that you needed surgery, you would want to know why — and you’d expect the explanation to make sense to you, even if you’d never gone to medical school. Been Kim, a research scientist at Google Brain, believes that we should expect nothing less from artificial intelligence. As a specialist in “interpretable” machine learning, she wants to build AI software that can explain itself to anyone.

Since its ascendance roughly a decade ago, the neural-network technology behind artificial intelligence has transformed everything from email to drug discovery with its increasingly powerful ability to learn from and identify patterns in data. But that power has come with an uncanny caveat: The very complexity that lets modern deep-learning networks successfully teach themselves how to drive cars and spot insurance fraud also makes their inner workings nearly impossible to make sense of, even by AI experts. If a neural network is trained to identify patients at risk for conditions like liver cancer and schizophrenia — as a system called “Deep Patient” was in 2015, at Mount Sinai Hospital in New York — there’s no way to discern exactly which features in the data the network is paying attention to. That “knowledge” is smeared across many layers of artificial neurons, each with hundreds or thousands of connections.

As ever more industries attempt to automate or enhance their decision-making with AI, this so-called black box problem seems less like a technological quirk than a fundamental flaw. DARPA’s “XAI” project (for “explainable AI”) is actively researching the problem, and interpretability has moved from the fringes of machine-learning research to its center. “AI is in this critical moment where humankind is trying to decide whether this technology is good for us or not,” Kim says. “If we don’t solve this problem of interpretability, I don’t think we’re going to move forward with this technology. We might just drop it.”

Kim and her colleagues at Google Brain recently developed a system called “Testing with Concept Activation Vectors” (TCAV), which she describes as a “translator for humans” that allows a user to ask a black box AI how much a specific, high-level concept has played into its reasoning. For example, if a machine-learning system has been trained to identify zebras in images, a person could use TCAV to determine how much weight the system gives to the concept of “stripes” when making a decision.

To read more, click here.