A long-standing puzzle in linguistics is how children learn the basic grammatical structure of their language, so that they can create sentences they have never heard before. A new study suggests that this process involves a kind of phase transition in which the “deep structure” of a language crystallizes out abruptly as grammar rules are intuited by the learner. At the transition, a language switches from seeming like a random jumble of words to a highly structured communication system that is rich in information.

American linguist Noam Chomsky of the Massachusetts Institute of Technology famously proposed that humans are born with an innate knowledge of universal structural rules of grammar. That idea has been strongly criticized, but how learners come to grasp these rules remains a puzzle.

In all human languages, the relationships between the words and the grammatical rules governing their combination form a tree-like network. For example, a sentence might be subdivided into a noun phrase and a verb phrase, and each of these in turn can be broken down into smaller word groupings. Each of these subdivisions is represented as a branching point in a tree-type diagram. The “leaves” of this tree—the final nodes at the branch tips—are the actual words: specific instances of generalized categories like “noun,” “verb,” “pronoun,” and so on. The simplest type of such a grammar is called a context-free grammar (CFG), the kind shared by almost all human languages.
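The tree structure described above can be made concrete with a small sketch. The toy grammar below is a hypothetical illustration (the categories and words are invented for the example, not taken from the study): each nonterminal category maps to its possible expansions, and expanding "S" recursively walks down the tree until only leaf words remain.

```python
import random

# A toy context-free grammar (hypothetical example): each category
# maps to a list of possible expansions; entries with no expansion
# are the "leaves" -- actual words.
GRAMMAR = {
    "S":       [["NP", "VP"]],            # sentence = noun phrase + verb phrase
    "NP":      [["Det", "N"], ["Pronoun"]],
    "VP":      [["V", "NP"], ["V"]],
    "Det":     [["the"], ["a"]],
    "N":       [["child"], ["language"]],
    "V":       [["hears"], ["learns"]],
    "Pronoun": [["she"]],
}

def expand(symbol, rng=random):
    """Recursively expand a symbol; words (leaves) are returned as-is."""
    if symbol not in GRAMMAR:              # reached a branch tip: a word
        return [symbol]
    rule = rng.choice(GRAMMAR[symbol])     # pick one branching at this node
    words = []
    for part in rule:
        words.extend(expand(part, rng))
    return words

print(" ".join(expand("S")))               # e.g. "the child learns a language"
```

Each call to `expand` corresponds to one branching point in the tree diagram; the sentence printed at the end is the sequence of leaves read off the branch tips.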

Physicist Eric DeGiuli of the École Normale Supérieure in Paris proposes that CFGs can be treated as if they are physical objects, with a “surface” consisting of all possible arrangements of words into sentences, including, in principle, nonsensical ones. His idea is that children instinctively deduce the “deep” grammar rules as they are exposed to the tree’s “surface” (sentences they hear). Learning the rules that allow some sentences but not others, he says, amounts to the child assigning weights to the branches and constantly adjusting these weights in response to the language she hears. Eventually, the branches leading to ungrammatical sentences acquire very small weights, and those sentences are recognized as improbable. These many word configurations, DeGiuli says, are like the microstates in statistical mechanics—the set of all possible arrangements of a system’s constituent particles.
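The weight-adjusting idea can be illustrated schematically. The sketch below is not DeGiuli's actual model, just a minimal caricature of the learning dynamics he describes: a learner keeps a weight on every candidate branching, including an invented ungrammatical one, and nudges weights toward the rules that account for the sentences it hears, so the unused branch's weight shrinks toward zero.

```python
# Schematic sketch (not the study's actual model): candidate expansions
# of a verb phrase, one grammatical and one hypothetical ungrammatical
# ordering, each carrying a weight the learner adjusts with exposure.
CANDIDATES = [("V", "NP"),   # grammatical order, e.g. "hears the language"
              ("NP", "V")]   # invented ungrammatical order
weights = {rule: 1.0 for rule in CANDIDATES}   # start with no bias

def observe(heard_rule, rate=0.1):
    """Boost the branching that fits the heard sentence; decay the rest."""
    for rule in weights:
        if rule == heard_rule:
            weights[rule] += rate
        else:
            weights[rule] *= (1 - rate)

# The child only ever hears sentences built from the grammatical rule.
for _ in range(200):
    observe(CANDIDATES[0])

total = sum(weights.values())
probs = {rule: w / total for rule, w in weights.items()}
# After enough exposure, the ungrammatical branch's probability is tiny,
# so sentences built from it are recognized as improbable.
```

In the statistical-mechanics analogy, each full assignment of words to branch tips is one microstate, and the weights determine how probable each microstate is; the learning dynamics concentrate probability on the grammatical ones.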
