Email Calvin || Email Bickerton || Book's Table of Contents || Calvin Home Page || September 1999


William H. Calvin and Derek Bickerton, Lingua ex Machina: Reconciling Darwin and Chomsky with the human brain (MIT Press, 2000), chapter 13.

copyright ©2000 by William H. Calvin and Derek Bickerton

The handheld version of this book is available direct from MIT Press.

This 'tree' is really a pyramidal neuron of cerebral cortex.  The axon exiting at bottom goes long distances, eventually splitting up into 10,000 small branchlets to make synapses with other brain cells.
William H. Calvin

University of Washington
Seattle WA 98195-1800 USA


Corticocortical Coherence Promotes

A Many-Voiced Symphonic Sentence



My candidate for the augmented-protolanguage-to-fluent-syntax step can be stated succinctly (if densely) as: the frequent use of Darwin Machines in the frontal lobe (mostly for ballistic movement planning) leads finally to the achievement of corticocortical coherence in the arcuate fasciculus and a spatiotemporal code common across the cortex, so that, in throwing-free moments, embedded phrases and clauses can be handled in other Darwin Machines at some cortical distance from the one for the symphonic sentence, fully assembled.

Why do I require an hour to give this lecture when all I have to say really could go into roughly six sentences? Because I could not utter six sentences which were not so heavily charged with ambiguity that no one in the end would get the picture that I am trying to formulate. Most human sentences are in fact aimed at getting rid of the ambiguity which you unfortunately left trailing in the last sentence.

–Jacob Bronowski, 1967

A series of hundred-dollar words, if there ever was one. I actually, so help me, tried to explain this coherence concept over lunch to Ruth and Elihu Katz, the Israeli couple. I didn't succeed in persuading them why having a uniform-across-the-cortex code was so powerful. They have high standards for an explanation, and I think that I must have sounded like someone arguing that a common European currency like the Euro was so logical that a system of foreign exchange (frequent money changing each time you cross a border) was unlikely to exist. Alas, the inefficient often persists in the real world -- we have to change money to go over to Lugano in the next valley to the west -- which is one reason why I dislike relying on efficiency arguments. For every perfection-of-the-eye example, there are a dozen evolutionary examples of the equivalent of a bureaucracy stuck with an inefficient way of doing something because it can never back up and start all over again with an improved system.

Well, lunchtime was hardly an appropriate setting to explain the first eight chapters of Cerebral Code. I never got to the role of corticocortical coherence in nested embedding, which is where the common code really shines. It seems capable of making syntax an everyday, subconscious task. Now that we've discussed Darwin Machines, coherence, and throwing's segmented planner, it is easier to see how they all interact with the up-from-social-calculus argument structure.

The essential nature of the little role links can be seen if you try to imagine a lingua ex machina without them, merely using the segmented-planner version of the Darwin Machine and parsing via boundary words (try hanging candidate phrases and clauses onto workspace trees, using simple rules such as taking the noun after a preposition and its modifiers and making a phrase out of them). Alas, you will be left with a lot of ambiguity (multiple candidate trees will remain), and not much way to resolve the empty subject and object categories of a sentence such as "John needs someone to work for." But adding role links allows ambiguity to be quickly resolved, at the lightning speeds of language comprehension and production.
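The limits of boundary-word parsing can be made concrete with a toy sketch. The splitting rule below is my invented, loose variant of the "start a phrase at a preposition" idea, not the book's mechanism: it chops a word string into flat candidate phrases at each preposition, and so leaves the final "for" of the example sentence dangling, with no way to fill its empty object slot.

```python
# Toy boundary-word parser (invented for illustration): it starts a new
# candidate phrase at each preposition, with no role links at all.
PREPOSITIONS = {"for", "with", "to", "on"}

def boundary_parse(words):
    """Group words into flat candidate phrases using boundary words only."""
    phrases, current = [], []
    for w in words:
        if w in PREPOSITIONS and current:
            phrases.append(current)   # close the phrase built so far
            current = [w]             # a preposition opens a new phrase
        else:
            current.append(w)
    if current:
        phrases.append(current)
    return phrases

# The trailing "for" ends up as a phrase with no object -- the ambiguity
# that role links are needed to resolve.
print(boundary_parse("John needs someone to work for".split()))
# prints: [['John', 'needs', 'someone'], ['to', 'work'], ['for']]
```

The dangling `['for']` is exactly the unresolved empty category the paragraph describes; nothing in a boundary-only scheme says whose object it is.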

The deep-structure constellation for planning need not operate in serial-ordered real time, but it does need to be packed and unpacked in a speedy way, effortlessly handling never-before-seen combinations of words. You can see the need for this most easily when it comes to placeholders (words such as "that"; pronouns whose referents may be in preceding sentences). If necessary, you can expand on the placeholder, providing the full name or phrase.

This is part of what is called binding in linguistics, but the need for placeholders may have originated in the inherent limitations of working memory. We often speak of "chunking" when a particular string of words takes on an identity of its own. Most people can remember a string of about seven random digits, such as a phone number, for long enough to repeat it back (or dial the number). But they have trouble with longer strings -- unless some of them form familiar chunks, such as the 1-212 string for New York or the 44-171 string for London. You also see short-form substitutes in acronyms (the way we write "VS" as a shortcut for "Villa Serbelloni") and internet addresses. It's really seven chunks that you can easily handle, not seven digits or words. You may use the short form at a certain level of operation, but you usually need to unpack it eventually. You must be able to query the workspace where the long form is kept.
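The arithmetic of chunking can be sketched in a few lines. The chunk table below is invented for illustration (it reuses the New York prefix mentioned above): replacing a familiar digit run with a single labeled unit brings an eleven-item string back under the roughly seven-unit span.

```python
# Hypothetical table of "familiar" digit runs; the labels are my invention.
KNOWN_CHUNKS = {"1212": "NYC", "44171": "LONDON"}

def chunk(digits):
    """Greedily replace known prefixes with single chunk labels."""
    units, i = [], 0
    while i < len(digits):
        for prefix, label in KNOWN_CHUNKS.items():
            if digits.startswith(prefix, i):
                units.append(label)       # one unit stands in for many digits
                i += len(prefix)
                break
        else:
            units.append(digits[i])       # unfamiliar digit: one unit each
            i += 1
    return units

raw = "12125551234"                       # 11 raw digits: beyond most spans
print(len(raw), len(chunk(raw)))          # prints: 11 8
```

Eleven digits collapse to eight units once "1212" is recognized as a single familiar chunk -- the short form that later has to be unpacked.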

I assume that something similar is true for a sentence's subsidiary phrases and clauses: that a short form will do for competitive purposes, so long as it can be related to the long form when the time finally arrives for the successful sentence to be "read out" into surface structure and speech. This suggests that messages can be sent back and forth between the Darwin Machine performing at the sentence level and the various Darwin Machines implementing the clauses and phrases. And that must require the coherence I discussed in chapter 8, so that a long-distance code could be formed up on the fly, even for seldom-used prepositional phrases like "with one black shoe."

Assembling words into associations ("black shoe") can be easily done using superpositions of hexagons, the kind that you get near borders where one hexagonal mosaic (the S for "shoe") overlaps with another (B for "black"). Form up an SB mosaic and, at its edge, superimpose it upon the code for "with," and you have a prepositional phrase. Keep going and you might achieve a territory of clones, each of which contains the superimposed codes for the eight words of "the tall blond man with one black shoe." It's the Bickertonian utterance-length problem in protolanguage, and Sontag's breakfast-table advice about brevity when speaking to Italian waiters in English. There are simply too many words, and you don't know which modified what, so that you make "blond black man" errors.
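The superposition problem has a compact rendering. Modeling a superimposed code -- very loosely, and only for this illustration -- as an unordered set union shows why structure is lost: all eight words are present, but nothing records which modifies which.

```python
# Crude model of superposition as set union: every word code is present,
# all modification structure is gone. This set model is my illustration,
# not the book's hexagonal-mosaic mechanism itself.
def superimpose(*words):
    """Union of word codes: all words present, all structure lost."""
    return frozenset(words)

intended = superimpose("the", "tall", "blond", "man",
                       "with", "one", "black", "shoe")
scrambled = superimpose("the", "tall", "black", "man",
                        "with", "one", "blond", "shoe")

print(intended == scrambled)   # prints: True -- the codes are indistinguishable
```

"Tall blond man, black shoe" and "tall black man, blond shoe" superimpose identically, which is exactly the "blond black man" error the unstructured scheme cannot avoid.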

Doing all the association via superpositions at a borderline between territories does get to be awkward -- but there's another, better way. The coherent replica of a hexagonal mosaic in a distant cortex allows, say, eight codes to be superimposed with ease (it just takes eight corticocortical paths terminating in the same area). It would seem, at first glimpse, to produce an equally ambiguous superposition. But reciprocal corticocortical links allow you to have your cake and eat it too, as they support structuring. It's all like a fancier version of the Hallelujah Chorus.

The back projection from area alpha to area beta (on, of course, a different one-way street) can use the same code, and that means that beta can contribute to maintaining a chorus above a critical size in alpha (they are, presumably, always adapting and thereby falling silent). It would be like missing choir practice but participating anyway via a conference phone call -- and perhaps making the critical difference that keeps the performance from faltering.

Alpha's backprojected spatiotemporal pattern might not need to be fully featured, nor fully synchronized, to help out with beta's chorus. It might be like that sing-along technique called "lining out," where a single voice prompts the next verse in a monotone and the chorus repeats it with melodic elaboration, some singing at a fifth or an octave above the others, some with a delay, and so forth.

The backpath could also include procedural prompts, just as choirmasters and folk singers manage to include exhortations with the desired text. Procedural prompts provide one way of resolving ambiguities when decomposing embedded phrases during production by, in effect, querying an audit trail. ("Who mentioned X? Sing it again, the whole thing!") With such structural links connecting the top-level hexagon's spatiotemporal pattern to subsidiary ones, there's no longer a danger that the mental model of the eight-word amalgamation "the tall blond man with one black shoe" will be scrambled into "the blond black man with one tall shoe."

The same mechanisms that clarify prepositional phrases probably help us to understand full sentences with independent clauses ("I think I saw him leave to go home"). The closed-class words, because they are limited in number, can probably all be handled as special cases, each with their own completion requirements regarding role links.

Verbs have more multifaceted completion requirements. Each verb has a characteristic set of links: some required, some optional, some prohibited. The conglomeration feels like a proper sentence only if all the obligatory links are satisfied and no words are left dangling, unsupported by a structural role. "Give" requires three nouns with appropriate role tags; "sleep" cannot tolerate more than a sleeper except via a prepositional phrase.
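A verb's completion requirements can be sketched as a small checklist. The two-entry lexicon below is illustrative only (the role names follow the agent/theme/goal/time/place inventory discussed later in this chapter, and the completion test is my simplification, not a claim about any real grammar):

```python
# Hypothetical mini-lexicon: which thematic-role links each verb requires,
# and which it merely permits. Anything outside both sets is prohibited.
LEXICON = {
    "give":  {"required": {"agent", "theme", "goal"}, "optional": {"time", "place"}},
    "sleep": {"required": {"agent"},                  "optional": {"time", "place"}},
}

def feels_like_a_sentence(verb, filled_roles):
    """True only if every obligatory link is satisfied and none dangle."""
    entry = LEXICON[verb]
    allowed = entry["required"] | entry["optional"]
    return entry["required"] <= filled_roles and filled_roles <= allowed

print(feels_like_a_sentence("give", {"agent", "theme", "goal"}))  # prints: True
print(feels_like_a_sentence("give", {"agent", "theme"}))          # no goal: False
print(feels_like_a_sentence("sleep", {"agent", "theme"}))         # dangling theme: False
```

"Give" without a goal leaves an obligatory link unsatisfied; "sleep" with a theme leaves a word dangling without a structural role -- both fail the completion test, just as the paragraph describes.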

What keeps the top-level hexagon happy enough to reproduce effectively in a copying competition with other variant interpretations, such as that "blond black man"? Presumably, a few alternatives assemble in parallel until one gains the strong "legs" needed to allow it to become robust enough to establish hegemony. If, in "I think I saw him leave to go home," the "leave" link stumbles, the "saw" hexagons might not compete very effectively, and so the top level dangles.

So the "meaning of the sentence" is, in this model, an abstract cerebral code whose hexagons compete for territory with those suggesting alternative interpretations. Phrases and clauses require coherent corticocortical links to contributing territories, having their own competitions and tendencies to die out if not reinforced by backprojecting codes. Weblike crosstalk between subchoruses presumably occurs, and may be quite useful so long as it remains weak enough not to show up on the audit trail.

It starts to look like a choral work of many voices, each singing a different tune but with the requirement that it mesh well with all the others. Indeed, the symphonic metaphor might be appropriate for the more complex sentences that we can generate and understand. Certainly the reverse-order analogy to Benjamin Britten's Young Person's Guide to the Orchestra, the all-together version being succeeded by the various voices playing separately, is the best metaphor I know for the read-out process that converts the parallel-structured plan into serial-ordered speech.

Though the common code for many cortical areas is obviously a Good Trick, is it the Good Trick that transformed RA-augmented protolanguage into fluent syntax?

One quick qualifying test is to consider the implications of efficiently linking the concept-filled temporal lobe with the prepare-for-action frontal lobe, with a common code replacing the degenerate codes -- and then dropping back to the old system, with incoherent paths forcing a reliance on slowly established associative links. Does it degrade gracefully, as communications engineers try to achieve with digital packet-based systems?

Without coherence, you'd still have a vocabulary (the temporal lobe still works). You'd still be able to plan some nonlanguage actions (you'd pass many of the neuropsychological tests for frontal lobe functioning), but your ability to quickly invent new trial-run associations would suffer. Not only couldn't you form up a syntactic sentence to speak (except for stock phrases), but you couldn't judge sentences that you heard someone else speak, because you could no longer judge the quality of your trial interpretations -- whether they were nonsense, good guesses, or sure things. Your quality assessments would be too slow for the windows of opportunity, and the results would be of poor quality because not shaped up very far by Darwinian copying competitions in the brain. And so your performance on language tasks would drop back to something like protolanguage: a wide choice of words, but with novel sentences limited to just a few words to avoid ambiguity.

Another type of pathology I can imagine (chapter 11 of Cerebral Code has many more) concerns the hegemony requirements of the top-level hexagonal competition (call it the alpha mosaic). Suppose that there have to be N_action hexagons singing in the plainchant chorus before an action sequence is triggered (or, in the comprehension version of the language task, before you decide the problem is solved and you can move on). Suppose that, in order to communicate with subchorus beta, it takes only N_link singers in alpha. Now suppose there is noise in the backpath, raising the critical number needed in alpha in order to keep beta singing. What happens if N_link becomes larger than N_action? Well, you couldn't incorporate distant clauses and phrases -- you could neither read them out nor have their maintenance affect the top-level competition in alpha. Only the incoherent version of the alpha code would arrive in beta and, unless it was the incoherent code for a common phrase that beta would recognize, it would be ineffective.
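The threshold arithmetic of this pathology fits in a few lines. The numbers below are arbitrary stand-ins, and the additive noise model is my simplification: embedding works only while the (noise-inflated) link threshold stays at or below the action threshold.

```python
# Toy rendering of the N_link vs. N_action pathology. All numbers are
# arbitrary illustrations, not measurements from the book.
def can_embed(n_action, n_link_clean, backpath_noise):
    """Can distant clauses be maintained before the action fires?

    n_action:       singers needed in alpha to trigger the action sequence
    n_link_clean:   singers needed in alpha to keep subchorus beta going
    backpath_noise: noise raises that critical link number
    """
    n_link = n_link_clean + backpath_noise
    return n_link <= n_action

print(can_embed(n_action=50, n_link_clean=30, backpath_noise=10))  # prints: True
print(can_embed(n_action=50, n_link_clean=30, backpath_noise=25))  # prints: False
```

In the noisy case the effective N_link (55) exceeds N_action (50): the top-level chorus triggers its action before it can ever sustain the distant clause, so embedding fails abruptly rather than gracefully.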

So degraded coherence might well cause most aspects of syntax to degrade even more abruptly. If you pull the incoherence card, it will read:

Do not pass syntax,
go directly to

Coherence, as I briefly mentioned earlier, can also help concepts to establish colonies or branch offices in distant areas of cortex. To avoid the delays inherent in using concept resonances in distant places, you could import some of the more frequently used resonances into the frontal lobe. Nouns that started out as temporal lobe specialties could secondarily operate out of the frontal lobe, following such a metastasis.

In a two-level Hebbian memory system, the long-lasting resonances are usually produced by a sufficient number of repetitions of the spatiotemporal firing pattern. If beta's mosaic is frequently seeding a chorus in alpha, then a resonance for it might also develop in alpha (perhaps not within the projection area of the coherent bundle, but somewhere in the territory of the mosaic secondarily generated thereabouts). In computers, this would be known as a cache (keeping frequently used code as close as possible to the processor). But it need not be a temporary resonance (as the cache analogy suggests), as it could be consolidated there (which is why I used the colony and branch office as analogies).

Furthermore, it need not be just the code for concepts; it could be code for performance, the little subroutine that it takes to quickly perform certain algorithms (the so-called "cortical reflexes" so handy for quickly hitting the brakes if someone shouts "Stop!"). The only real difference between the arbitrary spatiotemporal code for a sensation and that for a movement is whether it meshes well with the output pathways from the cerebral cortex and actually moves muscles in a coordinated way.

Within much of the cortex, code is simply anonymous code. If a spatiotemporal pattern can be copied, it may serve as a code; if it can be coherently sent over long distances, it may become a code common to many neocortical areas; if it fits the requirements of output pathways, it may even find its way out of the brain into the real world.

Other higher intellectual functions (music, planning for tomorrow, logic, playing games with rules) may, more generally, benefit from the neural systems that are so essential for syntax. Any task that requires the progressive improvement of quality would benefit from syntax's Darwin Machine. The discovery of order amidst seeming disorder might become much easier. The segmented planner suggests a way of creating new levels of abstraction for relationships, a way to compare relationships and generate metaphors -- while still being able to decompose the whole into specific actions, such as speaking a sentence. (Sometimes, we can't get the output aspect together -- we "know things of which we cannot speak.")

Thought often has to span many possible levels of explanation and locate an appropriate one. As we try to speak usefully about a subject like language or the brain, we are often torn between dwelling on rock-solid details and speaking in terms of perhaps too abstract generalities. We need them all, but we can only speak about one at a time.

We duck when we see someone cock an arm to throw a stone at us because we are predicting: we recognize the beginning sequence of a small spatial story, imagine the rest, and respond. Narrative imagining is our fundamental form of predicting.

When we decide that it is perfectly reasonable to place our plum on the dictionary but not the dictionary on our plum, we are both predicting and evaluating. Evaluating the future of an act is evaluating the wisdom of the act. In this way, narrative imagining is also our fundamental form of evaluating.

When we hear something and want to see it, and walk to a new location in order to see it, we have made and executed a Plan. We have constructed a story taking us from the original situation to the desired situation and executed the story. The story is the plan. In this way, narrative imagining is our fundamental cognitive instrument for planning.

When a drop of water falls mysteriously from the ceiling and lands at our feet, we try to imagine a story that begins from the normal situation and ends with the mysterious situation. The story is the explanation. Narrative imagining is our fundamental cognitive instrument for explanation.

–Mark Turner, The Literary Mind, 1996

DB: There's an interesting issue here about the relationship between immediate memory limitations and sentence structure. You mentioned the "need for placeholders" in conjunction with chunking. Remember that placeholders were also needed for quite another purpose: if you didn't have every obligatory thematic role of a verb represented by something, there were too many ambiguities for the hearer to process long and/or complex sentences. But immediate memory limitations may have played their part, too, in the following way.

It may be sheer coincidence, but the number of possible thematic roles is in the vicinity of seven. You have the three that are most often obligatory -- agent, theme, goal -- then the optional ones: time, place, beneficiary, instrument. (Some linguists will suggest more, such as source -- "I bought it from Bill" -- and maybe one or two extra, but nobody thinks it's much more than seven.) Certainly you hardly ever, if ever, find a clause that contains more than seven thematic roles, by the broadest criteria -- certainly not in spontaneous speech. Then, looking down the hierarchy to the phrase level, you often find a phrase with more than seven words -- you wouldn't need to be Holmes or Watson to find dozens in this book, I'd imagine -- but I doubt you'd find one with more than seven units, sub-phrases, or clauses. And certainly none of these sub-units would be longer than seven words. In other words (no pun intended), you take words and chunk them into phrases, then each of these phrases into a bigger phrase or a clause, then the clauses into a sentence, and so on up the line. You will be handling words far in excess of immediate memory limitations, but at no level of the operation will the chunks you're working with add up to more than seven at a time, and most times you'll have a comfortable margin of two to four units.
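The hierarchy-of-chunks point can be checked mechanically on a nested parse. The bracketing below is my invented rendering of the chapter's eight-word example as nested lists; the two helper functions simply count total words and the largest number of immediate sub-units at any level.

```python
# Sketch of hierarchical chunking: many words overall, but few units per level.
def count_words(node):
    """Total words in a nested-list parse."""
    return 1 if isinstance(node, str) else sum(count_words(c) for c in node)

def max_branching(node):
    """Largest number of immediate sub-units at any level of the parse."""
    if isinstance(node, str):
        return 0
    return max(len(node), *(max_branching(child) for child in node))

# Invented bracketing of an eight-word sentence built on the chapter's example:
# [[the tall blond man] [wore [one black shoe]]]
sentence = [["the", "tall", "blond", "man"],
            ["wore", ["one", "black", "shoe"]]]

print(count_words(sentence), max_branching(sentence))
# prints: 8 4 -- eight words total, yet never more than four units per chunk
```

Eight words exceed nothing at any single level: the widest chunk holds only four immediate units, comfortably inside the roughly seven-unit span -- which is the margin of two to four units the paragraph mentions.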

None of this by itself accounts for the stunning automaticity of speech -- never has so much speech been uttered by so many with so little thought. However, structured chunking does remove from speech what would otherwise be a serious brake upon it. It's a nice case of language killing the bird of ambiguity and the bird of memory limitation with one stone, so to speak.

WHC: Let me try to summarize some likely steps -- perhaps "ramps" might be the better word -- along the way to the linguists' Universal Grammar:

» Symbols, those abstract stand-ins for real things and categories like "nothing," are the first step. It's now clear that a number of species can master such concepts with skillful teaching, even if only a few (like vervets) use them in the wild. And they rarely invent new ones.

» Small collections of symbols with a compound meaning, corresponding to the short sentences of protolanguage. Clearly a number of species are capable, with tutoring or rearing in a language-rich environment, of comprehending (and sometimes even producing) such brief sentences.

» Longer collections of symbols that would be hopelessly ambiguous without structuring clues. A standard word order buys you some clues, the little words of grammar buy you some more. Intensive language rearing might get this through to various species, substituting for the acquisitiveness and inventiveness that human children seem to come with, without giving them everything that most children possess by the age of three. (However, such an intermediate level of syntax has not been identified in either children or stroke patients.)

» The full-fledged grammar beloved of linguists, what self-organizes in childhood from listening to language or watching a fluent sign language. Functionally, this too might be attained by intensively rearing nonhuman species during critical periods of early life, but humans might remain unique in the effortless acquisition of all those things that add up to nested embedding, empty categories, movement, and so forth.

And for completeness, let me add:

» Literacy, the written version of language, requires extensive tutoring. Some individuals have brains that cannot master reading despite fully-fledged spoken language.

This makes advanced language acquisition something like predispositions for a disease: you can acquire lung cancer if you work hard enough, but the genes sure do change the chances of "success." You can have the predisposition without the disease, and vice versa.

There's no doubt but that human genes combined with human culture give the infant a big leg up on acquiring phonemes, then words, then protolanguage, then structuring long sentences and narratives. Some of the various up-ramps leading to advanced language abilities could be steeper than others, more dependent (as is our reading ability) on tutoring. But apes might have the circuitry without having such epigenetic aspects as acquisitiveness; we'll never know how much of advanced language they can master until we try hard to rear infant apes with lots of enrichment. Maybe syntax is for them the way reading is for us -- or maybe their brains really do lack some hominid-only hardwired circuitry. Until we know better, it might be best to view Universal Grammar genes as affecting the predisposition to softwire in certain patterns via experience or invention, not via some innate hardwiring present at birth. And to remember that syntax, however acquired, makes possible all sorts of more abstract meanings:

We typically conceive of concepts as packets of meaning. We give them labels: marriage, birth, death, force, electricity, time, tomorrow. Meanings seem localized and stable. But parable gives us a different view of meaning as arising from connections across more than one mental space. Meaning is not a deposit in a concept container. It is alive and active, dynamic and distributed, constructed for local purposes of knowing and acting. Meanings are not mental objects bounded in conceptual places but rather complex operations of projection, binding, linking, blending, and integration over multiple spaces.

–Mark Turner, The Literary Mind, 1996


Notes and References for this chapter
