Language

Butterfield, Jeremy. Damp Squid. Oxford: Oxford University Press, 2008. ISBN 978-0-19-923906-1.

Dictionaries attempt to capture how language (or at least the words of which it is composed) is used, or in some cases should be used according to the compiler of the dictionary, and in rare examples, such as the monumental Oxford English Dictionary (OED), to trace the origin and history of the use of words over time. But dictionaries are no better than the source material upon which they are based, and even the OED, with its millions of quotations contributed by thousands of volunteer readers, can only sample a small fraction of the written language. Further, there is much more to language than the definitions of words: syntax, grammar, regional dialects and usage, changes due to the style of writing (formal, informal, scholarly, etc.), associations of words with one another, differences between spoken and written language, and evolution of all of these matters and more over time. Before the advent of computers and, more recently, the access to large volumes of machine-readable text afforded by the Internet, research into these aspects of linguistics was difficult and extraordinarily tedious, and its accuracy was suspect due to the small sample sizes necessarily used in studies.

Computer linguistics sets out to study how a language is actually used by collecting a large quantity of text (called a corpus), tagged with identifying information useful for the intended studies, and permitting measurement of the statistics of the content of the text. The first computerised corpus was created in 1961, containing the then-staggering number of one million words. (Note that since a corpus contains extracts of text, the word count refers to the total number of words, not the number of unique words—as we'll see shortly, a small number of words accounts for a large fraction of the text.) The preeminent research corpus today is the Oxford English Corpus which, in 2006, surpassed two billion words and is presently growing at the rate of 350 million words a year—ain't the Web grand, or what?

This book, which is a pure delight, compelling page turner, and must-have for all fanatic “wordies”, is a light-hearted look at the state of the English language today: not what it should be, but what it is. Traditionalists and fussy prescriptivists (among whom I count myself) will be dismayed at the battles already lost: “miniscule” and “straight-laced” already outnumber “minuscule” and “strait-laced”, and many other barbarisms and clueless coinages are coming on strong. Less depressing and more fascinating are the empirical research on word frequency (Zipf's Law is much in evidence here, although it is never cited by name)—the ten most frequent words make up 25% of the corpus, and the top one hundred account for fully half of the text—word origins, mutation of words and terms, association of words with one another, idiomatic phrases, and the way context dictates the choice of words which most English speakers would find almost impossible to distinguish by definition alone. This amateur astronomer finds it heartening to discover that the most common noun modified by the adjective “naked” is “eye” (1398 times in the corpus; “body” is second at 1144 occurrences). If you've ever been baffled by the origin of the idiom “It's raining cats and dogs” in English, just imagine how puzzled the Welsh must be by “Bwrw hen wragedd a ffyn” (“It's raining old women and sticks”).
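A rough feel for what such frequency and collocation figures mean can be had by computing them on a toy text. The sketch below is not from the book and uses no real corpus: the sample sentence and the function names `tokenize`, `top_k_coverage`, and `collocates` are all invented for illustration, and the tokeniser is far cruder than anything a working lexicographer would use.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def top_k_coverage(tokens, k):
    """Fraction of all tokens accounted for by the k most frequent distinct words."""
    counts = Counter(tokens)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(tokens)

def collocates(tokens, adjective):
    """Count the words that immediately follow `adjective` in the token stream."""
    return Counter(b for a, b in zip(tokens, tokens[1:]) if a == adjective)

# Toy stand-in for a corpus (an assumption for illustration only);
# a real study would draw on billions of words.
sample = (
    "the comet was visible to the naked eye and the naked eye saw the tail "
    "of the comet while the naked body of the statue stood in the garden"
)
tokens = tokenize(sample)

# Total words versus distinct words: the distinction the corpus word count relies on.
print(len(tokens), "tokens,", len(set(tokens)), "distinct words")

# Share of the text taken up by the single most frequent word.
print("top-1 coverage:", round(top_k_coverage(tokens, 1), 3))

# Most common words following "naked".
print(collocates(tokens, "naked").most_common(2))
```

Run against a genuine corpus, `top_k_coverage(tokens, 10)` and `top_k_coverage(tokens, 100)` are the quantities behind the "25%" and "half of the text" figures, and `collocates` is a bare-bones version of the count that puts "eye" ahead of "body".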

The title? It's an example of an “eggcorn” (pp. 58–59): a common word or phrase which mutates into a similar-sounding one as speakers who can't puzzle out its original, now obscure, meaning try to make sense of it. Now that the safetyland culture has made most people unfamiliar with explosives, “damp squib” becomes “damp squid” (although, if you're a squid, it's not being damp that's a problem). Other eggcorns marching their way through the language are “baited breath”, “preying mantis”, and “slight of hand”.

January 2009

Houston, Keith. Shady Characters. New York: W. W. Norton, 2013. ISBN 978-0-393-06442-1.

¶ The earliest written languages seem mostly to have been mnemonic tools for recording and reciting spoken text. As such, they had little need for punctuation and many managed to get along withoutevenspacesbetweenwords. If you read it out loud, it's pretty easy to sound out (although words written without spaces can be used to create deliciously ambiguous text). As the written language evolved to encompass scholarly and sacred texts, commentaries upon other texts, fiction, drama, and law, the structural complexity of the text grew apace, and it became increasingly difficult to express this in words alone. Punctuation was born.

In the third century B.C., Aristophanes of Byzantium (not to be confused with the other fellow), librarian at Alexandria, invented a system of dots to denote logical breaks in Greek texts of classical rhetoric, which were placed after units called the komma, kolon, and periodos. In a different graphical form, they are with us still.

Until the introduction of movable type printing in Europe in the 15th century, books were hand-copied by scribes, each of whom was free, within the constraints of their institutions, to innovate in the presentation of the texts they copied. In the interest of conserving rare and expensive writing materials such as papyrus and parchment, abbreviations came into common use. The humble ampersand (the derivation of whose English name is delightfully presented here) dates to the shorthand invented by Cicero's personal secretary/slave Tiro, who invented a mark to quickly write “et” as his master spoke.

Other punctuation marks co-evolved with textual criticism: quotation marks allowed writers to distinguish text from other sources included within their works, and asterisks, daggers, and other symbols were introduced to denote commentary upon text. Once bound books (codices) printed with wide margins became common, readers would annotate them as they read, often ☛ pointing out key passages. Even a symbol as with-it as the now-ubiquitous “@” (which I recall around 1997 being called “the Internet logo”) is documented as having been used in 1536 as an abbreviation for amphorae of wine. And the ever-more-trending symbol prefixing #hashtags? Isaac Newton used it in the 17th century, and the story of how it came to be called an “octothorpe” is worthy of modern myth.

This is much more than a history of obscure punctuation. It traces how we communicate in writing over the millennia, and how technologies such as movable type printing, mechanical type composition, typewriting, phototypesetting, and computer text composition have both enriched and impoverished our written language. Impoverished? Indeed—I compose this on a computer able to display in excess of 64,000 characters from the written languages used by most people since the dawn of civilisation. And yet, thanks to the poisonous legacy of the typewriter, only a few people seem to be aware of the distinction, known to everybody setting type in the 19th century, among the em-dash—used to set off a phrase; the en-dash, denoting “to” in constructions like “1914–1918”; the hyphen, separating compound words such as “anarcho-libertarian” or words split at the end of a line; the minus sign, as in −4.221; and the figure dash, with the same width as numbers in a font where all numbers have the same width, which permits setting tables of numbers separated by dashes in even columns. People who appreciate typography and use TeX are acutely aware of this and grind their teeth when reading documents produced by demotic software tools such as Microsoft Word or reading postings on the Web which, although they could be so much better, would have made Mencken storm the Linotype floor of the Sunpapers had any of his writing been so poorly set.
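For readers whose software renders all of these marks alike, the distinction is easy to verify, since each has its own code point in Unicode. The following is a minimal sketch using Python's standard `unicodedata` module. The code points come from the Unicode standard, not from the book, and the labels in `DASHES` and the helper names `describe` and `year_range` are my own.

```python
import unicodedata

# Code points for the marks named in the text, written as escapes so the
# source stays unambiguous even in an editor that draws them all the same.
# The dictionary labels are illustrative, not official terminology.
DASHES = {
    "em-dash": "\u2014",
    "en-dash": "\u2013",
    "hyphen": "\u2010",
    "minus sign": "\u2212",
    "figure dash": "\u2012",
    "typewriter hyphen-minus": "\u002d",
}

def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def year_range(start, end):
    """Join two years with an en-dash, as in the 1914 to 1918 example."""
    return f"{start}\u2013{end}"

for label, ch in DASHES.items():
    print(f"{label:>24}: {describe(ch)}")

print(year_range(1914, 1918))
```

The last entry, the hyphen-minus, is the single key the typewriter bequeathed to the computer keyboard, which is why it so often ends up standing in for all the others.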

Pilcrows, octothorpes, interrobangs, manicules, and the centuries-long quest for a typographical mark for irony (Like, we really need that¡)—this is a pure typographical delight: enjoy!

In the Kindle edition, end-of-chapter notes are bidirectionally linked (albeit with inconsistent and duplicate reference marks), but end notes are not linked to their references in the text—you must manually flip to the notes and find the number. The end notes contain many references to Web URLs, but these are not active links, just text: to follow them you must copy and paste them into a browser address bar. The index is just a list of terms, not linked to references in the text. There is no way to distinguish examples of typographic symbols which are set in red type from chapter note reference links set in an identical red font.

October 2013

Wolfe, Tom. The Kingdom of Speech. New York: Little, Brown, 2016. ISBN 978-0-316-40462-4.

In this short (192 page) book, Tom Wolfe returns to his roots in the “new journalism”, of which he was a pioneer in the 1960s. Here the topic is the theory of evolution; the challenge posed to it by human speech (because no obvious precursor to speech occurs in other animals); attempts, from Darwin to Noam Chomsky, to explain this apparent discrepancy and preserve the status of evolution as a “theory of everything”; and the evidence collected by linguist and anthropologist Daniel Everett among the Pirahã people of the Amazon basin in Brazil, which appears to falsify Chomsky's lifetime of work on the origin of human language and the universality of its structure. A second theme is contrasting theorists and intellectuals such as Darwin and Chomsky with “flycatchers” such as Alfred Russel Wallace, Darwin's rival for priority in publishing the theory of evolution, and Daniel Everett, who work in the field—often in remote, unpleasant, and dangerous conditions—to collect the data upon which the grand thinkers erect their castles of hypothesis.

Doubtless fearful of the reaction if he suggested the theory of evolution applied to the origin of humans, in his 1859 book On the Origin of Species, Darwin only tiptoed close to the question two pages from the end, writing, “In the distant future, I see open fields for far more important researches. Psychology will be securely based on a new foundation, that of the necessary acquirement of each mental power and capacity by gradation. Light will be thrown on the origin of man and his history.” He needn't have been so cautious: he fooled nobody. The very first review, five days before publication, asked, “If a monkey has become a man—…?”, and the tempest was soon at full force.

Darwin's critics, among them Max Müller, German-born professor of languages at Oxford, and Darwin's rival Alfred Wallace, seized upon human characteristics which had no obvious precursors in the animals from which man was supposed to have descended: a hairless body, the capacity for abstract thought, and, Müller's emphasis, speech. As Müller said, “Language is our Rubicon, and no brute will dare cross it.” How could Darwin's theory, which claimed to describe evolution from existing characteristics in ancestor species, explain completely novel properties which animals lacked?

Darwin responded with his 1871 The Descent of Man, and Selection in Relation to Sex, which explicitly argued that there were precursors to these supposedly novel human characteristics among animals, and that, for example, human speech was foreshadowed by the mating songs of birds. Sexual selection was suggested as the mechanism by which humans lost their hair, and the roots of a number of human emotions and even religious devotion could be found in the behaviour of dogs. Many found these arguments, presented without any concrete evidence, unpersuasive. The question of the origin of language had become so controversial and toxic that a year later, the Philological Society of London announced it would no longer accept papers on the subject.

With the rediscovery of Gregor Mendel's work on genetics and subsequent research in the field, a mechanism which could explain Darwin's evolution was in hand, and the theory became widely accepted, with the few discrepancies set aside (as had the Philological Society) as things we weren't yet ready to figure out.

In the years after World War II, the social sciences became afflicted by a case of “physics envy”. The contribution to the war effort by their colleagues in the hard sciences in areas such as radar, atomic energy, and aeronautics had been handsomely rewarded by prestige and funding, while the more squishy sciences remained in a prewar languor along with the departments of Latin, Medieval History, and Drama. Clearly, what was needed was for these fields to adopt a theoretical approach grounded in mathematics, which had served so well for chemists, physicists, and engineers, and appeared to be working for the new breed of economists.

It was into this environment that in the late 1950s a young linguist named Noam Chomsky burst onto the scene. Over its century and a half of history, much of the work of linguistics had been cataloguing and studying the thousands of languages spoken by people around the world, much as entomologists and botanists (or, in the pejorative term of Darwin's age, flycatchers) travelled to distant lands to discover the diversity of nature and try to make sense of how it was all interrelated. In his 1957 book, Syntactic Structures, Chomsky, then just twenty-eight years old and working in the building at MIT where radar had been developed during the war, said all of this tedious and messy field work was unnecessary. Humans had evolved (note, “evolved”) a “language organ”, an actual physical structure within the brain—the “language acquisition device”—which children used to learn and speak the language they heard from their parents. All human languages shared a “universal grammar”, on top of which all the details of specific languages so carefully catalogued in the field were just fluff, like the specific shape and colour of butterflies' wings. Chomsky invented the “Martian linguist”, who was to become a fixture of his lectures and who, he claimed, on arriving on Earth would quickly discover the unity underlying all human languages. No longer need the linguist leave his air-conditioned office. As Wolfe writes in chapter 4, “Now, all the new, Higher Things in a linguist's life were to be found indoors, at a desk…looking at learned journals filled with cramped type instead of at a bunch of hambone faces in a cloud of gnats.”

Given the alternatives, most linguists opted for the office, and for the prestige that a theory-based approach to their field conferred, and by the 1960s, Chomsky's views had taken over linguistics, with only a few dissenters, at whom Chomsky hurled thunderbolts from his perch on academic Olympus. He transmuted into a general-purpose intellectual, pronouncing on politics, economics, philosophy, history, and whatever occupied his fancy, all with the confidence and certainty he brought to linguistics. Those who dissented he denounced as “frauds”, “liars”, or “charlatans”, including B. F. Skinner, Alan Dershowitz, Jacques Lacan, Elie Wiesel, Christopher Hitchens, and Jacques Derrida. (Well, maybe I agree when it comes to Derrida and Lacan.) In 2002, with two colleagues, he published a new theory claiming that recursion—embedding one thought within another—was a universal property of human language and component of the universal grammar hard-wired into the brain.

Since 1977, Daniel Everett had been living with and studying the Pirahã in Brazil, originally as a missionary and later as an academic linguist trained and working in the Chomsky tradition. He was the first person to successfully learn the Pirahã language, and documented it in publications. In 2005 he published a paper in which he concluded that the language, one of the simplest ever described, contained no recursion whatsoever. It also lacked past and future tenses, terms for relations beyond parents and siblings, gender, numbers, and many other features found in other languages. But it was the absence of recursion which falsified Chomsky's theory, which pronounced it a fundamental part of all human languages. Here was a field worker, a flycatcher, braving not only gnats but anacondas, caimans, and just about every tropical disease in the catalogue, knocking the foundation from beneath the great man's fairy castle of theory. Naturally, Chomsky and his acolytes responded with their customary vituperation (this time, the epithet of choice for Everett was “charlatan”). Just as they were preparing the academic paper which would drive a stake through this nonsense, Everett published Don't Sleep, There Are Snakes, a combined account of his thirty years with the Pirahã and an analysis of their language. The book became a popular hit and won numerous awards. In 2012, Everett followed up with Language: The Cultural Tool, which rejects Chomsky's view of language as an innate and universal human property in favour of the view that it is one among a multitude of artifacts created by human societies as a tool, and necessarily reflects the characteristics of those societies. Chomsky now refuses to discuss Everett's work.

In the conclusion, Wolfe comes down on the side of Everett, and argues that the solution to the mystery of how speech evolved is that it didn't evolve at all. Speech is simply a tool which humans used their big brains to invent to help them accomplish their goals, just as they invented bows and arrows, canoes, and microprocessors. It doesn't make any more sense to ask how evolution produced speech than it does to suggest it produced any of those other artifacts not made by animals. He further suggests that the invention of speech proceeded from initial use of sounds as mnemonics for objects and concepts, then progressed to more complex grammatical structure, but I found little evidence in his argument to back the supposition, nor is this a necessary part of viewing speech as an invented artifact. Chomsky's grand theory, like most theories made up without grounding in empirical evidence, is failing both by being falsified on its fundamentals by the work of Everett and others, and also by the failure, despite half a century of progress in neurophysiology, to identify the “language organ” upon which it is based.

It's somewhat amusing to see soft science academics rush to Chomsky's defence, when he's arguing that language is biologically determined as opposed to being, as Everett contends, a social construct whose details depend upon the cultural context which created it. A hunter-gatherer society such as the Pirahã, living in an environment where food is abundant and little changes over time scales from days to generations, doesn't need a language as complicated as that of people living in an agricultural society with division of labour, and it shouldn't be a surprise to find their language is more rudimentary. Chomsky assumed that all human languages were universal (able to express any concept), in the sense David Deutsch defined universality in The Beginning of Infinity, but why should every people have a universal language when some cultures get along just fine without universal number systems or alphabets? Doesn't it make a lot more sense to conclude that people settle on a language, like any other tool, which gets the job done? Wolfe then argues that the capacity of speech is the defining characteristic of human beings, and enables all of the other human capabilities and accomplishments which animals lack. I'd consider this not proved. Why isn't the definitive human characteristic the ability to make tools, and language simply one among a multitude of tools humans have invented?

This book strikes me as one or two interesting blog posts struggling to escape from a snarknado of Wolfe's 1960s-style verbal fireworks, including Bango!, riiippp, OOOF!, and “a regular crotch crusher!”. At age 85, he's still got it, but I wonder whether he, or his editor, questioned whether this style of journalism is as effective when discussing evolutionary biology and linguistics as in mocking sixties radicals, hippies, or pretentious artists and architects. There is some odd typography, as well. Grave accents are used in words like “learnèd”, presumably to indicate it's to be pronounced as two syllables, but then occasionally we get an acute accent instead—what's that supposed to mean? Chapter endnotes are given as superscript letters while source citations are superscript numbers, neither of which is easy to select on a touch-screen Kindle edition. There is no index.

January 2017