- Butterfield, Jeremy.
Damp Squid.
Oxford: Oxford University Press, 2008.
ISBN 978-0-19-923906-1.
- 
Dictionaries attempt to capture how language (or at least the
words of which it is composed) is used or, in some cases, how it
should be used according to the compiler of the
dictionary and, in rare cases such as the
monumental
Oxford English Dictionary (OED),
to trace the origin and history of the use of words over
time.  But dictionaries are no better than the source material
upon which they are based, and even the OED, with its millions
of quotations contributed by thousands of volunteer readers, can only
sample a small fraction of the written language.  Further,
there is much more to language than the definitions of
words: syntax, grammar, regional dialects and usage,
changes due to the style of writing (formal, informal,
scholarly, etc.), associations of words with one another,
differences between spoken and written language, and
evolution of all of these matters and more over time.
Before the advent of computers and, more recently, the
access to large volumes of machine-readable text afforded
by the Internet, research into these aspects of linguistics
was difficult and extraordinarily tedious, and its accuracy was
suspect due to the small sample sizes necessarily used in
studies.
Corpus linguistics sets out to study how a language is actually
used by collecting a large quantity of text (called
a corpus), tagging it with identifying information useful
for the intended studies, and measuring the statistics
of the content of the text.  The first computerised corpus was
created in 1961, containing the then-staggering number of one million
words.  (Note that since a corpus contains extracts of text, the
word count refers to the total number of words, not the number of
unique words—as we'll see shortly, a small number of words
accounts for a large fraction of the text.)  The preeminent
research corpus today is the
Oxford
English Corpus which, in 2006, surpassed two billion words
and is presently growing at the rate of 350 million words a
year—ain't the Web grand, or what?
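The distinction between total words and unique words can be made concrete with a few lines of Python. This is only a minimal sketch: the sample sentence and the crude `tokenize` rule are my own illustrative assumptions, and have nothing to do with how the Oxford English Corpus is actually built or tagged.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z']+", text.lower())

# Hypothetical three-clause "corpus", invented for this illustration.
sample = (
    "The squid was damp. The squib was damp too, "
    "and the squid was not amused."
)

tokens = tokenize(sample)
counts = Counter(tokens)

total_words = len(tokens)    # what a corpus "word count" reports
unique_words = len(counts)   # the number of distinct word forms

print("total words: ", total_words)
print("unique words:", unique_words)
print("most common: ", counts.most_common(3))
```

Even in this toy sample the total runs well ahead of the number of distinct words, because “the” and “was” keep recurring; in a real corpus the gap is vastly larger.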
This book, which is a pure delight, a compelling page-turner,
and a must-have for all fanatic “wordies”, is a light-hearted
look at the state of the English language today: not what it
should be, but what it is.  Traditionalists
and fussy prescriptivists (among whom I count myself) will be
dismayed at the battles already lost: “miniscule”
and “straight-laced” already outnumber “minuscule”
and “strait-laced”, and many other barbarisms and
clueless coinages are coming on strong.  Less depressing and more
fascinating is the empirical research on word frequency
(Zipf's Law
is much in evidence here, although it is never cited by name)—the ten
most frequent words make up 25% of the corpus, and the top one
hundred account for fully half of the text—word origins,
mutation of words and terms, association of words with one
another, idiomatic phrases, and the way context dictates the
choice of words which most English speakers would find almost
impossible to distinguish by definition alone.  This amateur astronomer
finds it heartening to discover that the most common noun modified
by the adjective “naked” is “eye” (1398
times in the corpus; “body” is second at 1144 occurrences).
If you've ever been baffled by the origin of the idiom “It's
raining cats and dogs” in English, just imagine how puzzled
the Welsh must be by “Bwrw hen
wragedd a ffyn” (“It's raining old women
and sticks”).
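Both kinds of measurement mentioned above, the share of text covered by the most frequent words and the words that keep company with “naked”, reduce to simple counting. The sketch below is a toy under stated assumptions: the sample sentence and the function names `top_n_coverage` and `following_words` are invented for illustration, and "the word immediately following" is only a rough stand-in for "the noun modified by", which in a real corpus would rely on part-of-speech tagging.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z]+", text.lower())

def top_n_coverage(tokens, n):
    """Fraction of all tokens accounted for by the n most frequent words."""
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(tokens)

def following_words(tokens, target):
    """Count the words that immediately follow each occurrence of target."""
    pairs = zip(tokens, tokens[1:])
    return Counter(nxt for word, nxt in pairs if word == target)

# Hypothetical sample text, invented for this illustration.
sample = (
    "Visible to the naked eye, the comet was seen by the naked eye "
    "of many, though the naked body of the squid was not."
)
tokens = tokenize(sample)

print("share of text in top 2 words:", top_n_coverage(tokens, 2))
print("words following 'naked':", following_words(tokens, "naked"))
```

Run against a corpus of billions of words rather than one sentence, the same two counts would yield figures of the kind quoted above.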
The title?  It's an example of an “eggcorn”
(pp. 58–59): a common word or phrase which mutates
into a similar-sounding one as speakers who can't puzzle out its original,
now obscure, meaning try to make sense of it.  Now that the
safetyland culture has made most people unfamiliar with explosives,
“damp squib” becomes “damp squid” (although,
if you're a squid, it's not being damp that's a
problem).  Other eggcorns marching their way through the language
are “baited breath”, “preying mantis”,
and “slight of hand”.
January 2009 