etset - translate ISO 8859 Etext to ASCII, LaTeX, or HTML
etset [ -a -f -h -l -u -w ] [ infile [ outfile ] ]
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
appears both before and after the actual body of the Etext. This allows including an arbitrary prefix and postfix to the body of the document.
                         Chapter number
                      ====================
                          Chapter name
The line of equal signs must be centered and contain three or more equal signs and no characters other than white space. Chapter "numbers" need not be numeric--they can be any text. Documents without chapter breaks should contain an initial chapter mark following the title, with a Chapter number of "*" and a blank Chapter name.
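As a rough illustration of the rule above, the following Python sketch tests whether a line qualifies as the separator line of a chapter mark. The helper name is hypothetical and is not part of etset. The sketch assumes the equal signs form one contiguous run, and it does not check centering, since the rule above does not say how centering is measured.

```python
import re

# Hypothetical helper, not part of etset. Matches a line made up of
# optional white space, a run of three or more equal signs, and
# optional white space (assumption: the equal signs are contiguous).
_SEPARATOR = re.compile(r"^\s*={3,}\s*$")


def is_chapter_separator(line: str) -> bool:
    """Return True if the line holds only white space and >= 3 equal signs."""
    return _SEPARATOR.match(line) is not None
```

A caller would apply this to the middle line of each three-line group, treating the lines before and after it as the Chapter number and Chapter name.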
\( \lambda \acute{o} \gamma o \varsigma \)
and the formula for the roots of a quadratic equation as:
\( x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \)
(Note: I acknowledge that this provision is controversial. It is as distasteful to me as I suspect it is to you. In its defence, let me treat the Greek letter and math formula cases separately. Using LaTeX encoding for Greek letters is purely a stopgap until Unicode comes into common use on enough computers so that we can use it for Etexts which contain characters not in the ASCII or ISO 8859/1 sets (which are the 7- and 8-bit subsets of Unicode, respectively). If an author uses a Greek word in the text, we have two ways to proceed in attempting to meet the condition:
The etext, when displayed, is clearly readable, and does not contain characters other than those intended by the author of the work, although....
The first approach is to transliterate into Roman characters according to a standard table such as that given in The Chicago Manual of Style. This preserves readability and doesn't require funny encoding, but in a sense violates the author's "original intent"--the author could have transliterated the word in the first place but chose not to. By transliterating we're reversing the author's decision. The second approach, encoding in LaTeX or some other markup language, preserves the distinction that the author wrote the word in Greek and maintains readability since letters are called out by their English language names, for the most part. Of course LaTeX helps us only for Greek (and a few characters from other languages). If you're faced with Cyrillic, Arabic, Chinese, Japanese, or other languages written in non-Roman letters, the only option (pre-Unicode) is to transliterate.
I argue that encoding mathematical formulas as LaTeX achieves the goal of "readable by humans" on the strength of LaTeX encoding being widely used in the physics and mathematics communities when writing formulas in E-mail and other ASCII media. Just as one is free to transliterate Greek in an Etext, one can use ASCII artwork formulas like:
                    ---------
               +   / 2
           -b  - \/ b  - 4ac
    x    = ------------------
     1,2           2a
This is probably a better choice for occasional formulas simple enough to write out this way. But to produce Etexts of historic scientific publications such as Einstein's "Zur Elektrodynamik bewegter Körper" (the special relativity paper published in Annalen der Physik in 1905), trying to render the hundreds of complicated equations in ASCII is not only extremely tedious but in all likelihood counterproductive; ambiguities in trying to render complex equations would make it difficult for a reader to determine precisely what Einstein wrote unless conventions just as complicated as those of LaTeX (and harder to learn) were adopted for ASCII expression of mathematics. Finally, the choice of LaTeX encoding is made not only based on its existing widespread use but because the underlying software that defines it (TeX and LaTeX) is entirely in the public domain, available in source code form, implemented on most commonly-available computers, and frozen by its authors so that, unlike many commercial products, the syntax is unlikely to change in the future and render current texts obsolete).
. , : ; ? ! ` ' ( ) { } " + = - / * @ # $ % & ~ ^ | < >
In other words, the characters:
_ [ ] \
are never used except in the special senses defined above.
Errors in Greek words and mathematical formulas encoded as LaTeX are not detected by etset and will result in LaTeX errors when the -l option output is processed.
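Since etset does not catch such errors, a simple pre-check can be run on the input before it is translated. The Python sketch below is only an illustration and is not part of etset: it assumes each \( ... \) group opens and closes on the same line, and it reports lines where the two delimiters do not pair up. It does not validate the LaTeX inside the delimiters.

```python
def unbalanced_math_lines(text: str) -> list[int]:
    """Return 1-based numbers of lines whose opening and closing
    LaTeX math delimiters differ in count.

    Hypothetical helper; assumes a formula never spans several lines.
    """
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Compare occurrences of backslash-paren openers and closers.
        if line.count("\\(") != line.count("\\)"):
            bad.append(number)
    return bad
```

Any line numbers returned are worth inspecting by hand before the -l output is processed by LaTeX.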
When generating HTML files, ISO graphic characters which are not required to be encoded in the &char; form by the HTML spec are output in their original 8-bit form. Expanding them to their &char; equivalents would result in the output being a pure 7-bit file, but would blow up the output file size substantially and render it far more difficult to edit by hand. I am aware of no contemporary Web server, browser, or authoring tool which cannot correctly process files which include ISO graphic characters.
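The trade-off can be seen in a small Python sketch. It is not how etset itself is implemented. The two function names are hypothetical, while html.escape and html.entities.codepoint2name come from the Python standard library. The first function escapes only the characters HTML requires; the second also expands every non-ASCII character to a named entity.

```python
import html
from html.entities import codepoint2name


def escape_minimal(text: str) -> str:
    """Escape only the markup characters &, < and >; keep everything else."""
    return html.escape(text, quote=False)


def escape_all_entities(text: str) -> str:
    """Also expand each non-ASCII character that has a named entity."""
    out = []
    for ch in escape_minimal(text):
        name = codepoint2name.get(ord(ch))
        if ord(ch) > 127 and name is not None:
            out.append(f"&{name};")
        else:
            out.append(ch)
    return "".join(out)


sample = "Zur Elektrodynamik bewegter Körper <1905>"
minimal = escape_minimal(sample)        # o-umlaut stays a single 8-bit character
expanded = escape_all_entities(sample)  # o-umlaut becomes a six-character entity
# Byte counts: 8-bit ISO 8859-1 output versus pure 7-bit output.
print(len(minimal.encode("latin-1")), len(expanded.encode("ascii")))
```

Each expanded character grows from one byte to several, which is the size increase described above, and the expanded form is harder to read when editing the file by hand.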
The structure of the program resembles an inelastic collision of three separate programs for ASCII, LaTeX, and HTML output, for the excellent reason that etset is precisely that. While this results in substantial duplication of code, it does mean that changes in the code for a given format are less likely to break one of the other output types.
latex(1)
This software is in the public domain. Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, without any conditions or restrictions. This software is provided "as is" without express or implied warranty.