NDOC
The Null Document Processor
Ever since the 1960s, I have had an aversion to text formatting tools which clutter the input text with cumbersome mark-up for items (for example, the start of each new paragraph), for which there have been visual conventions in hand- and typewritten documents for centuries. (Here, I am speaking of text formatters which take as input a machine-readable file and prepare it for output on various devices, not “what you see is what you get” word processors, which did not exist at the time.)
After a particularly bad experience with a locally-developed program at a place I worked which made you type a special code for every capital letter, I adapted the Univac DOC program for my purposes and began to use it for all of my documents. It understood the structure of documents, especially those written in the style of computer manuals, and required very little mark-up for most documents.
The problem with mark-up is that, for the author or editor of a document, it gets in the way of the flow of thought when writing and, especially, when reading or editing documents. The more the original document looks like what will appear when it is formatted, the easier it is to write, read, find errors, and correct them.
By the mid-1980s, I had turned the page on Univac iron and its software, and was ringing down the curtain on my Marinchip Systems hardware and software, the latter including the Marinchip 9900 Word Processor [PDF], which I designed based upon concepts drawn from Univac DOC and the Unix nroff utility, again allowing many documents to be formatted with minimal mark-up.
At the time, programmers at Autodesk, including myself, were producing large numbers of internal documents for code submissions to AutoCAD, design drafts, and user guides for in-house software tools. I had just started to experiment with TeX/LaTeX for formatting beautiful camera-ready documents for publication, but that was clearly overkill for in-house documents which, if they were printed at all, would probably be on the dot matrix printers with monospaced characters most widely used in the era. Besides, in addition to its steep learning curve, TeX was difficult and expensive to install on the modest MS-DOS machines we were using and required a very expensive full graphics laser printer as its output device.
To meet this need, I cobbled together NDOC, a simple C language text formatter which went to the extreme of, in its original incarnation, using no mark-up at all. It simply used the formatting of the text on the page as a guide to the author's intentions, and in most cases would correctly format the output text accordingly. Later, it accreted some very simple mark-up to handle things like centring text on the page, right justifying, and including other files in a document. In the spirit of the 1980s and good fun, I called this the “EMBEDDED EXPERT SYSTEM”. The ability to extract and format documentation embedded within the source code of C and PL/I programs can be thought of as a crude baby step toward literate programming.
Here, for your amusement, is NDOC. This is the program as it stood as of the last revision on July 24th, 1989, three decades ago. I have made only minimal changes to get the program to compile without warnings on contemporary C compilers and to remove some artefacts of the 16-bit C compiler on which it was built at the time. The code remains in ancestral “K&R C”—it would be inauthentic to convert it to the ANSI dialect. The fact that input and output are specified only by redirection and the home-baked handling of command line options are legacies of the program's having been developed on MS-DOS.
Should you use NDOC? Probably not: there are many present-day alternatives, ranging from the Unix fmt utility, Markdown, and Org-mode to numerous visual text editors with built-in formatting facilities. Further, the main feature of NDOC, the ability to fill and justify text for monospace font output, is rarely required in an age when monospace fonts are the exception, not the rule. Still, I'll confess that I reach for NDOC from time to time when I need right justified text to embed within source code or a text document. And now, if you wish, so can you.
Some of the features of NDOC interacted with other Autodesk-developed software such as a locally-developed DIFF utility, the AutoBOOK electronic publication prototype, and my silly TC English text compressor. The Autodesk DIFF and AutoBOOK support code remains in the program, but is useless unless you manage to dig up source code for those long-forgotten programs. Support for TC-compressed text was silently removed at some point in the 1980s because it conflicted with documents which used accented and special characters from the ISO 8859-1 character set. Descriptions of these features in the following user guide appear in grey text.
Here is the user guide, unchanged since March 1986 (except for the coding needed to embed it in an HTML document). It was, of course, formatted by NDOC and is presented in a monospace font as expected by NDOC. The source code for the document is included in the source distribution (see download link at the bottom of this page) as the file ndoc.ndoc. Some of the wisecracks in the document were a result of having recently participated in writing the prospectus for Autodesk's 1985 initial public stock offering.
The Non-Document On the Null Document Processor
by Kelvin R. Throop
Revision 13 -- March 19, 1986
NDOC is an experiment in software engineering. For the first time in my speckled career, I have deliberately undertaken a project in full knowledge and with conscious intent that it will grow out of control, mushroom into something much larger than expected, and end up by reinventing a tool that not only have others created time after time, but one which I have created myself time after time.
In addition, the NDOC project was undertaken as a celebration of pragmatic software development. In other words, uncontrolled and unmanaged growth. It will evolve as the earth did, by accretion of unconnected components, related only by their accidental proximity, and cemented only by their random interconnections. In short, a typical document processor.
I am tired of document processors which treat my document as a food processor treats food. Macros are for diet freaks. If I wanted to write a program, I'd write a program, not a document. Enough!
So, NDOC, the chondritic concretion of document processing thumps onto the scene. What does it do today? As little as possible. If you call it with:
    NDOC -?
it will print the following:
    NDOC -- The null document processor.
    Call with NDOC [options] <input >output.
    Options:
        -B      Pagination for AutoBOOK
        -Cf     Change bars with DIFF file f
        -D      Double space
        -H      1st line is running heading
        -In     Indent n columns
        -Jn     Justify to column n
        -Lx     Set up for LaserJet
                   P = Proportional space
        -Nn     Number pages [n = "of"]
                   C = Count pages only
        -Pn     Page length n
        -S      Number lines lawyer-like
                   P = Programmer-like
        -Tn     Top margin n
        -W      Print on wide (EDP) paper
        -X      Inhibit automatic formatting
        -Z      Defaults of a prudent man
Its input is from standard input, the output to standard output. Thus, for those of exiguous mentation like myself, you say:
    NDOC >prn <myfile.doc
Got it?
The switches are all mostly somewhat independent part of the time. You can't combine them after a dash. To print with double spacing, indented 10 characters, with line numbers, use:
    NDOC >prn <booga.doc -d -i10 -s
The -H switch prints the first line of the document on successive pages as a running heading.
The -I switch indents the output on the paper. The default is 5, or 20 if the -W switch is set, which is equivalent to -I20. Why -W? Because I can't remember the number!
The -J switch turns on automatic justification. The -J may be followed by a number specifying the desired column width to justify to. If no number is specified, the default is 70. If you write as I do, with block paragraphs and without indentation, the -J switch will work perfectly; you never have to include any garbage in the text to control it! The rules are as follows: a line is a candidate for justification only if it has a nonblank in the first column. If the line is less than 80% full (based on the justification column) it is not justified, nor is it justified if it contains multiple embedded blanks excepting the case of two spaces following a ".", "!", or "?". If a line is already longer than the justification column, it is left alone. If you prepare your documents with line wrap set on your editor to less than the -Jn length, and you reformat paragraphs if you muck them up with the editor, everything will work beautifully. If it doesn't, you're obviously in a state of sin; such behaviour cannot be justified.
If justification is enabled and the EMBEDDED EXPERT SYSTEM is activated (see below), the justification column may be changed within a document by inserting a line containing the sentinel ">!" in the first two columns and the character "!" in the rightmost column of the justification area. Text will be justified to the column specified by the second exclamation point. No other characters may appear on the command line other than the sentinel, blanks, and the exclamation point indicating the justification column.
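The candidate rules and the ">!" ruler line described above can be sketched in C roughly as follows. This is a minimal illustration in ANSI C, not the K&R code in the NDOC source distribution: the function names is_justify_candidate and ruler_column are invented here, and ignoring trailing blanks (both when measuring a line and after the ruler's second "!") is an assumption the guide does not spell out.

```c
#include <string.h>

/*
 * Hypothetical helper (not from the NDOC source): return 1 if `line`
 * qualifies for justification to column `width`, 0 otherwise.
 * Assumes no trailing newline; trailing blanks are ignored (assumption).
 */
int is_justify_candidate(const char *line, int width)
{
    size_t len = strlen(line);
    size_t i;

    while (len > 0 && line[len - 1] == ' ')
        len--;                          /* ignore trailing blanks */

    if (len == 0 || line[0] == ' ')
        return 0;                       /* needs a nonblank in column 1 */
    if (width <= 0 || len > (size_t) width)
        return 0;                       /* already too long: leave alone */
    if (len * 100 < (size_t) width * 80)
        return 0;                       /* less than 80% full */

    for (i = 1; i + 1 < len; i++) {
        if (line[i] == ' ' && line[i + 1] == ' ') {
            char prev = line[i - 1];
            int sentence_end = (prev == '.' || prev == '!' || prev == '?');

            /* Only exactly two blanks after ".", "!" or "?" are allowed. */
            if (!sentence_end || line[i + 2] == ' ')
                return 0;
            i++;                        /* skip the second blank */
        }
    }
    return 1;
}

/*
 * Hypothetical helper: if `line` is a ">!   ...   !" ruler, return the
 * 1-based column of the second '!', otherwise 0.  Blanks after the
 * second '!' are tolerated (assumption).
 */
int ruler_column(const char *line)
{
    size_t i = 2;
    int column;

    if (line[0] != '>' || line[1] != '!')
        return 0;
    while (line[i] == ' ')
        i++;
    if (line[i] != '!')
        return 0;                       /* something other than blanks */
    column = (int) i + 1;
    i++;
    while (line[i] == ' ')
        i++;
    return line[i] == '\0' ? column : 0;
}
```

A formatter built along these lines would call is_justify_candidate on each input line with the current width, and replace that width whenever ruler_column returns a nonzero column.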
The -L switch sends the LaserJet reset sequence before printing in case some bozo left it in backwards Cyrillic italics. -LP sets proportional spacing mode if such a font is installed. Don't use -L if the output device isn't a LaserJet or you'll regret it. Don't use proportional spacing mode along with the -J switch unless you like garbage.
If -N is set, pages will be numbered centred at the bottom. If a number follows the -N, e.g., -N4, the numbers will be printed as "x of n", where "n" is the number ("2 of 4"). How do you know how to set n? Historically, users were forced to run it first and look at the number on the last page, then run it again and specify that after the -N. Remember to use the same switches on both runs! Management believes substantially all the switches have significant effects on measurable elements of page consumption. Send the output to your CRT and save a tree. To hell with the electrons; if you've seen one, you've seen 'em all (and WHY do they all weigh the same amount, anyway?).
But now, in the bright light of technology, the Company's proprietary Incompetent System technology has enabled computers to count, so you can just say -NC, and all output will be suppressed. At the end of the document, the number of pages generated will be printed, so you can use that number the next time with the -N switch. You still have to be careful to set all the same switches on the page counting run as the production run.
The -S switch numbers all nonblank lines starting with 1 at the top of each page. Go write a prospectus if you question the utility of this feature. No, running headings aren't numbered on successive pages. Quality is free. And sometimes accidental.
For conventional line numbering (all lines numbered with absolute line numbers in the file), use the -SP switch to specify programmer-style numbers. This allows NDOC to be used as a text file lister. While this may seem to be silly, considering the plethora of other tools for this function, it does allow one to take advantage of NDOC's unique features, such as change bars, when listing programs.
The -C switch will automatically generate change bars relative to an earlier version of the document being printed. To print an updated version of a document with change bars, DIFF the new document against the old one with the most recent edition given to DIFF as the "old file" and the base document as the "new file", and save the output in a file. For example, if you wish to print RATBAG.DOC with change bars with respect to the base named RATBAG.OLD, you would use:
    ADIFF RATBAG.DIF=RATBAG.DOC,RATBAG.OLD
    NDOC <RATBAG.DOC -CRATBAG.DIF <whatever>
When text is simply deleted (e.g., a -n line in the DIFF output), a minus sign will be output on the next line of the original text, unless that line would otherwise have a change bar. The one-up Company feels that this somewhat unconventional feature reduces the possibility of lines dropping in the hole while slugging. If users object, we'll reluctantly add an option to this program which, to date, has been so pristinely option-free.
The -B switch makes NDOC generate +PAGE separators as expected by AutoBOOK at page breaks. It also defaults indentation to zero and running headings off for compatibility with AutoBOOK.
If you have funny paper, you can set the lines to skip at the top and the body length with the -T and -P switches respectively. The default settings are equivalent to -T3 -P56. NDOC assumes it can feed a page with a form feed. If not, tough. If you have hilarious paper and set -T larger than -P, risible results reliably recur.
To cause NDOC to generate a "scroll" of output without page breaks, use -P0. This setting renders the -H and -N options and the setting of -T nugatory.
The -Z switch selects the defaults which would be chosen by a prudent man in disposing of his own document. If selected by itself it runs NDOC in page counting mode (-NC) mode. After running once with -Z, then run with -Z -Nx where x is the page count from the first run.
NDOC will automatically expand text files compressed with the English text file compressor, TC. This expansion requires no option, as NDOC automatically figures out that the file is compressed and expands it on the fly. So, feel free to compress your document files and save 30% of your disc space (you can always expand them back with TC -D when you need to edit them).
In addition, NDOC automatically expands tabs inserted on eight column boundaries. Thus you can use NDOC on files including tabs, and may tell your text editor to automatically tab output files (thanks to Kern Sibbald for suggesting this). You can even have tabbed, compressed files, although that's a bit silly.
You may force page breaks anywhere in the document by inserting form feed characters. Form feeds may either be placed on lines by themselves, in which case they act as page eject commands, or may be prefixed to text lines. If a form feed appears as the first character of a text line, that line will be printed at the top of the next page. Thanks to Eric Lyons for recommending this feature.
You may include text from another file in the printed output by inserting a line which begins with the characters "<<". This text will be processed as if it had been physically copied into the input file in place of the "<<" line. The "<<" must appear in the first two columns of the line to include the file. Includes may be nested to any depth, constrained only by the operating system's limit on the number of concurrently open files (FILES parameter in CONFIG.SYS for MS-DOS).
NDOC has previously remained almost entirely pure of sensitivity to the input text most of the time with only sporadic and well-justified but nonetheless often annoying albeit rational exceptions. But no more. In the spirit of the Eighties, where one need only say the secret words "expert system" to have the cosMECC duck come down and hand you a billion dollars, NDOC has acquired an EMBEDDED EXPERT SYSTEM, which examines the text and with a rule-based methodology, diddles it as it sees fit.
The EMBEDDED EXPERT SYSTEM is enabled by default, but may be turned off by setting the -X switch on the call line. The EMBEDDED EXPERT SYSTEM does not currently know about change bars, so setting the -C switch and specifying a DIFF file will also turn it off.
The EMBEDDED EXPERT SYSTEM can be directed by placing sentinels around the text. These sentinels may appear at the start or the end of the line; if at the start, they must appear in the first two columns of the line.
To cause NDOC to centre a line of text, just place the text between the "forcing brackets" as follows:
    >>This line will be centred.<<
Of course, the actual line must begin in the first column. To force a line to be right-aligned, place only the first forcing bracket:
    >>Slam this line to the right.
And to force a line to be left-justified (causing the automatic justifier to not process it), use only the second forcing bracket:
    This line is left justified.<<
The EMBEDDED EXPERT SYSTEM will automatically identify paragraphs in the text, including those with hanging indentation and bullets, and move words from line to line to fill to the current justification width. The process of assembling words into a paragraph continues until a line is encountered which contains one of the forcing brackets, a line which fails to qualify for justification by containing extra embedded spaces (so that tables are automatically not justified), or a line which fails to begin in the same column as the second line of the paragraph.
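The three forcing-bracket cases above (centre, right, left) can be sketched as a small classifier. This is an illustrative ANSI C sketch, not NDOC's own code: the names force, forced, classify_forcing, and left_padding are invented, the trailing "<<" is assumed to be the very last two characters of the line, and lines beginning with "<<" (the include directive) are assumed to have been handled before this function is called.

```c
#include <stddef.h>
#include <string.h>

enum force { FORCE_NONE, FORCE_CENTRE, FORCE_RIGHT, FORCE_LEFT };

struct forced {
    enum force kind;    /* what the brackets ask for */
    size_t start;       /* offset of the text once brackets are removed */
    size_t len;         /* length of that text */
};

/* Hypothetical helper: classify a line by its forcing brackets. */
struct forced classify_forcing(const char *line)
{
    struct forced f;
    size_t end = strlen(line);
    int lead, trail;

    f.start = 0;
    lead = (end >= 2 && line[0] == '>' && line[1] == '>');
    if (lead)
        f.start = 2;                    /* ">>" in the first two columns */

    /* The closing bracket must not overlap the opening one. */
    trail = (end - f.start >= 2 &&
             line[end - 2] == '<' && line[end - 1] == '<');
    if (trail)
        end -= 2;
    f.len = end - f.start;

    if (lead && trail)
        f.kind = FORCE_CENTRE;          /* >>text<< */
    else if (lead)
        f.kind = FORCE_RIGHT;           /* >>text   */
    else if (trail)
        f.kind = FORCE_LEFT;            /* text<<   */
    else
        f.kind = FORCE_NONE;
    return f;
}

/* Blanks to print before the text for a line `width` columns wide. */
size_t left_padding(struct forced f, size_t width)
{
    if (f.len >= width)
        return 0;
    if (f.kind == FORCE_CENTRE)
        return (width - f.len) / 2;
    if (f.kind == FORCE_RIGHT)
        return width - f.len;
    return 0;                           /* left-forced or ordinary line */
}
```

With a 70-column width, the centred example from the guide (26 characters of text) would be preceded by 22 blanks. A forced line whose len is zero corresponds to the otherwise-blank forced lines that only block the movement of text.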
You can break up lines which would normally be combined by inserting a line with just a forcing bracket, such as:
    >>    >><<    >>
Forced lines which are otherwise blank disappear and have only the effect of blocking the movement of text.
The EMBEDDED EXPERT SYSTEM also permits documents to be embedded within C and PL/I programs. The first occurrence of the sentinel:
    /*DOC
in column 1 of an input line causes NDOC to enter embedded document mode. Thereafter, only lines between that sentinel and the matching sentinel:
    DOC*/
which will also only be recognised if placed in column 1 will be processed by NDOC. The sentinels will be ignored. Thus, you may write a program and embed the documentation between these comments. When you compile the program, the compiler will ignore the documentation. When you process the document with NDOC, the program will be ignored. The resulting file is a self-documenting program, far more likely to reflect the current state of the code than a program with comments and a separate document.
If you want to put the document at some location not at the beginning of the program file, just put the two lines:
    /*DOC
    DOC*/
as the first two lines of the program. This will place NDOC in embedded document mode without printing anything. The /*DOC which precedes the first segment of documentation will turn interpretation of text back on when encountered.
Many more great features are on the way!
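The embedded document mode described in the guide amounts to a three-state filter over input lines. The sketch below is an ANSI C illustration of that reading, not the code in the NDOC distribution: doc_state, doc_filter, and extract_docs are invented names, sentinels are matched as prefixes starting in column 1, and lines ahead of the first /*DOC are passed through as ordinary text, which is inferred from the guide's advice to open a program with an empty /*DOC DOC*/ pair.

```c
#include <stddef.h>
#include <string.h>

enum doc_state {
    PLAIN,      /* no sentinel seen yet: every line is document text */
    IN_DOC,     /* between the sentinels: document text */
    IN_CODE     /* embedded mode, outside a document block: skipped */
};

/* Hypothetical helper: return 1 if `line` should go to the formatter. */
int doc_filter(enum doc_state *state, const char *line)
{
    if (strncmp(line, "/*DOC", 5) == 0) {
        *state = IN_DOC;
        return 0;                       /* sentinels are never printed */
    }
    if (*state == IN_DOC && strncmp(line, "DOC*/", 5) == 0) {
        *state = IN_CODE;
        return 0;
    }
    return *state != IN_CODE;
}

/*
 * Copy the document lines of newline-separated `text` into `out`
 * (at most outsize - 1 characters) and return `out`.
 */
char *extract_docs(const char *text, char *out, size_t outsize)
{
    enum doc_state state = PLAIN;
    size_t used = 0;

    if (outsize == 0)
        return out;
    while (*text != '\0') {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t) (eol - text) + 1 : strlen(text);

        /* Lines that would overflow the buffer are silently dropped. */
        if (doc_filter(&state, text) && used + len < outsize) {
            memcpy(out + used, text, len);
            used += len;
        }
        text += len;
    }
    out[used] = '\0';
    return out;
}
```

Fed a program that opens with an empty /*DOC DOC*/ pair, extract_docs returns only the text of the later documentation blocks, while a file with no sentinels at all comes back unchanged.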
by John Walker
June, 2019