
NDOC

The Null Document Processor


Ever since the 1960s, I have had an aversion to text formatting tools which clutter the input text with cumbersome mark-up for items (for example, the start of each new paragraph), for which there have been visual conventions in hand- and typewritten documents for centuries. (Here, I am speaking of text formatters which take as input a machine-readable file and prepare it for output on various devices, not “what you see is what you get” word processors, which did not exist at the time.)

After a particularly bad experience at a place I worked, with a locally-developed program which made you type a special code for every capital letter, I adapted the Univac DOC program for my purposes and began to use it for all of my documents. It understood the structure of documents, especially those written in the style of computer manuals, and required very little mark-up for most documents.

The problem with mark-up is that, for the author or editor of a document, it gets in the way of the flow of thought when writing and, especially, when reading or editing documents. The more the original document looks like what will appear when it is formatted, the easier it is to write, read, find errors, and correct them.

By the mid-1980s, I had turned the page on Univac iron and its software, and was ringing down the curtain on my Marinchip Systems hardware and software, the latter including the Marinchip 9900 Word Processor, which I designed based upon concepts drawn from Univac DOC and the Unix nroff utility, again allowing many documents to be formatted with minimal mark-up.

At the time, programmers at Autodesk, including myself, were producing large numbers of internal documents for code submissions to AutoCAD, design drafts, and user guides for in-house software tools. I had just started to experiment with TeX/LaTeX for formatting beautiful camera-ready documents for publication, but that was clearly overkill for in-house documents which, if they were printed at all, would probably be on the dot matrix printers with monospaced characters most widely used in the era. Besides, in addition to its steep learning curve, TeX was difficult and expensive to install on the modest MS-DOS machines we were using and required a very expensive full graphics laser printer as its output device.

To meet this need, I cobbled together NDOC, a simple C language text formatter which, in its original incarnation, went to the extreme of using no mark-up at all. It simply used the layout of the text on the page as a guide to the author's intentions and, in most cases, formatted the output accordingly. Later, it accreted some very simple mark-up to handle things like centring text on the page, right justifying, and including other files in a document. In the spirit of the 1980s and good fun, I called this the “EMBEDDED EXPERT SYSTEM”. The ability to extract and format documentation embedded within the source code of C and PL/I programs can be thought of as a crude baby step toward literate programming.

Here, for your amusement, is NDOC. This is the program as it stood as of the last revision on July 24th, 1989, three decades ago. I have made only minimal changes to get the program to compile without warnings on contemporary C compilers and to remove some artefacts of the 16-bit C compiler on which it was built at the time. The code remains in ancestral “K&R C”; it would be inauthentic to convert it to the ANSI dialect. Specifying input and output only by redirection, and the home-baked handling of command line options, are both legacies of the program's development on MS-DOS.

Should you use NDOC? Probably not: there are many present-day alternatives, including the Unix fmt utility, Markdown, Org-mode, and numerous visual text editors with built-in formatting facilities. Further, NDOC's main feature, the ability to fill and justify text for monospace font output, is rarely required in an age when monospace fonts are the exception, not the rule. Still, I'll confess that I reach for NDOC from time to time when I need right justified text to embed within source code or a text document. And now, if you wish, so can you.

Some of the features of NDOC interacted with other Autodesk-developed software such as a locally-developed DIFF utility, the AutoBOOK electronic publication prototype, and my silly TC English text compressor. The Autodesk DIFF and AutoBOOK support code remains in the program, but is useless unless you manage to dig up source code for those long-forgotten programs. Support for TC-compressed text was silently removed at some point in the 1980s because it conflicted with documents which used accented and special characters from the ISO 8859-1 character set. Descriptions of these features in the following user guide appear in grey text.

User Guide

Here is the user guide, unchanged since March 1986 (except for the coding needed to embed it in an HTML document). It was, of course, formatted by NDOC and is presented in a monospace font as expected by NDOC. The source code for the document is included in the source distribution (see download link at the bottom of this page) as the file ndoc.ndoc. Some of the wisecracks in the document were a result of having recently participated in writing the prospectus for Autodesk's 1985 initial public stock offering.

           The Non-Document On the Null Document Processor

                         by Kelvin R. Throop
                    Revision 13 --  March 19, 1986

NDOC  is an experiment in software engineering.  For the first time in
my speckled career, I have deliberately undertaken a project  in  full
knowledge  and with conscious intent that it will grow out of control,
mushroom into something much larger  than  expected,  and  end  up  by
reinventing  a tool that not only have others created time after time,
but one which I have created myself time after time.

In addition, the NDOC project  was  undertaken  as  a  celebration  of
pragmatic  software  development.   In  other  words, uncontrolled and
unmanaged growth.  It will evolve as the earth did,  by  accretion  of
unconnected  components,  related  only by their accidental proximity,
and cemented only by their random interconnections.

In short, a typical document processor.

I am tired of document processors which treat my document  as  a  food
processor  treats  food.   Macros are for diet freaks.  If I wanted to
write a program, I'd write a program, not a document.

Enough!

So, NDOC, the chondritic concretion of document processing thumps onto
the  scene.   What  does  it do today?  As little as possible.  If you
call it with:

        NDOC -?

it will print the following:

   NDOC --  The null document processor.  Call
            with NDOC [options] <input >output.

            Options:   -B   Pagination for AutoBOOK
                       -Cf  Change bars with DIFF file f
                       -D   Double space
                       -H   1st line is running heading
                       -In  Indent n columns
                       -Jn  Justify to column n
                       -Lx  Set up for LaserJet
                             P = Proportional space
                       -Nn  Number pages [n = "of"]
                             C = Count pages only
                       -Pn  Page length n
                       -S   Number lines lawyer-like
                             P = Programmer-like
                       -Tn  Top margin n
                       -W   Print on wide (EDP) paper
                       -X   Inhibit automatic formatting
                       -Z   Defaults of a prudent man

Its input is from standard  input,  the  output  to  standard  output.
Thus,   for   those  of  exiguous  mentation  like  myself,  you  say:

    NDOC >prn <myfile.doc

Got it?

The switches are all mostly somewhat independent  part  of  the  time.
You  can't  combine  them after a dash.  To print with double spacing,
indented 10 characters, with line numbers, use:

    NDOC >prn <booga.doc -d -i10 -s
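For illustration, here is a minimal sketch in modern C of this style of option handling, covering only a few of the switches. The struct ndoc_opts type and the parse_switch function are hypothetical names, not taken from NDOC's source. Each argument carries exactly one switch, the letter may be in either case, and any numeric value follows the letter directly.

```c
#include <ctype.h>
#include <stdlib.h>

/* A few of the settings listed above (hypothetical layout). */
struct ndoc_opts {
    int indent;      /* -In; the guide's default is 5 */
    int justify;     /* -Jn column; 0 = off, bare -J means 70 */
    int double_sp;   /* -D */
    int numbers;     /* -S: 0 off, 1 lawyer-like, 2 programmer-like (-SP) */
};

/* Apply one switch such as "-i10" or "-SP". Switches can't be combined
   after a dash, so each argument is parsed on its own.
   Returns 0 if the switch was recognised, -1 otherwise. */
int parse_switch(const char *arg, struct ndoc_opts *o)
{
    if (arg[0] != '-' || arg[1] == '\0')
        return -1;
    switch (toupper((unsigned char) arg[1])) {
    case 'D':
        o->double_sp = 1;
        return 0;
    case 'I':
        o->indent = atoi(arg + 2);          /* digits follow the letter */
        return 0;
    case 'J':
        o->justify = arg[2] != '\0' ? atoi(arg + 2) : 70;
        return 0;
    case 'S':
        o->numbers = toupper((unsigned char) arg[2]) == 'P' ? 2 : 1;
        return 0;
    }
    return -1;                              /* unknown switch */
}
```

Calling parse_switch once for each of -d, -i10, and -s would select double spacing, a 10-column indent, and lawyer-like line numbers.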

The -H switch prints the first line  of  the  document  on  successive
pages  as  a running heading.  The -I switch indents the output on the
paper.  The default is 5, or 20 if the -W  switch  is  set,  which  is
equivalent  to  -I20.   Why  -W?  Because I can't remember the number!

The -J switch  turns  on  automatic  justification.   The  -J  may  be
followed  by  a  number specifying the desired column width to justify
to.  If no number is specified, the default is 70.  If you write as  I
do,  with block paragraphs and without indentation, the -J switch will
work perfectly; you never have to include any garbage in the  text  to
control  it!   The  rules  are  as  follows: a line is a candidate for
justification only if it has a nonblank in the first column.   If  the
line  is  less than 80% full (based on the justification column) it is
not justified, nor is it justified if it  contains  multiple  embedded
blanks  excepting the case of two spaces following a ".", "!", or "?".
If a line is already longer than the justification column, it is  left
alone.   If  you  prepare  your  documents  with line wrap set on your
editor to less than the -Jn length, and you reformat paragraphs if you
muck them up with the editor, everything will work beautifully.  If it
doesn't, you're obviously in a state of sin; such behaviour cannot  be
justified.
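The candidacy rules above can be sketched in modern C as follows. This is an illustrative reconstruction, not NDOC's own code: is_justifiable and justify_line are hypothetical names, trailing blanks are assumed to have been stripped already, and giving leftover blanks to the rightmost gaps is a guess, since the guide does not say how NDOC distributes them.

```c
#include <string.h>

/* Nonzero if `line` qualifies for justification to column `width`. */
int is_justifiable(const char *line, int width)
{
    size_t len = strlen(line), i;

    if (len == 0 || line[0] == ' ')
        return 0;                       /* must start in column 1 */
    if (len * 10 < (size_t) width * 8)
        return 0;                       /* less than 80% full */
    if (len >= (size_t) width)
        return 0;                       /* at or past the column already */
    for (i = 1; i + 1 < len; i++) {
        if (line[i] == ' ' && line[i + 1] == ' ') {
            /* Two blanks are tolerated only after ".", "!", or "?",
               and three or more never are (tables, for instance). */
            if (strchr(".!?", line[i - 1]) == NULL
                || (i + 2 < len && line[i + 2] == ' '))
                return 0;
            i++;                        /* skip the second blank */
        }
    }
    return 1;
}

/* Copy `line` into `out` (at least width + 1 bytes), widening the gaps
   between words so the result is exactly `width` columns long.
   Assumes the line is shorter than `width`. */
void justify_line(const char *line, int width, char *out)
{
    int len = (int) strlen(line);
    int extra = width - len, gaps = 0, g = 0, i, k;

    for (i = 1; i < len; i++)           /* a gap starts at a blank after a word */
        if (line[i] == ' ' && line[i - 1] != ' ')
            gaps++;
    for (i = 0; i < len; i++) {
        *out++ = line[i];
        if (extra > 0 && gaps > 0 && i > 0
            && line[i] == ' ' && line[i - 1] != ' ') {
            /* Every gap gets extra / gaps blanks; the rightmost
               extra % gaps gaps get one more. */
            int add = extra / gaps + (g >= gaps - extra % gaps);
            for (k = 0; k < add; k++)
                *out++ = ' ';
            g++;
        }
    }
    *out = '\0';
}
```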

If  justification  is  enabled  and  the  EMBEDDED  EXPERT  SYSTEM  is
activated (see below), the justification column may be changed  within
a  document  by  inserting  a line containing the sentinel ">!" in the
first two columns and the character "!" in the rightmost column of the
justification area.  Text will be justified to the column specified by
the second exclamation point.  No other characters may appear  on  the
command  line  other  than  the  sentinel, blanks, and the exclamation
point indicating the justification column.
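A sketch of how such a command line might be recognised, assuming columns are numbered from 1 and the trailing newline has been removed. The width_command function is a hypothetical name; it returns the new justification column, or -1 if the line is not a well-formed command.

```c
#include <string.h>

/* Parse a ">!  ...  !" line: the column of the second "!" is the new
   justification width. Anything other than blanks between the
   sentinel and that "!" makes the line invalid. */
int width_command(const char *line)
{
    int col = -1, i;

    if (strncmp(line, ">!", 2) != 0)
        return -1;
    for (i = 2; line[i] != '\0'; i++) {
        if (line[i] == '!') {
            if (col != -1)
                return -1;      /* more than one closing "!" */
            col = i + 1;        /* 0-based index to 1-based column */
        } else if (line[i] != ' ') {
            return -1;          /* only blanks may appear */
        }
    }
    return col;                 /* still -1 if no closing "!" */
}
```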

The -L switch sends the LaserJet reset  sequence  before  printing  in
case  some  bozo  left  it  in  backwards  Cyrillic italics.  -LP sets
proportional spacing mode if such a font is installed.  Don't  use  -L
if  the output device isn't a LaserJet or you'll regret it.  Don't use
proportional spacing mode along with the -J  switch  unless  you  like
garbage.

If  -N  is  set,  pages  will be numbered centred at the bottom.  If a
number follows the -N, e.g., -N4, the numbers will be printed as "x of
n", where "n" is the number ("2 of 4").  How do you know how to set n?
Historically, users were forced to run it first and look at the number
on  the  last  page,  then run it again and specify that after the -N.
Remember to use the same switches on both runs!   Management  believes
substantially  all the switches have significant effects on measurable
elements of page consumption.  Send the output to your CRT and save  a
tree.  To hell with the electrons; if you've seen one, you've seen 'em
all (and WHY do they all weigh the same amount, anyway?).  But now, in
the  bright light of technology, the Company's proprietary Incompetent
System technology has enabled computers to count, so you can just  say
-NC,  and  all output will be suppressed.  At the end of the document,
the number of pages generated will be printed, so  you  can  use  that
number the next time with the -N switch.  You still have to be careful
to set all  the  same  switches  on  the  page  counting  run  as  the
production run.
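The footer line itself is simple to produce. The hypothetical page_footer function below sketches it in C, assuming the number is centred within a caller-supplied width (the guide does not say which width NDOC centres on). With of set to 0 it prints a bare page number; otherwise it prints the "x of n" form.

```c
#include <stdio.h>
#include <string.h>

/* Build a centred page-number footer in `out`. */
void page_footer(int page, int of, int width, char *out, size_t outsize)
{
    char num[32];
    int pad;

    if (of > 0)
        snprintf(num, sizeof num, "%d of %d", page, of);   /* -Nn form */
    else
        snprintf(num, sizeof num, "%d", page);             /* plain -N */
    pad = (width - (int) strlen(num)) / 2;
    if (pad < 0)
        pad = 0;
    snprintf(out, outsize, "%*s%s", pad, "", num);         /* leading blanks */
}
```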

The -S switch numbers all nonblank lines starting with 1 at the top of
each page.  Go write a prospectus if you question the utility of  this
feature.   No,  running  headings aren't numbered on successive pages.
Quality is free.  And sometimes accidental.

For conventional line numbering (all lines numbered with absolute line
numbers  in  the file), use the -SP switch to specify programmer-style
numbers.  This allows NDOC to be used as a text  file  lister.   While
this may seem to be silly, considering the plethora of other tools for
this function, it does allow one to take advantage  of  NDOC's  unique
features, such as change bars, when listing programs.
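The difference between the two numbering styles comes down to which counter is used and when it is reset. Here is a sketch with hypothetical names (struct numberer, new_page, line_number); the caller is assumed to call new_page at each page break and never to pass running headings, which are not numbered.

```c
#include <string.h>

struct numberer {
    int programmer;   /* nonzero for -SP */
    int file_line;    /* lines read from the file so far */
    int page_line;    /* nonblank lines numbered on this page */
};

/* Restart lawyer-like numbering at the top of a page. */
void new_page(struct numberer *n)
{
    n->page_line = 0;
}

/* Number to print beside `line`, or 0 for none. -SP numbers every line
   by its position in the file; -S numbers only nonblank lines,
   counting from 1 on each page. */
int line_number(struct numberer *n, const char *line)
{
    n->file_line++;
    if (n->programmer)
        return n->file_line;
    if (line[strspn(line, " \t")] == '\0')
        return 0;                    /* blank line: no number under -S */
    return ++n->page_line;
}
```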

The  -C  switch will automatically generate change bars relative to an
earlier version of the document being printed.  To  print  an  updated
version  of a document with change bars, DIFF the new document against
the old one with the most recent edition given to  DIFF  as  the  "old
file"  and the base document as the "new file", and save the output in
a file.  For example, if you wish to print RATBAG.DOC with change bars
with   respect   to   the   base  named  RATBAG.OLD,  you  would  use:

    ADIFF RATBAG.DIF=RATBAG.DOC,RATBAG.OLD
    NDOC <RATBAG.DOC -CRATBAG.DIF <whatever>

When text is simply deleted (e.g., a -n line in the  DIFF  output),  a
minus  sign  will  be  output  on  the next line of the original text,
unless that line would  otherwise  have  a  change  bar.   The  one-up
Company  feels  that  this somewhat unconventional feature reduces the
possibility of lines dropping in the hole while  slugging.   If  users
object,  we'll  reluctantly  add  an  option to this program which, to
date, has been so pristinely option-free.
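The precedence rule for the margin marks can be sketched as below. Deriving the changed and deleted flags from ADIFF output is not shown, because that program's output format is not described here; the flag arrays, the change_marks name, and the use of "|" as the change bar character are all assumptions.

```c
#include <stddef.h>

/* Fill marks[0..n-1] with the margin character for each line of the
   document being printed. changed[i] is nonzero if line i is new or
   altered relative to the base; deleted[i] is nonzero if base text was
   deleted immediately before line i. */
void change_marks(const int *changed, const int *deleted, size_t n,
                  char *marks)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (changed[i])
            marks[i] = '|';     /* a change bar takes precedence */
        else if (deleted[i])
            marks[i] = '-';     /* minus sign flags a deletion */
        else
            marks[i] = ' ';
    }
}
```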

The -B switch makes NDOC generate  +PAGE  separators  as  expected  by
AutoBOOK  at  page  breaks.   It also defaults indentation to zero and
running headings off for compatibility with AutoBOOK.

If you have funny paper, you can set the lines to skip at the top  and
the body length with the -T and -P switches respectively.  The default
settings are equivalent to -T3 -P56.  NDOC assumes it can feed a  page
with a form feed.  If not, tough.  If you have hilarious paper and set
-T larger than -P, risible results reliably recur.  To cause  NDOC  to
generate  a  "scroll"  of  output  without page breaks, use -P0.  This
setting renders the -H and -N options and the setting of -T  nugatory.
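A sketch of the page layout these two switches control, under simplifying assumptions: running headings, page numbers, and indentation are omitted, and paginate is a hypothetical name. Each page gets `top` blank lines, then up to `body` text lines, then a form feed; a body length of 0 writes one continuous scroll, as -P0 does.

```c
#include <stdio.h>

void paginate(FILE *out, const char *const *lines, int nlines,
              int top, int body)
{
    int i, k, on_page = 0;              /* text lines on current page */

    for (i = 0; i < nlines; i++) {
        if (body > 0 && on_page == 0)
            for (k = 0; k < top; k++)   /* top margin (-T) */
                fputc('\n', out);
        fprintf(out, "%s\n", lines[i]);
        if (body > 0 && ++on_page == body) {
            fputc('\f', out);           /* page full (-P): eject */
            on_page = 0;
        }
    }
    if (body > 0 && on_page > 0)
        fputc('\f', out);               /* eject the last partial page */
}
```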

The  -Z switch selects the defaults which would be chosen by a prudent
man in disposing of his own document.  If selected by itself  it  runs
NDOC  in  page  counting  (-NC)  mode.   After  running  once with -Z,
then run with -Z -Nx where x is the page count  from  the  first  run.

NDOC  will automatically expand text files compressed with the English
text file compressor, TC.  This expansion requires no option, as  NDOC
automatically  figures  out that the file is compressed and expands it
on the fly.  So, feel free to compress your document  files  and  save
30%  of  your  disc  space (you can always expand them back with TC -D
when you need to edit them).  In addition, NDOC automatically  expands
tabs  inserted  on  eight column boundaries.  Thus you can use NDOC on
files including tabs, and may tell your text editor  to  automatically
tab  output  files  (thanks to Kern Sibbald for suggesting this).  You
can even have tabbed, compressed files, although that's a  bit  silly.
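Tab expansion to eight-column stops is the classic routine sketched below; expand_tabs is a hypothetical name, and the caller must supply an output buffer large enough for the expanded line (at most eight times the input length, plus one).

```c
#include <stddef.h>

/* Replace each tab with blanks up to the next tab stop; stops fall at
   columns 1, 9, 17, ... (every eight columns). */
void expand_tabs(const char *in, char *out)
{
    size_t col = 0;                     /* 0-based output column */

    for (; *in != '\0'; in++) {
        if (*in == '\t') {
            do {
                *out++ = ' ';
                col++;
            } while (col % 8 != 0);
        } else {
            *out++ = *in;
            col++;
        }
    }
    *out = '\0';
}
```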

You  may  force page breaks anywhere in the document by inserting form
feed characters.   Form  feeds  may  either  be  placed  on  lines  by
themselves,  in  which case they act as page eject commands, or may be
prefixed to text lines.  If a form feed appears as the first character
of a text line, that line will be printed at the top of the next page.
Thanks to Eric Lyons for recommending this feature.

You may include text from  another  file  in  the  printed  output  by
inserting  a  line  which  begins with the characters "<<".  This text
will be processed as if it had been physically copied into  the  input
file in place of the "<<" line.  The "<<" must appear in the first two
columns of the line to include the file.  Includes may  be  nested  to
any  depth,  constrained  only  by the operating system's limit on the
number of concurrently open files (FILES parameter in  CONFIG.SYS  for
MS-DOS).
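Nested inclusion falls out naturally from recursion, with each level holding its file open until it finishes, which is why the depth is bounded by the open-file limit. In this sketch, process_includes is a hypothetical name, and taking the rest of the "<<" line (leading blanks and the newline removed) as the file name is an assumption, since the guide does not spell out the syntax. Input lines are assumed to be shorter than 512 characters.

```c
#include <stdio.h>
#include <string.h>

/* Copy `in` to `out`, replacing each "<<name" line by the processed
   contents of file `name`. Returns 0 on success, -1 if an included
   file cannot be opened. */
int process_includes(FILE *in, FILE *out)
{
    char line[512];

    while (fgets(line, sizeof line, in) != NULL) {
        if (strncmp(line, "<<", 2) == 0) {
            char *name = line + 2;
            FILE *inc;
            int rc;

            name += strspn(name, " ");              /* skip leading blanks */
            name[strcspn(name, " \r\n")] = '\0';    /* drop line ending */
            if ((inc = fopen(name, "r")) == NULL)
                return -1;
            rc = process_includes(inc, out);        /* includes may nest */
            fclose(inc);
            if (rc != 0)
                return rc;
        } else {
            fputs(line, out);
        }
    }
    return 0;
}
```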

NDOC  has  previously  remained almost entirely pure of sensitivity to
the input text most of the time with only sporadic and  well-justified
but  nonetheless  often  annoying  albeit rational exceptions.  But no
more.  In the spirit of the Eighties, where  one  need  only  say  the
secret  words  "expert  system" to have the cosMECC duck come down and
hand you a billion dollars,  NDOC  has  acquired  an  EMBEDDED  EXPERT
SYSTEM,  which  examines  the  text and with a rule-based methodology,
diddles it as it sees fit.  The EMBEDDED EXPERT SYSTEM is  enabled  by
default,  but  may  be turned off by setting the -X switch on the call
line.  The EMBEDDED EXPERT SYSTEM does not currently know about change
bars,  so  setting  the -C switch and specifying a DIFF file will also
turn it off.

The EMBEDDED EXPERT SYSTEM can be directed by placing sentinels around
the  text.   These sentinels may appear at the start or the end of the
line; if at the start, they must appear in the first  two  columns  of
the line.  To cause NDOC to centre a line of text, just place the text
between the "forcing brackets" as follows:

    >>This line will be centred.<<

Of course, the actual line must begin in the first column.  To force a
line  to  be  right-aligned,  place  only  the  first forcing bracket:

    >>Slam this line to the right.

And to force a  line  to  be  left-justified  (causing  the  automatic
justifier  to  not  process  it), use only the second forcing bracket:

    This line is left justified.<<
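Recognising the forcing brackets and placing the text can be sketched as follows. The enum force type and the forcing and place functions are hypothetical names, the trailing newline is assumed to be stripped, and the width passed to place stands in for whatever line width NDOC actually uses.

```c
#include <string.h>

enum force { FORCE_NONE, FORCE_CENTRE, FORCE_RIGHT, FORCE_LEFT };

/* Classify `line` by its forcing brackets and copy the bare text into
   `text` (at least strlen(line) + 1 bytes). ">>" counts only in the
   first two columns, "<<" only at the very end of the line. */
enum force forcing(const char *line, char *text)
{
    size_t len = strlen(line);
    int lead = strncmp(line, ">>", 2) == 0;
    int trail = len >= 2 && strcmp(line + len - 2, "<<") == 0;
    size_t start = lead ? 2 : 0;
    size_t end = trail ? len - 2 : len;     /* brackets never overlap */

    memcpy(text, line + start, end - start);
    text[end - start] = '\0';
    if (lead && trail)
        return FORCE_CENTRE;
    if (lead)
        return FORCE_RIGHT;
    return trail ? FORCE_LEFT : FORCE_NONE;
}

/* Write `text` into `out`, padded on the left for centring or
   right-alignment within `width` columns. */
void place(enum force f, const char *text, int width, char *out)
{
    int len = (int) strlen(text), pad = 0, i;

    if (f == FORCE_CENTRE)
        pad = (width - len) / 2;
    else if (f == FORCE_RIGHT)
        pad = width - len;
    for (i = 0; i < pad; i++)               /* no padding if pad <= 0 */
        *out++ = ' ';
    strcpy(out, text);
}
```

A line consisting only of brackets, such as ">><<", yields empty text, matching the rule that such lines vanish and serve only to block text movement.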

The EMBEDDED EXPERT SYSTEM will automatically identify  paragraphs  in
the  text,  including  those with hanging indentation and bullets, and
move words from line to line to  fill  to  the  current  justification
width.   The  process  of  assembling words into a paragraph continues
until a  line  is  encountered  which  contains  one  of  the  forcing
brackets,   a  line  which  fails  to  qualify  for  justification  by
containing extra embedded spaces (so that tables are automatically not
justified),  or  a line which fails to begin in the same column as the
second line of the paragraph.  You can  break  up  lines  which  would
normally  be combined by inserting a line with just a forcing bracket,
such as:

    >>
    >><<
    >>

Forced lines which are otherwise blank disappear  and  have  only  the
effect of blocking the movement of text.
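Once a paragraph's words have been collected, refilling them is a greedy loop. This sketch (fill is a hypothetical name) starts continuation lines at the second line's indentation, which is how hanging indents and bullets keep their shape. It joins words with single blanks and leaves out both the tests that decide where a paragraph ends and any subsequent justification.

```c
#include <stdio.h>
#include <string.h>

/* Write `words` as lines no wider than `width` where possible. `first`
   is the first line's indentation, `hang` that of the remaining lines. */
void fill(FILE *out, const char *const *words, int nwords,
          int first, int hang, int width)
{
    int col = first, empty = 1, i, k;   /* empty: no word on line yet */

    for (k = 0; k < first; k++)
        fputc(' ', out);
    for (i = 0; i < nwords; i++) {
        int len = (int) strlen(words[i]);

        if (!empty && col + 1 + len > width) {
            /* Next word won't fit: start a continuation line. */
            fputc('\n', out);
            for (k = 0; k < hang; k++)
                fputc(' ', out);
            col = hang;
            empty = 1;
        }
        if (!empty) {
            fputc(' ', out);
            col++;
        }
        fputs(words[i], out);
        col += len;
        empty = 0;
    }
    fputc('\n', out);
}
```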

The  EMBEDDED  EXPERT  SYSTEM  also  permits  documents to be embedded
within C and PL/I programs.  The first  occurrence  of  the  sentinel:

    /*DOC

in  column  1  of an input line causes NDOC to enter embedded document
mode.  Thereafter, only lines between that sentinel and  the  matching
sentinel:

    DOC*/

which  will  also  only  be  recognised  if placed in column 1 will be
processed by NDOC.  The sentinels will  be  ignored.   Thus,  you  may
write  a  program  and embed the documentation between these comments.
When  you  compile  the  program,  the  compiler   will   ignore   the
documentation.   When  you process the document with NDOC, the program
will be ignored.  The resulting file is  a  self-documenting  program,
far  more  likely  to  reflect  the  current  state of the code than a
program with comments and a separate document.  If you want to put the
document  at  some  location not at the beginning of the program file,
just put the two lines:

    /*DOC
    DOC*/

as the first two lines of  the  program.   This  will  place  NDOC  in
embedded  document  mode  without  printing anything.  The /*DOC which
precedes the first segment of documentation will  turn  interpretation
of text back on when encountered.
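The sentinel handling amounts to a three-state filter, sketched here in C with a hypothetical doc_filter function; it is an illustration of the rules above, not NDOC's code. The comments use the // form deliberately: a DOC*/ sentinel written inside a traditional block comment would end that comment early.

```c
#include <string.h>

// Returns nonzero if `line` should be passed on to the formatter.
// `state` must start at 0 and be carried from line to line:
//   0 = no sentinel seen yet (every line passes),
//   1 = inside a document segment (lines pass),
//   2 = program text after a segment (lines are dropped).
int doc_filter(const char *line, int *state)
{
    if (strncmp(line, "/*DOC", 5) == 0) {   // column 1 only
        *state = 1;                         // enter a document segment
        return 0;                           // sentinel is not printed
    }
    if (*state == 1 && strncmp(line, "DOC*/", 5) == 0) {
        *state = 2;                         // back to program text
        return 0;
    }
    return *state != 2;
}
```

Feeding a program that starts with the empty /*DOC and DOC*/ pair through this filter prints nothing until the next /*DOC, which is exactly the trick described above.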

Many more great features are on the way!