
SCAN -- The preprocessor

Auto Book consists of a document processor which reads the formatted text generated by a word processing program. The preprocessor scans the document and, based either on information encoded in the document or on user-selected heuristic rules, identifies logical sections of the document and assigns them names. It prepares a rapid-access file of the document text, and creates a file containing encoded references to words in the document, with pointers to the sections of text in which each word appears. The preprocessor may also compress the text and encode it against access by programs other than the Auto Book retrieval program. Neither of these two functions is currently implemented.
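The word-reference file described above is, in effect, an inverted index from words to section names. The following is a minimal sketch of that idea only; the function name, the dictionary layout, and the rule for what counts as a word are assumptions made for illustration, and the notes do not describe the actual encoding SCAN uses.

```python
import re
from collections import defaultdict


def build_reference_index(sections):
    """Map each lowercased word to the sorted names of the sections containing it.

    `sections` is a dict of {section_name: section_text}. This in-memory
    layout is an assumption, not Auto Book's on-disk format.
    """
    index = defaultdict(set)
    for name, text in sections.items():
        # Treat any run of letters as a word (an assumed, simplistic rule).
        for word in re.findall(r"[A-Za-z]+", text):
            index[word.lower()].add(name)
    return {word: sorted(names) for word, names in index.items()}


if __name__ == "__main__":
    sample = {"Intro": "The test began.", "Setup": "Run the test twice."}
    print(build_reference_index(sample)["test"])
```

Looking up a word then costs one dictionary access, instead of a scan of the whole document text.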

The preprocessor contains an algorithm called the ``rooter'' which extracts the root of a word by removing its prefixes and suffixes. This algorithm must be carefully defined, and will vary for each natural language supported. References are stored by the root of each word, so that asking for references to ``test'' will find references to ``test'', ``tested'', ``tests'', ``tester'', ``retest'', etc. The rooter is not currently implemented.
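Since the rooter is not implemented, the sketch below is purely illustrative: a naive affix stripper for English that reduces the five example words above to ``test''. The prefix and suffix lists, the minimum root length, and the function name are all hypothetical.

```python
# Hypothetical, English-only affix lists chosen to cover the examples above.
PREFIXES = ("re", "un")
SUFFIXES = ("ed", "er", "ing", "s")
MIN_ROOT = 3  # assumed floor: never strip a word below this many letters


def root_of(word):
    """Strip at most one prefix and one suffix from `word`."""
    w = word.lower()
    for prefix in PREFIXES:
        if w.startswith(prefix) and len(w) - len(prefix) >= MIN_ROOT:
            w = w[len(prefix):]
            break
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_ROOT:
            w = w[:-len(suffix)]
            break
    return w


if __name__ == "__main__":
    for w in ("test", "tested", "tests", "tester", "retest"):
        print(w, "->", root_of(w))
```

Blind stripping of this kind also shows why the algorithm must be carefully defined: `root_of("under")` returns `der`, because ``un'' is removed even though it is not a prefix in that word.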

Once a document has been ``compiled'' by the preprocessor, it may be read with the ``READ'' utility.

The preprocessor is invoked by calling the SCAN program. It presents a menu offering only two options: preprocessing a document or exiting. Before calling SCAN, you should have put the WORD-formatted output of the document into a file with a .TXT type. You should also create a file with the same name and a .RAT type, sized at about one sector per line of text in the .TXT file. You should also create a .REF file. There's no easy way to estimate the .REF file size, so make a huge file initially. SCAN will tell you how much it used after it's done. You should also have a TEMP1$ file on Drive 1 (MDEX) before calling SCAN.

Once you tell SCAN you want to process a document, all you have to do is enter the ``root name'' (less the .TXT) of the document, and SCAN will do the rest.
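As a rough aid to the preparation steps above, the sketch below derives the file names from a root name and applies the one-sector-per-line rule of thumb for the .RAT file. Both function names are hypothetical, and the sketch assumes the .REF file shares the root name, which the notes state explicitly only for the .RAT file.

```python
def scan_files(root_name):
    """Return the file names SCAN is assumed to expect for a root name.

    Assumption: the .REF file uses the same root name as .TXT and .RAT.
    """
    return {ext: f"{root_name}.{ext}" for ext in ("TXT", "RAT", "REF")}


def estimate_rat_sectors(txt_lines):
    """Suggested .RAT size in sectors: about one sector per line of text."""
    return sum(1 for _ in txt_lines)


if __name__ == "__main__":
    print(scan_files("USC"))
    print(estimate_rat_sectors(["First line\n", "Second line\n"]))
```

No comparable estimate is attempted for the .REF file, since its size cannot easily be predicted before SCAN has run.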

SCAN knows about various commands embedded in the document. These commands will be removed from the files created by SCAN. All commands are flagged with a plus sign (+) in column 1. Note that these commands are entered as text with WORD, and that care must be taken to ensure that WORD will not format them into the middle of another line. See the file ``USC.WRD'' for an example of how the SCAN commands can be inserted in a document.
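The column-1 rule can be sketched as a simple line filter. This is an illustration under assumptions, not SCAN's actual code: the function name is hypothetical, and `+EXAMPLE` is a made-up command used only to show the flag.

```python
def split_commands(lines):
    """Separate lines flagged with '+' in column 1 from ordinary text lines."""
    commands, text = [], []
    for line in lines:
        # Only a '+' in the very first column marks a command.
        if line.startswith("+"):
            commands.append(line)
        else:
            text.append(line)
    return commands, text


if __name__ == "__main__":
    doc = ["+EXAMPLE argument", "Body text.", "More text +EXAMPLE argument"]
    commands, text = split_commands(doc)
    print(commands)
    print(text)
```

The third sample line shows the hazard mentioned above: once a command has been formatted into the middle of another line, its plus sign is no longer in column 1, and the whole line is treated as ordinary text.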




Editor: John Walker