Development Details
Source document
I started with a flat ASCII text version of the Immigration and
Nationality Act obtained from the U.S. House of Representatives
Office of the Law Revision Counsel, current through January 19th, 2004,
which corresponds to Supplement III of the 2000 edition of the U.S. Code.
Compilation into HTML for the Web
I wrote an incredibly messy and inefficient C program to
compile the text into XHTML 1.0 and to create the master table of
contents and section index. Code sections that contain
statute text are parsed to extract the section hierarchy, and
cross-references between sections are linked, wherever
possible, so that clicking on a cross-reference displays the
cited text.
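To illustrate the cross-reference linking, here is a minimal sketch in C. It is not the actual program: the SectionEntry table, the file names, and the link_xrefs function are all hypothetical, and the sketch assumes citations take the simple form "section NNN" in lower case. A real citation parser has to cope with many more forms.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table mapping section numbers to compiled file names. */
typedef struct {
    const char *section;
    const char *file;
} SectionEntry;

/* Return the file name for the section number num[0..len-1], or NULL. */
static const char *lookup_section(const SectionEntry *tab, size_t n,
                                  const char *num, size_t len)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strlen(tab[i].section) == len &&
            strncmp(tab[i].section, num, len) == 0)
            return tab[i].file;
    return NULL;
}

/* Copy text to out, wrapping each "section NNN" whose number appears in
   the table in an HTML link.  Citations to unknown sections are copied
   unchanged.  Returns 0 on success, -1 if out is too small. */
static int link_xrefs(const SectionEntry *tab, size_t n,
                      const char *text, char *out, size_t outsz)
{
    static const char kw[] = "section ";
    const size_t kwlen = sizeof kw - 1;
    size_t used = 0;

    assert(tab != NULL && text != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    while (*text) {
        const char *file = NULL;
        size_t len = 0;
        int w;

        if (strncmp(text, kw, kwlen) == 0) {
            /* Section numbers may carry a letter suffix, so accept
               letters as well as digits. */
            while (isalnum((unsigned char)text[kwlen + len]))
                len++;
            if (len > 0)
                file = lookup_section(tab, n, text + kwlen, len);
        }
        if (file != NULL) {
            w = snprintf(out + used, outsz - used, "<a href=\"%s\">%.*s</a>",
                         file, (int)(kwlen + len), text);
            text += kwlen + len;
        } else {
            w = snprintf(out + used, outsz - used, "%c", *text);
            text++;
        }
        if (w < 0 || (size_t)w >= outsz - used)
            return -1;
        used += (size_t)w;
    }
    return 0;
}
```

Given a table containing section 1101, the text "See section 1101(a)" comes out with "section 1101" wrapped in a link and the "(a)" left as plain text. Linking down to the subsection would require the anchor names assigned during compilation.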
Compilation is a three-pass process. The first pass reads the ASCII
document, extracts the section hierarchy, assigns HTML file names, and
stores the text in memory by component. The second pass reads the
statute text to identify the hierarchy of statute text within sections,
assigns names to anchors (link targets), and builds a table of anchors.
The final pass creates the actual HTML file for each section, scanning
the text for cross-references and inserting links to the anchors
assigned in the second pass. The program logic in the second and
third passes is supplemented by a "hints" file identifying more
than 200 occurrences of improper formatting in the Code, allowing
the program to correctly parse text it would otherwise misinterpret.
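The anchor assignment in the second pass might look something like the following sketch. Everything here is an assumption made for illustration: the AnchorPath structure, the four-level hierarchy of designators "(a)", "(1)", "(A)", "(i)", and the hyphen-joined anchor names are not taken from the actual program.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_LEVELS 4    /* subsection, paragraph, subparagraph, clause */
#define MAX_LABEL  16

/* Current position in the hierarchy: one label per open level. */
typedef struct {
    char label[MAX_LEVELS][MAX_LABEL];
    int depth;          /* number of levels currently open */
} AnchorPath;

/* Classify a designator by level (0 = subsection ... 3 = clause), or
   return -1 if it is not recognised.  A lower-case Roman numeral is
   taken as a clause only when a subparagraph is already open; this
   heuristic can guess wrong, which is the kind of case a hints file
   would have to override. */
static int designator_level(const char *d, int depth)
{
    if (d[0] == '\0') return -1;
    if (isdigit((unsigned char)d[0])) return 1;
    if (isupper((unsigned char)d[0])) return 2;
    if (islower((unsigned char)d[0])) {
        if (depth >= 3 && strspn(d, "ivxl") == strlen(d)) return 3;
        return 0;
    }
    return -1;
}

/* If line starts with a designator such as "(a)", update path and write
   the anchor name (labels joined by '-') to out.  Returns 1 if an anchor
   was assigned, 0 otherwise. */
static int assign_anchor(AnchorPath *path, const char *line,
                         char *out, size_t outsz)
{
    char d[MAX_LABEL];
    const char *close;
    size_t len;
    int level, i;

    assert(path != NULL && line != NULL && out != NULL && outsz > 0);
    if (line[0] != '(') return 0;
    close = strchr(line, ')');
    if (close == NULL) return 0;
    len = (size_t)(close - line - 1);
    if (len == 0 || len >= MAX_LABEL) return 0;
    memcpy(d, line + 1, len);
    d[len] = '\0';
    for (i = 0; d[i] != '\0'; i++)
        if (!isalnum((unsigned char)d[i])) return 0;

    level = designator_level(d, path->depth);
    if (level < 0 || level > path->depth) return 0;  /* would skip a level */
    strcpy(path->label[level], d);
    path->depth = level + 1;    /* closes any deeper levels */

    out[0] = '\0';
    for (i = 0; i < path->depth; i++) {
        if (i > 0) strncat(out, "-", outsz - strlen(out) - 1);
        strncat(out, path->label[i], outsz - strlen(out) - 1);
    }
    return 1;
}
```

Starting from a zeroed AnchorPath, feeding lines that begin "(a)", "(1)", "(A)", "(i)" in turn yields the names a, a-1, a-1-A, a-1-A-i, and a following "(2)" drops back to a-2. Text that merely happens to open with a short parenthesized word would be mistaken for a designator, which suggests why a purely mechanical parse needs hints.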
Building the full-text search database
A modified version of the compiler program was used to create a second
copy of the Code, which was indexed with freeWAIS-sf. In this copy, the
title of each section contains the name of the corresponding compiled
HTML document. A modified version of Ulrich Pfeifer's SFgate
interface between the Web and WAIS uses the Common Gateway Interface
(CGI) to submit queries to the freeWAIS-sf search server and constructs
the reply document from the items returned. Text retrieval is not done
through WAIS; WAIS serves solely as a search tool which points to Web
documents.
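The idea of carrying the HTML file name in each indexed title can be sketched as follows. This is not SFgate code (SFgate is a separate program), and the headline layout assumed here, a file name followed by a space and the section title, is purely hypothetical, as is the result_link function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one search result as an HTML list item linking to the compiled
   document.  The headline is assumed to look like
   "<file name> <section title>".  The title is copied as-is, so this
   sketch assumes it contains no characters that need HTML escaping.
   Returns 0 on success, -1 on a malformed headline or short buffer. */
static int result_link(const char *headline, char *out, size_t outsz)
{
    const char *sp;
    int n;

    assert(headline != NULL && out != NULL && outsz > 0);
    sp = strchr(headline, ' ');
    if (sp == NULL || sp == headline || sp[1] == '\0')
        return -1;
    n = snprintf(out, outsz, "<li><a href=\"%.*s\">%s</a></li>",
                 (int)(sp - headline), headline, sp + 1);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Because each result already names its Web document, the reply page only needs to emit links; no document text has to pass through the search server.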
by John Walker