Hack Links
Proposal by John Walker
July 12th, 1995
Introduction
One essential part of Ted Nelson's original concept of Xanadu was
that links between documents be bi-directional--when a
link was made to a document, the linked-to text would become a link
back to the document that referenced it. This was believed to be
an essential component of an open hypertext system intended for
discussion of complex issues, and a great improvement over current forms of
scholarly publication.
With the advent of the World-Wide Web, links have come into the mainstream of
writing and publishing, but these links are unidirectional--there's no
way to know if a link has been made to your document, and no way to
attach comments of your own to documents you read on the Web. In the
Web, as it exists today, we have forward links but no back links.
Hack Links is a crude mechanism that provides a limited back link
capability for Web documents. Despite its many shortcomings, it may
help demonstrate the utility of back links and yield hands-on
experience in their use, which can guide the evolution of a
more practical and comprehensive facility for eventual inclusion in
the HTML standard, with Web client and server support.
Design Constraints
Many of the limitations in Hack Links arise from the need to satisfy
the following design constraints. The reason for imposing each one is
given with it.
- No browser changes
- Hack Links requires no support at the browser level, and
will work with even the simplest text-oriented browsers.
Requiring a special browser would drastically reduce the
audience who could access back links and would exclude
users of machines on which the modified browser was not
implemented. Browser support would help enormously,
especially in making the creation of back links essentially
painless; perhaps the existence of Hack Links will motivate
one or more browser developers to prototype a friendly
interface to it.
- No server changes
- Any site which wishes to experiment with back links can
do so without replacing or modifying their existing
HTTP dæmon. Again, few sites would take the risk or
go to the bother of changing their server just to experiment
with back links.
- No HTML extensions
- This falls out of the browser and server constraints.
- Local installation not required to make back links in remote documents
- Users can make back links in documents on machines
with Hack Links installed without having Hack Links
installed on their own local machine. There are many
nice things we could do if this constraint were relaxed,
but it would block access to users of commercial
Internet Service Providers who might be reluctant to
install Hack Links on their public servers.
- Back links enabled on a document-by-document basis
- Many Web site operators would be rightly concerned if
not horrified at the prospect of anybody on the Web being able
to place links in documents they publish. Xanadu envisioned
extensive link filtering mechanisms not present in Hack Links.
If my site contains, for example, stories for children, I'd be
worried about some malicious soul making links in them to
material to which I don't believe children should be exposed.
To encourage experimentation with back links, I've required
back links to be explicitly enabled by the creator of a
document, based on the audience to which it is addressed and
whatever disclaimers it may contain regarding the fact that
any Web user can create links in the document.
- No modifications to documents required
- Back links can be enabled in any HTML document without
requiring it to be modified. Again, I want to lower the
barrier to experimentation with back
links.
- Assume a Unix server machine
- Since the overwhelming majority of Web server machines are
Unix-based, we'll indulge ourselves in assuming that
environment, and the ability to compile and run vanilla ANSI C
programs. We'll need to do some things that are highly
system-specific, such as TCP/IP communications and file
locking, and no attempt will be made to implement non-Unix
versions of this code. On the other hand, the code will not
deliberately use Unix-specific features where portable
alternatives exist.
- Keep it simple
- Most issues of elegance and efficiency have been ruthlessly
jettisoned in favour of simplifying the implementation.
The simpler the program, the easier it is to convince a
site manager (or yourself, if you run your own site) that it's
safe to install. Also, since it's intended only as a
starting point for development of a genuine bidirectional
link facility, there's no reason to build in a lot of
speculative functionality until we get a feel for what's
needed.
How it Works
Hack Links is implemented as a set of C programs which are installed
on a server which wishes to provide back links to its users and
executed via the Common Gateway Interface (CGI) mechanism. These
programs maintain a database of extant back links in a file external
to a document and permit retrieval of documents containing back
links, addition of new back links, and following links when a given
piece of text contains multiple overlapping links. The individual
programs are described below.
- hlxget local_file [ -t target ]
- The HTML file named local_file, present in the directory
from which httpd serves documents, is opened, along
with the file containing its back links,
local_file.bl. If no .bl file
exists, back links are not available for this document, and it
is returned without modifications. If a back link file is
present it is read, along with the text file, and HTML anchors
are inserted in the text for each back link found in the
.bl file. If a segment of text is the object of only
one href anchor, a direct link to the target is
inserted. For text which contains links to multiple
destinations, whether exclusively back links or a conventional
link in the document and one or more overlapping back links,
an executable link to the hlxchoose program is
generated, with arguments that identify the links from the
given text. The user's browser thus receives an HTML document
in which all back links currently extant have been
interpolated as conventional links.
If the -t option is specified, rather than returning
the interpolated HTML directly, hlxget writes it into
a temporary cache directory on the server, adding an anchor
target for the word number range recorded under the name
target in the *TARGETS record of the document's .bl
file. A URL is returned which references this
temporary file, using a "#" specification so the document is
positioned to the given passage. Words within the range of
the target are shown in bold face. A job is scheduled to
delete the temporary file after a decent interval. (This is
an inelegant approach, but there's no other way I know that
allows us to position the user's browser to a given passage in
the text while meeting all the design constraints. Keeping
the targets in the .bl file allows their word
ranges to be updated when the document is revised; otherwise
remote documents would have local word numbers embedded in
them with no way to know if they had become invalid.)
- hlxchoose local_file link_id1 link_id2 ... [ -a "url" ]
- A CGI reference to this program is created by hlxget whenever a
sequence of text in a document it is transmitting is found to
be the object of two or more links, whether all back links or
an explicit link in the document and one or more back links.
The local_file.bl file is read, and an HTML
document is returned to the user which explains that the link
just followed goes to multiple destinations, listing them by
document title. The applicable links are identified by their
unique link_ids in the .bl file.
Clicking on any of the titles sends the
user to that document. A link allows returning to the
original document, but it's generally better to use the
Back button of most browsers since that preserves
position in the file. The "-a" switch supplies the URL
referenced by a forward link in the document itself; that
link will be clearly distinguished from back links made by
others.
- hlxmake local_file target_url args
- If a local_file.bl file exists, a back link
is created in it to the specified target_url. The
link is created in the local document at the location given by
args, which can be in any of the following formats.
In all cases hlxmake will create a back link only if the
given target_url is found to be accessible.
hlxmake uses a lock file to prevent two concurrent
back link requests from turning the .bl file into green slime.
Lock files are kept in /tmp so they're automatically
cleaned up if the system reboots.
- -t "text passage" [ -tb "text before" ] [ -ta "text after" ]
- The local_file is searched for the given
text passage, ignoring all HTML
mark-up and punctuation. The link is placed at the
unique occurrence of that passage. If the passage
occurs more than once in the file, the user can
indicate the precise location by specifying text
before or after the passage to which the link is
to be made.
This form permits creation of back links without any
support in the browser. A back-linkable document can
contain a button which pops up a form (or,
alternatively, it can provide the form itself as part
of the document). The server can provide a standard
back link form which allows placing back links in
eligible documents with no co-operation by the
target document at all. The user cuts and pastes the
passage to which the link is to be made, along with
neighbouring text, if necessary, to specify a unique
location, and the URL the back link goes to.
- -w start end
- The link is made to the passage of text composed of
words numbered start through end in
the document, ignoring all HTML mark-up and
punctuation.
This request syntax is included to encourage
intelligent browsers to support back links in a
more convenient fashion. The user can simply
highlight the passage to which the link is to be made,
then when the user requests a back link there, the
browser calculates the word numbers of the highlighted
passage and passes them directly to hlxmake.
- -a target_name
- A target with the given name is added to the *TARGETS
record of the .bl file, spanning the word range of the
previous back link specification. The
target_name must not already exist in the
document's .bl file.
- -b document_url args
- If the -b switch is present, no
target_url is specified. Instead, a
second set of arguments in one of the forms described
above follows, and is used to create a back link in
the named document which points back to the
first link. This allows creating bi-directional links
between documents on remote servers, as long as both
documents are enabled for back links. When this form
is used, URLs are created in each document
which request the other with hlxget, using
the -t option to position the browser to the
text at the other end of the link and display it in a
highlighted form. Targets are added to the
.bl files of both documents as needed.
Local targets can be added directly, while targets in
remote documents can be added by invoking
hlxmake with the "-a" option.
This request form is intended to support intelligent
browsers which permit the user to highlight passages
of text in two concurrently-open documents and create
a bidirectional link between them.
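The word numbering shared by the -t and -w forms can be sketched as follows. This is only an illustration under stated assumptions, not the implementation: the proposal does not define what a "word" is, so the sketch takes a word to be a run of letters and digits outside <...> tags, compares words without regard to case, and does not decode character entities. The names next_word, split_words and hl_find_passage are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 64                 /* longer words are truncated */
typedef char Word[MAX_WORD];

/* Copy the next word of s at or after *pos into out, lower-cased.
   Anything between '<' and '>' is skipped as mark-up, and every other
   non-alphanumeric character separates words (an assumption).
   Returns 1 if a word was found, 0 at the end of the text. */
static int next_word(const char *s, size_t *pos, Word out)
{
    size_t i = *pos, n = 0;

    while (s[i] != '\0' && !isalnum((unsigned char) s[i])) {
        if (s[i] == '<') {
            while (s[i] != '\0' && s[i] != '>')
                i++;
        } else {
            i++;
        }
    }
    while (isalnum((unsigned char) s[i])) {
        if (n < MAX_WORD - 1)
            out[n++] = (char) tolower((unsigned char) s[i]);
        i++;
    }
    out[n] = '\0';
    *pos = i;
    return n > 0;
}

/* Split s into a newly allocated array of words stored in *out.
   Returns the word count; on memory exhaustion *out is NULL and the
   count is 0.  The caller frees *out. */
static size_t split_words(const char *s, Word **out)
{
    size_t pos = 0, n = 0, cap = 0;
    Word *w = NULL, tmp;

    while (next_word(s, &pos, tmp)) {
        if (n == cap) {
            Word *grown;

            cap = cap ? cap * 2 : 64;
            grown = realloc(w, cap * sizeof *w);
            if (grown == NULL) {
                free(w);
                *out = NULL;
                return 0;
            }
            w = grown;
        }
        strcpy(w[n++], tmp);
    }
    *out = w;
    return n;
}

/* Search doc for passage.  *start and *end receive the word numbers
   (the first word is 1) of the first match.  Returns the number of
   matches: 0 = not found, 1 = unique, 2 or more = ambiguous. */
int hl_find_passage(const char *doc, const char *passage,
                    long *start, long *end)
{
    Word *d, *p;
    size_t nd, np, i, j;
    int matches = 0;

    assert(doc != NULL && passage != NULL && start != NULL && end != NULL);
    nd = split_words(doc, &d);
    np = split_words(passage, &p);
    for (i = 0; np > 0 && i + np <= nd; i++) {
        for (j = 0; j < np && strcmp(d[i + j], p[j]) == 0; j++)
            ;
        if (j == np && matches++ == 0) {
            *start = (long) i + 1;
            *end = (long) (i + np);
        }
    }
    free(d);
    free(p);
    return matches;
}
```

A result greater than 1 is the case in which the user would be asked for text before or after the passage. One way to handle it, again an assumption, is to search for the neighbouring text and the passage joined together and then trim the neighbouring words from the matched range.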
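The lock file used by hlxmake might look something like the sketch below. It assumes a POSIX system and relies on open() with O_CREAT | O_EXCL, which fails if the file already exists, so only one process can create the lock. The function names, the retry interval and the lock path are hypothetical; the proposal says only that lock files live in /tmp.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to take the lock by creating lock_path exclusively, retrying
   once a second for up to tries attempts.  Returns 0 when the lock is
   held, -1 on time-out or any other error. */
int hl_lock(const char *lock_path, int tries)
{
    assert(lock_path != NULL);
    while (tries-- > 0) {
        int fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0644);

        if (fd >= 0) {
            close(fd);              /* the file's existence is the lock */
            return 0;
        }
        if (errno != EEXIST)
            return -1;              /* e.g. no permission to write /tmp */
        if (tries > 0)
            sleep(1);               /* someone else holds it; wait */
    }
    return -1;
}

/* Release a lock taken with hl_lock(). */
void hl_unlock(const char *lock_path)
{
    assert(lock_path != NULL);
    unlink(lock_path);
}
```

hlxmake would take the lock before reading the .bl file and release it after writing the updated copy. A lock left behind by a crashed process blocks further back links until it is removed, which is the price of keeping the mechanism this simple.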
What if the Document Changes?
One of the advantages of Xanadu's central back-end was its ability to
automatically guarantee the validity of link locations when documents
were revised. Web documents are created by a variety of external
tools, and the externally stored back links used by Hack Links have no
means of being automatically updated if a document changes. This
places the burden on the creator of a document who has chosen to
permit back links to deal with existing back links if the document is
subsequently revised. Failure to do so will result in back links
moving to locations in the document unintended by their creators,
which is highly undesirable to all involved. Several
different approaches to this problem are discussed below.
- Do nothing: document is static.
- Before investing a large amount of effort to resolve the
revision vs. links problem, it's worth noting that many
documents are essentially static. In paper publishing, anything
published is static by definition, unless
superseded by a subsequent edition with a different
designation. When hypertext is used as a medium for debate
and discussion, it may be the case that most contributions are
as static as news postings or E-mail; they aren't revised at
all, but rather re-written when appropriate and redistributed.
In this case, the author would make whatever links he
considered appropriate in the new edition, then add a
link on the title of the old edition pointing to the revised
document.
- Automatic link update.
- Since back links are stored as word number spans within a
document, when revisions are made to a document, one could
create a "word diff" program which attempts to identify the
changes between the original and revised documents and then
adjust the word numbers of the back links in the original
document to the corresponding positions in the revised text.
As with line-oriented diff, finding changes is a
heuristic process which can become confused, particularly when
a document is reorganised, moving large sections around.
Still, despite its shortcomings, many source
code control systems have been built upon diff, and a word diff
may prove adequate for many cases of document
revisions.
- Back link embedder/extractor.
- When extensive editing of documents is contemplated, to a
degree that the automatic link updating described above
would get hopelessly lost (for example, if you're assembling
a summary document from extracts of contributions by a variety
of people), we could develop a program which merged a document
and its back link file, inserting <BACKLINK HREF=...>
and </BACKLINK> tags for each back link and target. These tags,
not being valid HTML tags, would be ignored by browsers,
allowing the author to preview the document as it was edited.
When the final edition was complete, a second program would
extract the BACKLINK tags into a separate .bl file
and create a corresponding HTML file with the BACKLINK tags
removed.
This approach has the disadvantage that new back links added
to the document during the editing process will be lost.
The author might, at the time of extraction, add a
*BLOCK statement to the .bl file (see
below), with a *MESSAGE notifying people who
attempt to make back links that the document is currently
being revised and back links should be made in the new
document when it is published.
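The automatic link update described above can be sketched with a longest-common-subsequence comparison of the two word lists. This is one possible technique, chosen here for brevity, and not necessarily what a real word diff would use: it needs memory proportional to the product of the two document lengths, and it treats a moved section as a deletion plus an insertion. The function names are hypothetical, and the words are assumed to have been extracted already, with mark-up and punctuation removed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Fill map[1..n_old] with the new word number of each old word, or 0
   if the word was deleted.  map must have room for n_old + 1 entries.
   Returns 0 on success, -1 if memory runs out. */
int hl_word_map(const char *const *old_w, size_t n_old,
                const char *const *new_w, size_t n_new, long *map)
{
    size_t cols = n_new + 1, i, j;
    /* lcs[i * cols + j] is the LCS length of old_w[i..] and new_w[j..] */
    size_t *lcs = calloc((n_old + 1) * cols, sizeof *lcs);

    if (lcs == NULL)
        return -1;
    for (i = n_old; i-- > 0; ) {
        for (j = n_new; j-- > 0; ) {
            if (strcmp(old_w[i], new_w[j]) == 0) {
                lcs[i * cols + j] = lcs[(i + 1) * cols + j + 1] + 1;
            } else {
                size_t down = lcs[(i + 1) * cols + j];
                size_t right = lcs[i * cols + j + 1];

                lcs[i * cols + j] = down >= right ? down : right;
            }
        }
    }
    for (i = 0; i < n_old; i++)
        map[i + 1] = 0;
    i = j = 0;
    while (i < n_old && j < n_new) {
        if (strcmp(old_w[i], new_w[j]) == 0) {
            map[i + 1] = (long) j + 1;      /* word survives */
            i++;
            j++;
        } else if (lcs[(i + 1) * cols + j] >= lcs[i * cols + j + 1]) {
            i++;                            /* old word was deleted */
        } else {
            j++;                            /* new word was inserted */
        }
    }
    free(lcs);
    return 0;
}

/* Move the span *first..*last (old word numbers) to the revised text.
   The span shrinks to its surviving words.  Returns 1 if any word
   survives; otherwise sets both ends to -1 and returns 0. */
int hl_remap_span(const long *map, long *first, long *last)
{
    long lo = *first, hi = *last;

    assert(lo >= 1 && lo <= hi);
    while (lo <= hi && map[lo] == 0)
        lo++;
    while (hi >= lo && map[hi] == 0)
        hi--;
    if (lo > hi) {
        *first = *last = -1;
        return 0;
    }
    *first = map[lo];
    *last = map[hi];
    return 1;
}
```

A span that shrinks, or one whose surviving ends now enclose newly inserted text, probably deserves a look by the author before the updated .bl file is installed.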
Back link file format
Back links are kept in an ASCII file in the following format.
The file consists of various control records, each identified by an
asterisk in the first column. Control record names are
case-insensitive.
- *COMMENT any text
- The line is a comment and is ignored.
- *MESSAGE any text
- The text is displayed in the confirmation box returned to
a user who attempts to add a back link. For example, the
message might notify the sender that the document has been
superseded by a new edition, or invite the person who made the
link to E-mail the author with a description of it.
- *BLOCK
- The addition of new back links is disabled. Existing back
links can still be followed. This lets an author lock out
back links to obsolete documents or documents currently being
revised. A *MESSAGE would usually be included to
explain the reason for the blockage.
- *BACKLINKS count next_link_id
- Following this item are count lines, each containing a
back link. The first character of each back link line is a space,
followed by these items. Back links appear in
ascending order of their first word number.
next_link_id gives the next available unique link_id.
- First word number
- The starting word of the link, counting words after removal of
all HTML mark-up tags and punctuation. The first word in the
document is word 1.
- Last word number
- The last word of the link, counting words as above. A link
to a single word will have the same first and last word
numbers.
- Link_id
- A unique number, starting at 1, identifying the link.
This number is used by hlxchoose to identify
the links it wishes the user to choose among. These
numbers are never reused, even if a link is deleted
because it is found to be invalid.
- URL
- The URL of the back link destination. If the link is to
another site running Hack Links, and the link is within a
back link enabled document, this will be a CGI invocation of
hlxget with a "-t" specification pointing to the
target destination within the document. The URL is
quoted, with any embedded quotes escaped.
- Title
- The title of the target document, obtained by accessing
it via the URL at the time the link is created. This
is used by hlxchoose to identify target
documents when a link has multiple destinations. We
store the title at link creation time rather than
obtaining it when the link is followed to reduce
network traffic and the attendant delay in the
appearance of the hlxchoose results. If
changes in document titles prove to be a problem, a
utility which re-verifies all the URLs in a
.bl file and refreshes their titles could be
created and run periodically.
- *TARGETS count
- Following this item are count lines containing the
word number range of link targets within this document which
can be accessed with the "-t" option of hlxget.
The first character of each target line is a space,
followed by these items. Targets are added as needed
to local documents when hlxmake is invoked with the
"-a" option.
- Target name
- The unique name by which the target is referred to
in remote documents. This is generated
by an algorithm similar to that used
to generate unique message identifiers for news
postings. Target names are quoted, with embedded
quotes escaped.
- First word number
- The starting word of the target, counting words after removal of
all HTML mark-up tags and punctuation. The first word in the
document is word 1. If the first word number is -1,
the target has been removed (usually because it
pointed into a section of the document which has been
deleted in a subsequent revision).
- Last word number
- The last word of the target, counting words as above. A target
consisting of a single word will have the same first and last word
numbers.
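Reading one line of a *BACKLINKS record could be sketched as below. Several details are assumptions, since the format above leaves them open: fields are taken to be separated by spaces, a backslash is taken as the escape character inside the quoted URL, and the title is taken to be the unquoted remainder of the line. The structure, its field sizes and the function name are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct hl_backlink {
    long first, last, id;       /* word range and unique link_id */
    char url[512];              /* destination, escapes removed */
    char title[256];            /* title of the target document */
};

/* Parse one back link line such as
       12 15 3 "http://www.example.org/reply.html" A Reply
   (with its leading space) into *bl.
   Returns 1 on success, 0 if the line is malformed. */
int hl_parse_backlink(const char *line, struct hl_backlink *bl)
{
    int used = 0;
    size_t n = 0;
    const char *p;

    assert(line != NULL && bl != NULL);
    if (line[0] != ' ')
        return 0;               /* back link lines start with a space */
    /* %n is stored only if the opening quote of the URL was matched. */
    if (sscanf(line, " %ld %ld %ld \"%n",
               &bl->first, &bl->last, &bl->id, &used) != 3 || used == 0)
        return 0;
    for (p = line + used; *p != '\0' && *p != '"'; p++) {
        if (*p == '\\' && p[1] != '\0')
            p++;                /* keep the escaped character itself */
        if (n + 1 >= sizeof bl->url)
            return 0;           /* URL too long for this sketch */
        bl->url[n++] = *p;
    }
    if (*p != '"')
        return 0;               /* closing quote missing */
    bl->url[n] = '\0';
    p++;
    while (*p == ' ')
        p++;
    n = strcspn(p, "\r\n");     /* title runs to the end of the line */
    if (n >= sizeof bl->title)
        n = sizeof bl->title - 1;
    memcpy(bl->title, p, n);
    bl->title[n] = '\0';
    return bl->first >= 1 && bl->first <= bl->last && bl->id >= 1;
}
```

Target lines under *TARGETS could be read the same way, with the quoted target name first and the two word numbers after it, allowing -1 as the first word number for a removed target.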
Conclusion
Hack Links allows any Unix-based Web site that's willing to install
three public domain portable C programs in its CGI directory to
provide a back link facility that enables authors of documents to
selectively permit readers to make links in their documents to other
documents on the Web. No modifications to the Web server are
required, and back links are accessible to all existing Web browsers.
Facilities are included which allow future back-link-aware browsers to
make the process of back link creation much easier.
While the constraints imposed with the intent of making Hack Links
completely compatible with existing Web documents and software require
sacrificing the automatic revision updating contemplated in Xanadu,
implementable solutions are proposed which allow back links to be
maintained across most document revisions.
It is estimated that the software envisioned in this proposal
(excluding the automatic revision tools) could be implemented by one
person in one week.