Table of Contents


UNDER CONSTRUCTION

Introduction

Science Fiction Prelude

We'll Return, After This Message
When a new scientific phenomenon is discovered, it's often the case that many prior observations were made without recognising their significance at the time. In this story, two programmers mount their own Search for Extraterrestrial Intelligence, not with a radio telescope, but in archives on the Internet.

Exploration

Rube Goldberg Machines and the Argument from Design
For millennia, some philosophers have argued that the self-evident perfection of the world and the organisms which inhabit it is evidence of conscious design--evolution blindly groping toward local maxima could not possibly explain the development of the eye, much less the origin of life itself. This document explores the difference between complexity for its own sake, as found in Rube Goldberg machines and the products of an evolutionary process, and inherent, designed-in complexity where, however complicated, the mechanism observed cannot be simplified without sacrificing functionality. Are we the products of authorship or editing? Neither this page nor any other in this collection advocates an answer; instead, the goal is to acquaint you with the evidence for each, providing the information upon which you can base your own judgement, and furnishing you tools for your own independent exploration of this deepest question of our common origin.

Self-Decoding Messages
How can you send a message to a being of another species, with no common language and perhaps sense organs entirely different from those of terrestrial life? Researchers involved in the Search for Extra-Terrestrial Intelligence (SETI) have thought about this problem at length and have come up with some very plausible solutions, some of which are en route to the stars as you read this. This document explores the common intellectual heritage of all species in our universe, and how it can permit us to exchange our first, halting greetings with one another.

Arecibo Decoded
Detailed decoding of an interstellar message. Spoiler warning! Don't read this before trying to decode the message in Self-Decoding Messages on your own.

The Genetic Code
The deciphering of the genetic code is one of the greatest triumphs of molecular biology. This page describes how the sequence of base pairs in a strand of DNA encodes the sequence of amino acids to assemble into a polypeptide according to the genetic code, and explains the terminology used in conjunction with genome sequences.

Storing Data in DNA
Discovery of the genetic code provided an understanding of how living organisms use DNA to store the instructions for their own manufacture. But DNA is a general purpose storage medium, just like a floppy disc or CD-ROM. Indeed, the genomes of most higher organisms contain large amounts of "junk DNA", introns, which do not appear to direct the manufacture of any protein. Might they encode something else entirely? This document explores in detail how DNA can be used as a general-purpose molecular data storage medium, and compares its density and reliability with present-day computer media.

The Genome Browser
An interactive Java application, launched from a Web page, which allows you to browse the genomes of various completely-sequenced organisms, interpreting them according to a variety of ways in which information might have been coded into them.

Potassium-40 and Evolution of Higher Life
Why did such a long interval elapse between the appearance of life on Earth and the emergence of complex, multicellular organisms? Perhaps it had little to do with biology and everything to do with the half-life of a radionuclide forged in the supernova whose debris gave rise to us all.

The Tree of Life
Exploration of life at the molecular level has shown it to be divided into three great domains: Archaea, Bacteria, and Eucarya, all descended from a common ancestor. This document discusses the relationship and tree of descent of life on Earth, and explains why humans are much more closely related to spinach plants than to the bacteria in their own intestines.

Sizes of Various Genomes
Generally, the more complex the organism, the longer its genome: the human genome is about a thousand times longer than that of a bacterium. "That's as it should be," says a human, smugly, "I am a higher organism." Wouldn't you like to see the expression on Mr. Higher Organism's face upon learning that the Marbled Lungfish (Protopterus aethiopicus) has a genome forty-five times longer than his own? This document gives the genome length for a variety of organisms, and briefly discusses why extreme variations may occur.

Mathematical Constants in Binary
Before members of another species can decode a message, they first have to identify it as a message in the first place. Many have suggested that the most obvious way to identify an intelligent message is to include a universal mathematical constant: Pi, e, the square root of two, or a series such as the prime numbers. When transmitting a message in binary--ones and zeroes--the obvious way to encode such a number is in binary as well. This document gives these constants to a precision of more than 3200 binary digits, and contains a link which lets you download files containing these binary numbers.
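
As a rough illustration of how such a constant can be produced in binary, here is a minimal Python sketch for the square root of two. It is not the program used to generate the files offered on that page; the function name `sqrt2_binary` is an assumption made for this example. It relies only on exact integer arithmetic, so the digits are truncated rather than rounded.

```python
from math import isqrt


def sqrt2_binary(fraction_bits: int) -> str:
    """Return sqrt(2) in binary, truncated to `fraction_bits` fractional bits.

    Hypothetical helper for illustration only.
    """
    if fraction_bits < 0:
        raise ValueError("fraction_bits must be non-negative")
    # floor(sqrt(2) * 2**n) equals isqrt(2 * 4**n), which Python computes
    # exactly with arbitrary-precision integers.
    scaled = isqrt(2 << (2 * fraction_bits))
    digits = bin(scaled)[2:]  # strip the "0b" prefix
    # Since 1 <= sqrt(2) < 2, the first bit is the integer part.
    return digits[0] + "." + digits[1:]


if __name__ == "__main__":
    # 3200 fractional bits, matching the precision mentioned above.
    print(sqrt2_binary(3200))
```

The same scaling trick works for any square root; Pi and e need a series expansion instead, which is left out here to keep the sketch short.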

Notes and Quotes

Lee Smolin on Messages in DNA
In his 1997 book, The Life of the Cosmos, theoretical physicist Lee Smolin argues that not only we, but the very universe we inhabit, may be the product of evolution. At the same time, he suggests we look within ourselves for evidence of messages, if not from our designers, perhaps merely from extraterrestrial passers-by using us as a biological answering machine.

Panspermia, spores, and the Bacillus subtilis genome
In the issue of Nature in which the complete genome sequence of the Gram-positive spore-forming bacterium Bacillus subtilis was reported, a News and Views item recalled Francis Crick's original suggestion that a spore might be the ideal means to spread life throughout the universe, and the first speculation, almost 30 years ago, that the genome of such an organism might contain a message from those who seeded it on Earth.

Genome Data Sets

Genome data are available in two formats. The "Raw" link downloads a ZIPped archive containing the genome sequence for the respective organism as an ASCII file, accompanied by a short text file describing the sequence. The "Compressed" link downloads a ZIPped archive containing the same sequence in the compressed (.gen file) format used by most of the programs described in these pages.

These archives contain files with long names: be sure to uncompress them with a tool which preserves the names instead of truncating them to MS-DOS FILENAME.TXT bozo form. Each sequence was the current version linked to the TIGR Microbial Database page at The Institute for Genomic Research as of September 1997. Sequences are often updated as errors are discovered and corrected; if you're interested in the most current available data, it's best to obtain it directly from the TIGR Database. These sequences are provided for convenience as a reference for the sequences used in the computer experiments presented in these pages.

Aquifex aeolicus    Raw    Compressed
Strain VF5, Bacteria, 1.50 Mb. Sequenced by Diversa, funding by Diversa.
Archaeoglobus fulgidus    Raw    Compressed
Strain VC-16, DSM4304, Archaea, 2.20 Mb. Sequenced by TIGR, funding by DOE.
Bacillus subtilis    Raw    Compressed
Strain 168, Bacteria, 4.21 Mb. Sequenced by The Bacillus subtilis Genome Sequencing Project, funded by the European Commission. This sequence corresponds to Data Release R14.2 (20th November, 1997). Kunst et al., Nature 390: 249-256 (1997).
Borrelia burgdorferi    Raw    Compressed
Bacteria, 1.30 Mb. Sequenced by TIGR, funded by Mathers Foundation. Fraser et al., Nature 390: 580-586 (1997).
Caenorhabditis elegans    Raw    Compressed
Eucaryote, Metazoan, 87 Mb. Sequenced by the Sanger Centre, Hinxton Hall, Cambridge and the Genome Sequencing Center at the Washington University School of Medicine, St. Louis. Science 282: 2012-2018 (1998).
Deinococcus radiodurans    Raw    Compressed
Strain R1, Bacteria, 3.00 Mb. Sequenced by TIGR, funded by DOE. Not final version.
Escherichia coli    Raw    Compressed
Strain K-12, Bacteria, 4.60 Mb. Sequenced by University of Wisconsin, funded by NHGRI. Blattner et al., Science 277 (1997).
Haemophilus influenzae    Raw    Compressed
Strain KW20, Bacteria, 1.83 Mb. Sequenced by TIGR, funded by TIGR. Fleischmann et al., Science 269: 496-512 (1995).
Helicobacter pylori    Raw    Compressed
Strain 26695, Bacteria, 1.66 Mb. Sequenced by TIGR, funded by TIGR. Tomb et al., Nature 388: 539-547 (1997).
Mycobacterium tuberculosis    Raw    Compressed
Strain H37Rv, Bacteria, 4.40 Mb. Sequenced by The Sanger Centre, funded by The Wellcome Trust. Cole et al., Nature 393: 537-544 (1998).
Mycoplasma pneumoniae    Raw    Compressed
Bacteria, 0.81 Mb. Sequenced by University of Heidelberg. Himmelreich et al., Nuc. Acid Res. 24: 4420-4449 (1996).
Methanobacterium thermoautotrophicum    Raw    Compressed
Strain delta H, Archaea, 1.75 Mb. Sequenced by Genome Therapeutics and Ohio State University, funded by DOE. Smith et al., J. Bacteriology 179: 7135-7155 (1997).
Methanococcus jannaschii    Raw    Compressed
Archaea, 1.66 Mb. Sequenced by TIGR, funded by DOE. Bult et al., Science 273: 1058-1073 (1996).
Mycoplasma genitalium    Raw    Compressed
Strain G-37, Bacteria, 0.58 Mb. Sequenced by TIGR, funded by DOE. Fraser et al., Science 270: 397-403 (1995).
Pyrococcus horikoshii    Raw    Compressed
Strain OT3, Archaea, 1.80 Mb. Sequenced by NITE, Japan.
Saccharomyces cerevisiae    Raw    Compressed
Eucaryote, 13 Mb. Sequenced by European and North American consortium. Goffeau et al., Nature 387: (Suppl.) 5-105 (1997).
Synechocystis sp.    Raw    Compressed
Strain PCC 6803, Bacteria, 3.57 Mb. Sequenced by Kazusa DNA Research Institute. Kaneko et al., DNA Res. 3: 109-136 (1996).
Treponema pallidum    Raw    Compressed
Subspecies pallidum Nichols, Bacteria, 1.14 Mb. Sequenced by TIGR and the University of Texas - Houston, funded by NIAID. January 1998 provisional sequence.

Encoded Message Data Sets

The following data sets are provided in the same formats as the genome data described above, but are not the genomes of actual organisms (although portions may be genuine sequences). These artificial sequences contain information encoded in various ways, and are provided to illustrate how DNA can be used as a data storage and transmission medium, and as a challenge for those interested in finding encoded messages.
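
To make the idea concrete, here is a minimal Python sketch of one way arbitrary bytes could be written as a base sequence and read back. The mapping used (A=00, C=01, G=10, T=11, most significant bits first) is an assumption chosen for illustration; it is not necessarily the coding used in the data sets below or in the Storing Data in DNA document, and the names `encode` and `decode` are hypothetical.

```python
# Hypothetical 2-bits-per-base mapping: A=00, C=01, G=10, T=11.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Write each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(sequence: str) -> bytes:
    """Invert encode(); the sequence length must be a multiple of four."""
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[i:i + 4]:
            # str.index raises ValueError for anything other than A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    message = b"Hi"
    strand = encode(message)
    print(strand, decode(strand) == message)
```

At two bits per base, a message of n bytes occupies 4n bases. Finding such a message in an unknown sequence is harder than this round trip suggests, since the searcher knows neither the mapping nor where the message starts.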

Swiss Flag Example    Raw    Compressed
A small image of the Swiss flag coded into a fragment of a genome as explained in the worked example in the Storing Data in DNA document.

Xenobacterium factitius    Raw    Compressed
This is the genome of a genuine but unspecified organism to which one or more encoded messages have been added, then the messages and contiguous regions of the source genome randomly shuffled to prevent trivial discovery of the organism and extraction of the messages. How many of the encoded messages can you find? Note that there's no guarantee all the messages, however obvious in retrospect, can be found with tools available on this site--you may have to download the sequence and write your own programs to find them--or maybe not...still, you're way ahead of a radio astronomer searching for extraterrestrial signals in the electromagnetic spectrum: I've told you this sequence contains at least one message, and you don't have to search billions of channels and stars to find the message in the first place--a single click on the links above will suffice to download it to your computer. After you've found as many messages as you can, visit The Curious Case of Xenobacterium factitius where all its secrets are disclosed, and you can download all the materials used to produce the genome sequence of this fantasy organism. To avoid inadvertently spoiling the fun of chasing down the messages in this sequence, the page which gives the answers on a platter is password protected. To access it, enter a User name of Xenobacterium and a password of factitius. This not only requires a modicum of effort which will deter the truly lazy, it also blocks indexing of the answers by search engines which would otherwise hand them out to anybody who entered a matching keyword.

Musical Accompaniment

Johann Sebastian Bach's Goldberg Variations (BWV 988)

Science Fiction Coda

Flying Saucers Explained
Several lines of inquiry suggest that a physical theory which unifies the seemingly incompatible worlds of General Relativity and Quantum Mechanics may also be a prerequisite for the complete understanding of life and consciousness. Taking this already extremely speculative notion as the point of departure, this document explores how many seemingly enigmatic aspects of the UFO phenomenon fall neatly into place and, at the same time, explains several curious aspects of the origin of life on Earth.