Table of Contents
Science Fiction Prelude
- We'll Return, After This Message
- When a new scientific phenomenon is discovered, it often
turns out that earlier observations of it were made without
their significance being recognised at the time. In this story,
two programmers mount their own Search for Extraterrestrial
Intelligence, not with a radio telescope, but in archives on
the Internet.
Exploration
- Rube Goldberg Machines and the Argument from Design
- For millennia, some philosophers have argued that the self-evident
perfection of the world and the organisms which inhabit it
evidence conscious design--evolution blindly groping
toward local maxima could not possibly explain the development
of the eye, much less the origin of life itself. This document
explores the difference between complexity for its own sake,
as found in Rube Goldberg machines and the products of an
evolutionary process, and inherent, designed-in, complexity
where, however complicated, the mechanism observed cannot be
simplified without sacrificing functionality. Are we the
products of authorship or editing? Neither this page nor any
other in this collection advocates an answer; instead, the
goal is to acquaint you with the evidence for each, providing
the information upon which you can base your own judgement,
and furnishing you tools for your own independent exploration
of this deepest question of our common origin.
- Self-Decoding Messages
- How can you send a message to a being of another
species, with no common language and perhaps sense organs
entirely different from those of terrestrial life?
Researchers involved in the Search for Extraterrestrial
Intelligence (SETI) have thought about this problem at length
and have come up with some very plausible solutions, some of
which are en route to the stars as you read this. This document
explores the common intellectual heritage of all species in
our universe, and how it can permit us to exchange our first,
halting greetings to one another.
- Arecibo Decoded
- Detailed decoding of an interstellar message.
Spoiler warning!
Don't read this before trying to decode the message in
Self-Decoding Messages
on your own.
- The Genetic Code
- The deciphering of the genetic code is one of the greatest
triumphs of molecular biology. This page describes how the
sequence of base pairs in a strand of DNA encodes the sequence of
amino acids to assemble into a polypeptide according to the
genetic code, and explains the terminology used in
conjunction with genome sequences.
- Storing Data in DNA
- Discovery of the genetic code provided an understanding
of how living organisms use DNA to store the instructions for
their own manufacture. But DNA is a general purpose
storage medium, just like a floppy disc or CD-ROM. Indeed, the
genomes of most higher organisms contain large amounts of
"junk DNA", introns, which do not appear to direct
the manufacture of any protein. Might they encode
something else entirely? This document explores in detail how
DNA can be used as a general-purpose molecular data storage
medium, and compares its density and reliability with
present-day computer media.
- The Genome Browser
- An interactive Java application, launched from a Web
page, which lets you browse the genomes of various
completely-sequenced organisms and view them under a variety of
schemes by which information might have been encoded in them.
- Potassium-40 and Evolution of Higher Life
- Why did such a long interval elapse between the appearance
of life on Earth and the emergence of complex, multicellular
organisms? Perhaps it had little to do with biology and
everything to do with the half-life of a radionuclide
forged in the supernova whose debris gave rise to us
all.
- The Tree of Life
- Exploration of life at the molecular level has shown
it to be divided into three great domains: Archaea,
Bacteria, and Eucarya, all descended from a common
ancestor. This document discusses the relationship
and tree of descent of life on Earth, and explains why
humans are much more closely related to spinach plants
than to the bacteria in their own intestines.
- Sizes of Various Genomes
- Generally, the more complex the organism, the longer its
genome: the human genome is about a thousand times
longer than that of a bacterium. "That's as it
should be," says a human, smugly, "I am a higher
organism." Wouldn't you like to see the expression
on Mr. Higher Organism's face upon learning that
the Marbled Lungfish (Protopterus aethiopicus)
has a genome forty-five times longer than his own?
This document gives the genome length for a
variety of organisms, and briefly discusses why
extreme variations may occur.
- Mathematical Constants in Binary
- Before members of another species can
decode a message, they have
to identify it as a message in the first place. Many have
suggested that the most obvious way to identify an intelligent
message is to include a universal mathematical constant:
Pi, e, the square root of two, or a series such
as the prime numbers. When transmitting a message in binary--ones
and zeroes--the obvious way to encode such a number is in
binary as well. This document gives these constants to a precision
of more than 3200 binary digits, and contains a link which
lets you download files containing these binary numbers.
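The binary expansions described under Mathematical Constants in Binary can be reproduced with exact integer arithmetic. What follows is a minimal Python sketch, not the program used to produce the downloadable files. The function names `sqrt2_binary` and `prime_bitmap` are illustrative assumptions, as is the choice to represent the prime series as one bit per integer. `math.isqrt` requires Python 3.8 or later.

```python
from math import isqrt


def sqrt2_binary(frac_bits: int) -> str:
    """Return sqrt(2) in binary, truncated to frac_bits digits after the point."""
    # floor(sqrt(2) * 2**frac_bits) equals isqrt(2 * 4**frac_bits),
    # so no floating point is involved and every digit is exact.
    scaled = isqrt(2 << (2 * frac_bits))
    digits = bin(scaled)[2:]          # strip the "0b" prefix
    return digits[0] + "." + digits[1:]


def prime_bitmap(limit: int) -> str:
    """Return a string whose n-th character (n = 0..limit) is '1' if n is prime.

    Assumes limit >= 1. Uses the sieve of Eratosthenes.
    """
    is_prime = [False, False] + [True] * (limit - 1)
    for n in range(2, isqrt(limit) + 1):
        if is_prime[n]:
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    return "".join("1" if flag else "0" for flag in is_prime)


if __name__ == "__main__":
    print(sqrt2_binary(8))     # 1.01101010
    print(prime_bitmap(10))    # 00110101000
```

Calling `sqrt2_binary(3200)` yields the square root of two to 3200 binary places. Pi and e need a series or spigot method and are left out of this sketch.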
Notes and Quotes
- Lee Smolin on Messages in DNA
- In his 1997 book, The Life of the Cosmos,
theoretical physicist Lee Smolin argues that
not only we, but the very universe we inhabit, may be
the product of evolution. At the same time, he suggests
we look within ourselves for evidence of messages, if not from
our designers, perhaps merely from extraterrestrial
passers-by using us as a biological
answering machine.
- Panspermia, spores, and the Bacillus subtilis genome
- In the issue of Nature in which
the complete genome sequence of the Gram-positive spore-forming bacterium
Bacillus subtilis was reported, a News and Views item
recalled Francis Crick's original suggestion that a spore might be the
ideal means to spread life throughout the universe, and the first
speculation, almost 30 years ago, that the genome of such an organism might
contain a message from those who seeded it on Earth.
Genome data are available in two formats. The "Raw" link
downloads a
ZIPped archive containing
the genome sequence for the respective organism as an ASCII file,
accompanied by a short text file describing the sequence.
The "Compressed" link downloads a ZIPped archive containing
the same sequence in the compressed (.gen file) format
used by most of the programs described in these pages.
These archives contain files with long names: be sure to uncompress
them with a tool which preserves the names instead of truncating
them to MS-DOS FILENAME.TXT bozo form.
Each sequence was the current version linked to the
TIGR Microbial Database
page at The Institute for Genomic
Research as of September 1997. Sequences are often updated
as errors are discovered and corrected; if you're interested in
the most current available data, it's best to obtain it directly from
the TIGR Database.
These sequences are provided for convenience as a reference for
the sequences used in the computer experiments presented in
these pages.
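As a starting point for your own experiments, a "Raw" archive can be read directly with Python's standard `zipfile` module, which preserves long member names. This is only a sketch under stated assumptions: it assumes the largest member of the archive is the sequence file, and that the file holds only base letters and whitespace with no header lines. The function names are hypothetical, and the compressed .gen format is not handled because its layout is not described here.

```python
import io
import zipfile
from collections import Counter


def read_raw_sequence(archive) -> str:
    """Return the bases from the largest member of a ZIP archive.

    `archive` may be a file name or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        # Assumption: the sequence is the largest file in the archive;
        # the accompanying description file is much smaller.
        member = max(zf.infolist(), key=lambda info: info.file_size)
        text = zf.read(member).decode("ascii", errors="replace")
    # Keep only the four base letters; line breaks and anything else are dropped.
    return "".join(ch for ch in text.upper() if ch in "ACGT")


def base_counts(sequence: str) -> Counter:
    """Count how often each base occurs in the sequence."""
    return Counter(sequence)


if __name__ == "__main__":
    # Build a tiny in-memory archive so the sketch runs without a download.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("description.txt", "demo")
        zf.writestr("a_long_sequence_file_name.txt", "acgt\nACGT\nNNAA\n")
    buffer.seek(0)
    sequence = read_raw_sequence(buffer)
    print(sequence, dict(base_counts(sequence)))
```

With a downloaded archive, pass its file name instead of the in-memory buffer.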
- Aquifex aeolicus
Raw
Compressed
-
Strain VF5, Bacteria, 1.50 Mb.
Sequenced by Diversa, funded by Diversa.
- Archaeoglobus fulgidus
Raw
Compressed
-
Strain VC-16, DSM4304, Archaea, 2.20 Mb.
Sequenced by TIGR, funded by DOE.
- Bacillus subtilis
Raw
Compressed
-
Strain 168, Bacteria, 4.21 Mb.
Sequenced by The
Bacillus subtilis Genome Sequencing Project, funded by
the European Commission. This sequence corresponds to
Data Release R14.2 (20th November, 1997).
Kunst et al., Nature 390: 249-256 (1997).
- Borrelia burgdorferi
Raw
Compressed
-
Bacteria, 1.30 Mb.
Sequenced by TIGR, funded by Mathers Foundation.
Fraser et al., Nature 390: 580-586 (1997).
- Caenorhabditis elegans
Raw
Compressed
-
Eucaryote, Metazoan, 87 Mb.
Sequenced by the Sanger Centre, Hinxton
Hall, Cambridge and the
Genome Sequencing Center
at the Washington University
School of Medicine, St. Louis.
Science 282: 2012-2018 (1998)
- Deinococcus radiodurans
Raw
Compressed
-
Strain R1, Bacteria, 3.00 Mb.
Sequenced by TIGR, funded by DOE.
Not final version.
- Escherichia coli
Raw
Compressed
-
Strain K-12, Bacteria, 4.60 Mb.
Sequenced by University of Wisconsin, funded by NHGRI.
Blattner et al., Science 277 (1997).
- Haemophilus influenzae
Raw
Compressed
-
Strain KW20, Bacteria, 1.83 Mb.
Sequenced by TIGR, funded by TIGR.
Fleischmann et al., Science 269: 496-512 (1995).
- Helicobacter pylori
Raw
Compressed
-
Strain 26695, Bacteria, 1.66 Mb.
Sequenced by TIGR, funded by TIGR.
Tomb et al., Nature 388: 539-547 (1997).
- Mycobacterium tuberculosis
Raw
Compressed
-
Strain H37Rv, Bacteria, 4.40 Mb.
Sequenced by The Sanger Centre,
funded by The Wellcome Trust.
Cole et al., Nature 393: 537-544 (1998).
- Mycoplasma pneumoniae
Raw
Compressed
-
Bacteria, 0.81 Mb.
Sequenced by University of Heidelberg.
Himmelreich et al., Nucleic Acids Res. 24: 4420-4449 (1996).
- Methanobacterium thermoautotrophicum
Raw
Compressed
-
Strain delta H, Archaea, 1.75 Mb.
Sequenced by Genome Therapeutics
and Ohio State University,
funded by DOE.
Smith et al., J. Bacteriology 179: 7135-7155 (1997).
- Methanococcus jannaschii
Raw
Compressed
-
Archaea, 1.66 Mb.
Sequenced by TIGR, funded by DOE.
Bult et al., Science 273: 1058-1073 (1996).
- Mycoplasma genitalium
Raw
Compressed
-
Strain G-37, Bacteria, 0.58 Mb.
Sequenced by TIGR, funded by DOE.
Fraser et al., Science 270: 397-403 (1995).
- Pyrococcus horikoshii
Raw
Compressed
-
Strain OT3, Archaea, 1.80 Mb.
Sequenced by NITE,
Japan.
- Saccharomyces cerevisiae
Raw
Compressed
-
Eucaryote, 13 Mb.
Sequenced by European and North American consortium.
Goffeau et al.,
Nature 387 (Suppl.): 5-105 (1997).
- Synechocystis sp.
Raw
Compressed
-
Strain PCC 6803, Bacteria, 3.57 Mb.
Sequenced by
Kazusa DNA
Research Institute.
Kaneko et al., DNA Res. 3: 109-136 (1996).
- Treponema pallidum
Raw
Compressed
-
Subspecies pallidum Nichols, Bacteria, 1.14 Mb.
Sequenced by TIGR and the
University of Texas - Houston,
funded by NIAID.
January 1998 provisional sequence.
The following data sets are provided in the same formats as
the genome data described above, but are not the
genomes of actual organisms (although portions may be genuine
sequences). These artificial sequences contain information
encoded in various ways, and are provided to illustrate how
DNA can be used as a data storage and transmission medium,
and as a challenge for those interested in finding encoded
messages.
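To make the idea of DNA as a storage medium concrete, here is a minimal Python sketch that packs arbitrary bytes into bases at two bits per base and recovers them again. The mapping A=00, C=01, G=10, T=11 and the function names are assumptions chosen for illustration. The encodings actually used in the data sets below, and in the Storing Data in DNA document, may well differ.

```python
# Hypothetical mapping, chosen for illustration: A=00, C=01, G=10, T=11.
BASES = "ACGT"


def bytes_to_bases(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def bases_to_bytes(sequence: str) -> bytes:
    """Decode a sequence produced by bytes_to_bases back into bytes."""
    if len(sequence) % 4:
        raise ValueError("sequence length must be a multiple of four")
    out = bytearray()
    for start in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[start:start + 4]:
            # str.index raises ValueError for any letter other than A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    encoded = bytes_to_bases(b"Hi")
    print(encoded)                   # CAGACGGC
    print(bases_to_bytes(encoded))   # b'Hi'
```

Four bases carry one byte under this scheme, so a message of n bytes occupies 4n bases. A decoder hunting for hidden messages would also have to try the other assignments of bit pairs to bases and the other possible starting offsets.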
- Swiss Flag Example
Raw
Compressed
-
A small image of the Swiss flag coded into a fragment of
a genome as explained in the worked
example in the
Storing Data in DNA
document.
- Xenobacterium factitius
Raw
Compressed
-
This is the genome of a genuine but unspecified organism to
which one or more encoded messages have been added; the
messages and contiguous regions of the source genome were then randomly
shuffled to prevent trivial identification of the organism and extraction
of the messages. How many of the encoded messages can you
find? Note that there's no guarantee all the messages, however
obvious in retrospect, can be found with tools available
on this site--you may have to download the sequence and
write your own programs to find them--or maybe not...still,
you're way ahead of a radio astronomer searching for
extraterrestrial signals in the electromagnetic spectrum:
I've told you this sequence contains at least one message,
and you don't have to search billions of channels and
stars to find the message in the first place--a single click
on the links above will suffice to download it to your
computer.
After you've found as many messages as you can, visit
The Curious Case of Xenobacterium factitius
where all its secrets are disclosed, and you can download all
the materials used to produce the genome sequence of this
fantasy organism. To avoid inadvertently spoiling the fun
of chasing down the messages in this sequence, the page
which gives the answers on a platter is password protected.
To access it, enter a User name of Xenobacterium
and a password of factitius. This not only
requires a modicum of effort which will deter the truly lazy,
it blocks indexing of the answers by search engines which
would otherwise hand them out to anybody who entered a matching
keyword.
Musical Accompaniment
- Johann Sebastian Bach's Goldberg
Variations (BWV 988)
Science Fiction Coda
- Flying Saucers Explained
- Several lines of inquiry suggest that a physical theory which
unifies the seemingly incompatible worlds of General
Relativity and Quantum Mechanics may also be a prerequisite
for the complete understanding of life and consciousness.
Taking this already extremely speculative notion as the point
of departure, this document explores how many seemingly
enigmatic aspects of the UFO phenomenon fall neatly into place
and, at the same time, explains several curious aspects of the
origin of life on Earth.