EGGSHELL is a suite of programs which provide access to the data set assembled by the Global Consciousness Project, facilitating development of custom software for exploration and "data mining" of this vast and rapidly growing resource. Since innumerable questions can be posed regarding the data set, the programs, while providing commonly requested analyses, focus on efficient access to and manipulation of the data, minimising the amount of custom programming required for exploratory studies. In addition, the efficiency of optimised native-mode programs permits using them as the basis for data access tools and presentation software made available to the public on a Web server. The name "EGGSHELL" denotes programs, mostly run from the UNIX shell, providing access to the data collected by the worldwide "eggs" of the Global Consciousness Project network, as well as a "wrapper" for the raw egg data files, mediating access by analysis software and handling details such as exclusion of known-bad data and calculation of common statistical measures.
All of the programs comprising EGGSHELL are written in the C++ programming language and use the Standard Template Library (STL). In order to use this software, you'll need a recent, standards-compliant C++ compiler and library; all of these programs were developed using the GNU C++ compiler (g++) version 4.0.2 on an Intel Linux system running kernel 2.6.15-1.
The programs are written in the Literate Programming paradigm using the CWEB system of Donald Knuth and Silvio Levy. As such, they are meant to be read as well as run; the programs serve as their own documentation and are intended to be entertaining and informative as well as efficient. Consequently, this document contains only an overview of the programs and pointers so you can read them on your own. CWEB programs automatically yield ready-to-compile C++ source code and documentation in the TeX typesetting language. The tools for extracting code and documentation from CWEB programs are included in the EGGSHELL distribution, but you needn't use them nor learn CWEB yourself to employ the toolkit in your own programs; you're perfectly free to write standard C++ and link to the extracted C++ code, using the TeX files purely as a manual.
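For example, a program which uses the toolkit purely as a C++ library might look something like the following minimal sketch. The header name "eggdata.h" is an assumption based on the eggdata.w source file; consult the extracted files for the actual names.

#include "eggdata.h"        /* Assumed name of the header extracted from eggdata.w */

int main()
{
    eggdatabases ed;            // Locations of the eggsummary databases
    ed.set_local_defaults();    // Use the paths configured for this (known) host

    //  ... open eggsummary files and perform your own analyses here ...

    return 0;
}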
The links from the names of the programs below are to Adobe Acrobat PDF files produced from the TeX documentation. If your browser has a PDF plug-in, they will open directly in a new browser window. If your browser lacks the requisite plug-in, it will invite you to download the PDF file to a directory on your computer, whence you may read it with Acrobat Reader, a free download from Adobe available for virtually all popular platforms. The PDF files for the documents are included in the source distribution which you may download from the bottom of this page; if you're planning to read the programs with a stand-alone copy of Acrobat Reader, it's probably easier to get the distribution complete with all of the PDF files than to download them one by one from the individual links.
The toolkit facilities used by the example programs are contained in the following programs, discussed in roughly decreasing order of abstraction from the raw data. The heart of the toolkit consists of the analysis and eggdata programs, while the remaining programs provide utilities which can be used in isolation or in conjunction with the egg database access facilities.
The toolkit is supplied as a GZIPped TAR archive which extracts into the current directory. The CWEB programs are supplied as .w files, with the C++ (.c and .h) and TeX (.tex) files already extracted. Pre-generated PDF files for the documents are also provided.
To build the toolkit and example programs, you'll need a current C++ compiler and library (I used GCC/G++ 4.0.2 to develop them). After extracting the archive, build it with:
./configure
make
If all goes well, when this process is complete, you'll have compiled object files for all of the toolkit components and ready-to-run executables for the example programs, example-1 through example-8. To run the examples, you'll need a copy of the Global Consciousness Project "eggsummary" and pseudorandom mirror files on your machine (or at least the ones used in the examples). The configure process automatically detects the locations of these files for the www.fourmilab.ch and noosphere.princeton.edu sites, but for other sites you'll need to add definitions of the database locations to the eggdatabases class definition in eggdata.w (see the set_local_defaults method and those it calls). (Unfortunately, the current noosphere.princeton.edu server has neither g++ nor TeX installed, so it isn't possible to build or use these programs there.) When you modify a .w file, the Makefile automatically rebuilds the C++ programs and TeX documents it defines; the CWEB tools which accomplish this are included in the distribution.
If you have a complete TeX distribution loaded, you can rebuild the document for a program prog and view it in the TeX previewer with the command:
make prog.view

and update the PDF documents with:
make doc
If you're installing the analysis toolkit on a machine onto which you've copied all or part of the CSV format "eggsummary" files, you'll need to configure the name of the directory in which the files are kept. To permit the software to be installed on various analysts' machines, all of the examples initialise their eggdatabases object by calling its:
set_local_defaults();
method. The "./configure" script uses the "hostname" utility to obtain the name of the machine it's running on and embeds this in the Makefile and configuration as a definition of the C macro HOSTNAME. When this macro is defined, the set_local_defaults() method, defined in eggdata.w, tests it against known hosts and sets the path names appropriately. To define a new host, add a new set_hostname_defaults() method to the eggdatabases class definition in eggdata.w, using the existing set_Fourmilab_defaults() and set_noosphere_defaults() as a model, then add a case for the host name to the set_local_defaults() method immediately below it in the file. After you've tested the definitions for your host, please send me a copy of the code you added so I can incorporate it into eggdata.w in the next release. That way you won't have to keep modifying the file every time a new release is posted.
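As a rough sketch, a definition for a hypothetical host named "lab1" might look something like the following; the method name, host name, and directory paths here are purely illustrative, so follow the existing definitions in eggdata.w for the precise form and for the matching case in set_local_defaults().

void set_lab1_defaults(void)
{
    //  Paths to the local copies of the eggsummary databases on host "lab1"
    add_database("gcp", "/home/lab1/gcpdata/eggsummary");           // Egg network data
    add_database("pseudo", "/home/lab1/gcpdata/pseudoeggsummary");  // Pseudorandom mirror data
}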
If you don't want to add definitions for your host to eggdata.w, you can manually initialise the eggdatabases object with code like the following:
eggdatabases ed;
ed.add_database("gcp", "/home/httpd/html/data/eggsummary");
ed.add_database("pseudo", "/home/httpd/html/data/pseudoeggsummary");
where the call with the "gcp" argument specifies the path name of the directory containing the eggsummary files for the data taken by the egg network, and the call with "pseudo" specifies the path for the pseudorandom mirror data generated by the GCP host. If you're only using one of the databases in your analyses, you needn't provide a path for the other, as shown below. The eggsummary files in the directories may be compressed with GZIP.
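For instance, an analysis which uses only the data from the real egg network might be initialised as follows (the directory name is purely illustrative):

eggdatabases ed;
ed.add_database("gcp", "/home/analyst/data/eggsummary");    // Egg network data only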
The analysis software relies on two Comma-Separated Value (CSV) databases supplied with it which identify known bad data in the data set and specify physical properties of "egg" hosts in the network. These files were current as of the date the archive was posted, but it's up to you to verify that they're correct if you're analysing data collected subsequently. The files are as follows:
These files are not automatically generated nor frequently updated to reflect changes in their source documents. If you use these files, it is your responsibility to integrate any changes posted on the Global Consciousness Project site subsequent to their compilation.