CODEGROUP: Five-Letter Codegroup Filter

Five-Letter Codegroup Filter

This page describes a program, codegroup, which encodes and decodes arbitrary binary data in five-letter code groups, just like spies use.

NAME

codegroup – encode / decode binary file as five letter codegroups

SYNOPSIS

codegroup -d|-e [ -u ] [ infile [ outfile ] ]

DESCRIPTION

For decades, spies have written their encoded messages in groups of five letters.

codegroup encodes any file into this form, allowing it to be transmitted through any medium, and decodes files containing codegroups into the original input. Encoded files contain a 16-bit cyclical redundancy check (CRC) and file size to verify, when decoded, that the message is complete and correct. Files being decoded may contain other information before and after the codegroups, allowing in-the-clear annotations to be included.

codegroup makes no attempt, on its own, to prevent your message from being read. Cryptographic security should be delegated to a package intended for that purpose, such as pgp. codegroup can then be applied to the encrypted binary output, transforming it into easily transmitted text. Text created by codegroup uses only upper case ASCII letters and spaces. Unlike files encoded with uuencode or pgp's “ASCII armour” facility, the output of codegroup can be easily (albeit tediously) read over the telephone, broadcast by shortwave radio to agents in the field, or sent by telegram, telex, or Morse code.

To illustrate the difference, here are the first few lines of a binary file encoded by:

base64:: H4sICFJ9MzYAA2EudGFyAOxba3faSNKer+lf0SezO3YmgLnY2I6TyQIGgwOGBTtOYjuJEMJo DJJGF1+ys//9rarulpqLHRi/mdk9G84JIKGuqq579eNkNn745q9sNru9tcXhs5gtFPAzm83l xad88WyxmNssbhe3sps8m8ttZ/M/8K1vL9oPP0RBaPggypU1vrad+59zosj0HqAj9xF//pe8 WsaVNbTH1rfkAfoobm7ea//cZn4rtv/mNtq/kM9t/cCz31Io9foftz9nnW77oMdfcdMdWJe+
uuencode:: begin 644 data.bin M'XL("&7._R VUO;V /9U+FN2XSF3G6H5OA1(?HOB<=/<7__X7TN<PJ[L& M=?-&1;I+) B8 0;P?_Z'?WY_-=7Q"T_JSZ_6)X9?&"$OU9[N'A[A%^L^6= M?^M[OOV+:9=UM9J^] MAS_ ;X0O]U];(Z?<WWE9_[/]ZMMOO[CG'^2MM M_G(+,US/LWKZE1#C^YO?D_;O#G[7][2R^+0>XJ^&PI/[?7-7U]KU=]SSWQ?
pgp:: -----BEGIN PGP MESSAGE----- Version: 2.6.2i hIwCCb8iTku3pBUBA/9oSDlfk/On9bwjmTnB98Eejr6agkPSi3n6hd8JkAtJd33f kzFq18Jo0xzRUWZ7Di6Jq/FXpeI1yztVDqispbcYOP0aDv4JZOSF1kRsmJ9xK9Bo Cv4a967IXPkkRsjIAkx0B39dYxCzf8kHUn4THmyV/b2qLUZ0cc+mr8hxFfFpuYSM
codegroup:: ZZZZZ YBPIL AIAIG FMOPP CPAAA DGNGP GPGPA ADNJN ELJKO ELIMO GEOHF KIFGP IFBCB PKCPI YJMHE PHBHP PPOBH NCOHD AKLLL AGHFP DEGEF LKELC EAIJI ABAGP AHPPO IHHPH OHPDF YNFPB ALEPO KMPKP NGCHI GFPBI CBDML PFGHL LIHPC BOOBB HOLDO FJNHP OLHLL OPNIL

Only codegroup conforms to the telegraphic convention of all upper case letters, and passes the “telephone test” of being readable without any modifiers such as “capital” and “lower-case”. Avoiding punctuation marks and lower case letters makes the output of codegroup much easier to transmit over a voice or traditional telegraphic link.

OPTIONS

-d: Decodes the input, previously created by codegroup, to recover the original input file, and verifies it to detect truncation or corruption of the contents.
-e: Encodes the input into codegroups.
-u: Print how-to-call information.

APPLICATION NOTES

Encoding a binary file as ASCII characters inevitably increases its size. When used in conjunction with existing compression and encryption tools, the resulting growth in file size is usually acceptable. For example, a random extract of electronic mail 32768 bytes in length was chosen as a test sample. Compression with gzip compacted the file to 15062 bytes. It was then encrypted for transmission to a single recipient with pgp, which resulted in a 15233 byte file. (Even though pgp has its own compression, smaller files usually result from initial compression with gzip. In this case, pgp alone would have produced a file of 15420 bytes.)

codegroup transforms the encrypted file into a 37296 byte text file. Thus, due to compression, the code groups for the encrypted file are only a little larger than the original cleartext.

Restricting the character set to upper case letters and including spaces between groups results in substantially larger output files than those produced by uuencode and pgp. Files encoded with codegroup are about 2.5 times the size of the input file, while uuencode and pgp expand the file only about 35%. codegroup is thus preferable only for applications where its limited character set is an advantage.

FILES

If no infile is specified or infile is a single “-”, codegroup reads from standard input; if no outfile is given, or outfile is a single “-”, output is sent to standard output. Input and output are processed strictly serially; consequently codegroup may be used in pipelines.

BUGS

When a CRC error is detected, no indication is given of the location in the file where the error(s) occurred. When sending large files, you may want to break them into pieces with the splits utility so, in case of error, only the erroneous pieces need to be re-sent.

It might be nice to embed the original file name and modes in the encoded output, but this opens the door to all kinds of system-dependent problems. You can always include this information as text before the first codegroup, or send an archive created with tar or zip.

Download codegroup.zip (Zipped archive)

The program is provided as codegroup.zip, a Zipped archive containing an ready-to-run WIN32 command-line executable program, codegrp.exe (compiled using Microsoft Visual C++ 5.0), and in source code form along with a Makefile to build the program under Unix.

EXIT STATUS

codegroup returns status 0 if processing was completed without errors, 1 if errors were detected in decoding a file which indicate the output is incorrect or incomplete, and 2 if processing could not be performed at all due, for example, to a nonexistent input file or no codegroups found in the input.

COPYING

This software is in the public domain. Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, without any conditions or restrictions. This software is provided “as is” without express or implied warranty.

The NATO Phonetic Alphabet (In Your Palm)

Fourmilab Home Page

by John Walker
September 7th, 2008