Sunday, February 12, 2006

UNUM Version 1.1 Posted

After a week of using the Unum program frequently while developing a Web project I'm not ready to talk about here quite yet, I found two things missing which I kept wishing were there, so now they are: behold version 1.1, now available for downloading from the link above.

The original release of Unum could only look up single Unicode characters. Version 1.1 accepts strings of any length in “c=” arguments. For example, to decode an Arabic word you might find puzzling in this document listing products banned in Saudi Arabia, you can now use:

$ unum c=باربي
    Hex       HTML   Character   Unicode
  0x628    &#1576;     "ب"       ARABIC LETTER BEH
  0x627    &#1575;     "ا"       ARABIC LETTER ALEF
  0x631    &#1585;     "ر"       ARABIC LETTER REH
  0x628    &#1576;     "ب"       ARABIC LETTER BEH
  0x64A    &#1610;     "ي"       ARABIC LETTER YEH
to figure out that this particular dire threat to the virtue of the Kingdom is Barbie! (I have deleted the octal and decimal columns from the Unum output to fit in the limited width of this page.)
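For readers who want the same kind of listing from a script rather than the command line, here is a minimal Python sketch. It is not Unum (which is a Perl program) and only approximates its Hex, HTML, Character, and Unicode columns. The function name describe is hypothetical, and the sketch assumes that characters without a named HTML entity are shown as a decimal numeric reference such as &#1576;.

```python
import unicodedata
from html.entities import codepoint2name


def describe(text: str) -> list[tuple[str, str, str, str]]:
    """Return (hex, HTML, character, Unicode name) for each code point in text.

    Hypothetical helper that roughly mirrors the columns of the Unum listing.
    """
    rows = []
    for ch in text:
        cp = ord(ch)
        # Use a named HTML entity when one exists, else a decimal reference.
        html_ref = f"&{codepoint2name[cp]};" if cp in codepoint2name else f"&#{cp};"
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((f"0x{cp:X}", html_ref, f'"{ch}"', name))
    return rows


if __name__ == "__main__":
    print(f"{'Hex':>7}  {'HTML':>9}  Character   Unicode")
    for hx, ref, char, name in describe("باربي"):
        print(f"{hx:>7}  {ref:>9}  {char:<11} {name}")
```

Because it iterates over code points rather than bytes, it handles a word of any length, which is the point of the version 1.1 change.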

You may find yourself viewing the source of a Web page and encountering a sequence of Unicode characters encoded as HTML/XHTML character entities, such as:

&#1055;&#1091;&#1090;&#1080;&#1085;

You can now cut and paste such gibberish directly onto a Unum command line (quoted, to be sure, so the shell isn't driven bonkers by all the ampersands, octothorpes, and semicolons), to see that this is just:
$ unum '&#1055;&#1091;&#1090;&#1080;&#1085;'
    Hex       HTML   Character   Unicode
  0x41F    &#1055;     "П"       CYRILLIC CAPITAL LETTER PE
  0x443    &#1091;     "у"       CYRILLIC SMALL LETTER U
  0x442    &#1090;     "т"       CYRILLIC SMALL LETTER TE
  0x438    &#1080;     "и"       CYRILLIC SMALL LETTER I
  0x43D    &#1085;     "н"       CYRILLIC SMALL LETTER EN
“Путин”, the surname of President Vladimir Putin of Russia.
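Decoding such entity strings is also straightforward in Python: the standard library's html.unescape handles named, decimal, and hexadecimal character references. The sketch below is not how Unum itself works, and the function name decode_entities is hypothetical. It decodes the references and lists each resulting code point with its Unicode name.

```python
import html
import unicodedata


def decode_entities(markup: str) -> list[tuple[str, str]]:
    """Decode HTML/XHTML character references (named, decimal, or hex) and
    return (hex code point, Unicode name) for each resulting character."""
    text = html.unescape(markup)
    return [(f"0x{ord(ch):X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


if __name__ == "__main__":
    entities = "&#1055;&#1091;&#1090;&#1080;&#1085;"
    print(html.unescape(entities))  # the decoded word
    for hx, name in decode_entities(entities):
        print(f"{hx:>7}  {name}")
```

Run as a script, it prints the decoded word followed by one line per character. Since the input is an ordinary Python string, no shell quoting of the ampersands, octothorpes, and semicolons is needed.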

Posted at February 12, 2006 14:52