Sunday, February 12, 2006
UNUM Version 1.1 Posted
After a week's experience using the Unum program frequently in the development of a Web project I'm not ready to talk about here quite yet, I found there were two things missing which I kept wishing were there, so now they are: behold version 1.1, now available for downloading from the link above.

The original release of Unum could only look up single Unicode characters. Version 1.1 accepts strings of any length in “c=” arguments. For example, to decode an Arabic word you might find puzzling in this document listing products banned in Saudi Arabia, you can now use:

    $ unum c=باربي
         Hex  HTML       Character  Unicode
       0x628  &#1576;    "ب"        ARABIC LETTER BEH
       0x627  &#1575;    "ا"        ARABIC LETTER ALEF
       0x631  &#1585;    "ر"        ARABIC LETTER REH
       0x628  &#1576;    "ب"        ARABIC LETTER BEH
       0x64A  &#1610;    "ي"        ARABIC LETTER YEH

to figure out that this particular dire threat to the virtue of the Kingdom is Barbie! (I have deleted the octal and decimal columns from the Unum output to fit in the limited width of this page.)

You may find yourself viewing the source of a Web page and encounter a sequence of Unicode characters encoded as HTML/XHTML character entities, such as:
    &#1055;&#1091;&#1090;&#1080;&#1085;

You can now cut and paste such gibberish directly onto a Unum command line (quoted, to be sure, so the shell isn't driven bonkers by all the ampersands, octothorpes, and semicolons), to see that this is just:
    $ unum '&#1055;&#1091;&#1090;&#1080;&#1085;'
         Hex  HTML       Character  Unicode
       0x41F  &#1055;    "П"        CYRILLIC CAPITAL LETTER PE
       0x443  &#1091;    "у"        CYRILLIC SMALL LETTER U
       0x442  &#1090;    "т"        CYRILLIC SMALL LETTER TE
       0x438  &#1080;    "и"        CYRILLIC SMALL LETTER I
       0x43D  &#1085;    "н"        CYRILLIC SMALL LETTER EN

“Путин”, the surname of President Vladimir Putin of Russia.
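For readers curious how the two kinds of lookup fit together, here is a minimal Python sketch of the idea. It is not Unum's own code: the function names `describe` and `format_table` are hypothetical, the column layout only approximates the output shown above, and it always emits a numeric entity in the HTML column, which is a simplifying assumption. It first expands any HTML character entities in the argument, then reports each resulting character.

```python
import html
import unicodedata


def describe(text):
    """Return one (hex, html_entity, character, name) tuple per character.

    Hypothetical helper: HTML character entities in `text` are expanded
    first, so a plain string and its entity-encoded form give the same rows.
    """
    rows = []
    for ch in html.unescape(text):
        cp = ord(ch)
        # Fall back to a placeholder for code points without a Unicode name.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((f"0x{cp:X}", f"&#{cp};", ch, name))
    return rows


def format_table(text):
    """Lay the rows out in columns roughly like the listings above."""
    lines = ["     Hex  HTML       Character  Unicode"]
    for hex_cp, entity, ch, name in describe(text):
        lines.append(f'{hex_cp:>8}  {entity:<9}  "{ch}"        {name}')
    return "\n".join(lines)
```

Passing either the Arabic word or the quoted entity string from the examples above to `format_table` and printing the result should list the same characters, one per line.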
Posted at February 12, 2006 14:52