Thursday, January 15, 2009
The Hacker's Diet Online: Unicode Character Disaster
The first major disaster since the recent server update has struck The Hacker's Diet Online, causing non-ASCII Unicode characters (for example, accented letters; Greek, Cyrillic, and other alphabets; and Chinese and Japanese characters) appearing in comment fields of log forms and in other contexts (such as account information) to be garbled. In addition, users whose user names and/or passwords contained such characters may have had problems logging in.
It turns out (based on a preliminary analysis, performed under the gun, which may be revised as I investigate further) that the Perl function decode_utf8, which under Perl 5.8.5 on the old server did precisely what its name implies (decode a string containing byte codes in UTF-8 encoding into a Unicode string), has now, on Perl 5.8.8, become “smart” (in the Microsoft sense of the word): it decides whether or not to do what you invoked it to do based upon whether the string argument it was passed was tagged as already being in UTF-8 encoding. When processing arguments to CGI scripts on Web sites, there's a bit of a problem in handling encoding: normally you want to decode UTF-8 arguments, but in the case of file uploads, the decoding must be suppressed lest binary files be corrupted. Further, arguments passed via the QUERY_STRING from GET requests and from standard input for POST requests require different handling of encoding. Stir “smartness” on the part of a Perl function into this mess and you're asking for a disaster, which was duly delivered.
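To make the encoding problem concrete, here is a minimal sketch of decoding CGI arguments explicitly, so that the decision to decode rests with the calling code rather than with how the string happens to be tagged. It is an illustration only, not code from the application itself: the helper name decode_cgi_arg and its $is_upload parameter are hypothetical, and it assumes the raw argument bytes have already been extracted from the request.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from The Hacker's Diet Online): turn one raw CGI
# argument into a Perl character string, unless it is file-upload data.
sub decode_cgi_arg {
    my ($bytes, $is_upload) = @_;

    # Uploaded files are arbitrary binary data; decoding would corrupt them.
    return $bytes if $is_upload;

    # Decode text arguments explicitly: the caller, not the string's
    # internal UTF-8 tag, decides whether decoding happens.
    return decode('UTF-8', $bytes);
}

my $comment = decode_cgi_arg("caf\xC3\xA9", 0);       # UTF-8 bytes for "café"
my $upload  = decode_cgi_arg("\xFF\xD8\xFF\xE0", 1);  # sample binary bytes

printf "comment: %d characters\n", length $comment;  # 5 bytes in, 4 characters out
printf "upload:  %d bytes\n",      length $upload;   # still 4 bytes, untouched
```

Calling Encode's decode with an explicit encoding name, and flagging uploads separately, keeps the text and binary paths apart; whether this matches the behavior needed for every form in a given application would still have to be verified by testing.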
I have applied a one-line patch to The Hacker's Diet Online which, based upon my testing so far, appears to fix the problem. Since there are so many forms in the application which accept Unicode characters and a variety of types of input (text fields, passwords, uploaded files), it will take some time to verify that everything is now working correctly. I'm pretty confident that the clamant problem of corruption of log comment fields is now corrected. Users who have had their comment fields wrecked due to this problem and do not wish to retype the affected comments should contact me via the feedback form, and I'll try to restore any comments which have been corrupted from the daily backup tapes of the server farm.
Posted at
00:15