
Monday, August 14, 2006

Web: Truncated Downloads with Internet Explorer

Over the years, I've gotten sporadic reports from people who claim files they've downloaded from the Fourmilab server are “corrupted”. In every such case, the file on the server turned out to be just fine, racking up hundreds of downloads a day by others who reported no problems with it. Most of these reports come from Windows users who downloaded large Zipped archives which their Unzip program then reported as malformed. Frequently, users who report this say they tried downloading the file on multiple occasions and encountered the problem every time.

Back in the dawn of time, problems like this were usually the result of somebody on a Windows machine retrieving a binary file such as a Zipped archive or self-extracting executable without setting “binary” mode in their FTP client, which would then helpfully expand every Unix line feed character into a DOS carriage return / line feed sequence, utterly wrecking binary data. But HTTP downloads of files with MIME types as common as “application/zip” shouldn't be vulnerable to such problems, so the whole thing remained a mystery.
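To make the old FTP failure mode concrete, here is a minimal Python sketch of what an ASCII-mode transfer does to binary data. The function name `ascii_mode_transfer` and the sample bytes are invented for illustration, and the conversion is simplified to a plain substitution of CR LF for every LF.

```python
def ascii_mode_transfer(data: bytes) -> bytes:
    """Simplified model of an ASCII-mode FTP download on DOS/Windows:
    every line feed (0x0A) is expanded to carriage return + line feed."""
    return data.replace(b"\n", b"\r\n")


# Made-up sample bytes standing in for the start of a binary file.
# The 0x0A bytes here are data, not line endings.
original = bytes([0x50, 0x4B, 0x03, 0x04, 0x0A, 0x00, 0x0A, 0xFF])
received = ascii_mode_transfer(original)

print("original length:", len(original))
print("received length:", len(received))
print("identical:", received == original)
```

Each 0x0A byte in the data gains a spurious 0x0D in front of it, so the received file is longer than the original and every later byte sits at the wrong offset. That is the opposite symptom from the one described in this post, where the received files are shorter than the originals.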

I recently decided to see if I could get to the bottom of this. While such reports are rare (I'd say about one every other month), there's no way to estimate the actual incidence of the problem, whatever it may be, since many people may just give up and never report it. So, whenever anybody reports such a problem, I now try to identify their download attempt(s) in the HTTP server log, and send them a reply requesting a variety of information about their system and the circumstances of the error, including asking them to attach the actual “corrupted” file(s) they received. Few people reply to these messages, and often the response is sufficiently incoherent and incomplete that it's of no diagnostic use. Today, however, I got a reply which is perfect in every regard, and now I'm more mystified than ever!

The user is running on Windows 2000 with Microsoft Internet Explorer 6, both at current service pack and patch levels. No error message was reported by Internet Explorer for the download of the files, yet WinZip reported the files as invalid archives. The files the user sent me, as it turned out, were both truncated copies of the actual downloads present on my server. In each of the two files, the portion received by the user was identical to the initial part of the file on the server, but the file received was incomplete: in one case 3027188 instead of 7114310 bytes, and in the other just 192228 of 1423984 bytes on the server.
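The comparison described above (received file is an exact but incomplete prefix of the server's copy) can be sketched in Python as follows. The function name `is_truncated_copy` and its parameters are hypothetical; this illustrates the check, not the tool actually used to make it.

```python
import os


def is_truncated_copy(received_path, original_path, chunk_size=65536):
    """Return True if the received file is strictly shorter than the
    original and byte-for-byte identical to the original's first bytes."""
    if os.path.getsize(received_path) >= os.path.getsize(original_path):
        return False  # same size or longer: not a truncation
    with open(received_path, "rb") as rec, open(original_path, "rb") as orig:
        while True:
            chunk = rec.read(chunk_size)
            if not chunk:
                return True  # reached end of received file with no mismatch
            if orig.read(len(chunk)) != chunk:
                return False  # contents differ: corrupted, not just cut short
```

A result of True distinguishes simple truncation from the kind of corruption that alters bytes within the file, such as the line-ending expansion mentioned earlier.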

Now here's where it gets weird (cue X-Files music). The entries from the Apache HTTP server log show these downloads as having completed normally, with the full file length transferred:

200.x.x.18 - - [06/Aug/2006:14:21:49 +0200]
    "GET /homeplanet/download/3.1/hp3full.zip HTTP/1.1"
    200 7114310
    "-"
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
200.x.x.18 - - [09/Aug/2006:16:19:41 +0200]
    "GET /homeplanet/download/3.1/hp3lite.zip HTTP/1.1"
    200 1423984
    "http://www.fourmilab.ch/homeplanet/"
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"

(I have manually wrapped these long lines onto multiple lines to avoid truncation and obscured the second and third bytes of the user's IP address in the interest of privacy.) In fact, the log showed three attempts to download the 7.1 MB hp3full.zip file, all apparently complete and successful as seen from the server side, yet the file received by the user was incomplete.
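For anyone wanting to run the same check against their own logs, here is a rough Python sketch that pulls the status code and byte count out of one unwrapped log entry in the format shown above and compares the count with the known file size. The regular expression, the `parse_entry` helper, and the dictionary keys are assumptions made for illustration; the sample line is the first entry above, joined back into one line.

```python
import re

# Pattern for one log entry in the layout quoted above (host, two ignored
# fields, timestamp, request line, status, bytes); later fields are skipped.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_entry(line):
    """Return path, status, and bytes sent for one entry, or None if no match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    size = match.group("size")
    return {
        "path": match.group("path"),
        "status": int(match.group("status")),
        "bytes_sent": 0 if size == "-" else int(size),  # "-" means no body
    }


line = ('200.x.x.18 - - [06/Aug/2006:14:21:49 +0200] '
        '"GET /homeplanet/download/3.1/hp3full.zip HTTP/1.1" '
        '200 7114310 "-" '
        '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"')

entry = parse_entry(line)
size_on_server = 7114310  # size of hp3full.zip quoted in this post
print(entry["path"], entry["status"], entry["bytes_sent"] == size_on_server)
```

The logged byte count reflects what the server believes it sent, which is why a full count there does not by itself show that the client stored every byte.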

So, the question becomes “Is there some circumstance in which Microsoft Internet Explorer appears to download an entire file from a Web server, but only writes some portion of it to the user's disc?” If this is the case, it isn't obvious there's much the operators of Web sites can do to mitigate the frustration this causes their visitors, other than making them aware of the problem and instructing them how to use the command-line FTP client which comes with 32-bit Windows to download problem files. (Every file at this site is accessible with either HTTP or FTP; this is increasingly rare in these days of name-based virtual hosting and paranoia about even anonymous-only FTP access, so this solution, such as it is, may not work for many other sites.)

Posted at August 14, 2006 21:13