Friday, April 29, 2005
Reading List: Astérix chez les Helvètes
- Goscinny, René and Albert Uderzo. Astérix chez les Helvètes. Paris: Hachette, [1970] 2004. ISBN 2-01-210016-3.
Thursday, April 28, 2005
Reading List: We the Living
- Rand, Ayn. We the Living. New York: Signet, [1936] 1959. ISBN 0-451-18784-9.
- This is Ayn Rand's first novel, which she described as "as near to an autobiography as I will ever write". It is a dark story of life in the Soviet Union in 1925, a year after the death of Lenin and a year before Ayn Rand's own emigration to the United States from St. Petersburg / Petrograd / Leningrad, the city in which the story is set. Originally published in 1936, this edition was revised by Rand in 1958, shortly after finishing Atlas Shrugged. Somehow, I had never gotten around to reading this novel before, and was surprised to discover that the characters were, in many ways, more complex and believable and the story less preachy than her later work. Despite the supposedly diametrically opposed societies in which they are set and the ideologies of their authors, this story and Upton Sinclair's The Jungle bear remarkable similarities and are worth reading together for an appreciation of how horribly things can go wrong in any society in which, regardless of labels, ideals, and lofty rhetoric, people do not truly own their own lives.
Monday, April 25, 2005
Reading List: The Genius of Science
- Pais, Abraham. The Genius of Science. Oxford: Oxford University Press, 2000. ISBN 0-19-850614-7.
- In this volume Abraham Pais, distinguished physicist and author of Subtle Is the Lord, the definitive scientific biography of Einstein, presents a "portrait gallery" of eminent twentieth century physicists, including Bohr, Dirac, Pauli, von Neumann, Rabi, and others. If you skip the introduction, you may be puzzled at some of the omissions: Heisenberg, Fermi, and Feynman, among others. Pais wanted to look behind the physics to the physicist, and thus restricted his biographies to scientists he personally knew; those not included simply didn't cross his career path sufficiently to permit sketching them in adequate detail. Many of the chapters were originally written for publication in other venues and revised for this book; consequently the balance of scientific and personal biography varies substantially among them, as does the length of the pieces: the chapter on Victor Weisskopf, adapted from an honorary degree presentation, is a mere two and a half pages, while that on George Eugene Uhlenbeck, based on a lecture from a memorial symposium, is 33 pages long. The scientific focus is very much on quantum theory and particle physics, and the collected biographies provide an excellent view of the extent to which researchers groped in the dark before discovering phenomena which, presented in a modern textbook, seem obvious in retrospect. One wonders whether the mysteries of present-day physics will seem as straightforward a century from now.
Monday, April 18, 2005
Reading List: Les Chevaliers du Subjonctif
- Orsenna, Erik. Les Chevaliers du Subjonctif. Paris: Stock, 2004. ISBN 2-234-05698-5.
- Two years have passed since Jeanne and her brother Thomas were marooned on the enchanted island of words in La grammaire est une chanson douce. In this sequel, Jeanne takes to the air in a glider with a diminutive cartographer to map the Archipelago of Conjugation and search for her brother who has vanished. Jeanne's luck with voyages hasn't changed--the glider crashes on the Island of the Subjunctives, where Jeanne encounters its strange inhabitants, guardians of the verbs which speak of what may be, or may not--the mode of dreams and love (for what is love if not hope and doubt?), the domain of the subjunctive. To employ a subjunctive survival from old French, oft-spoken but rarely thought of as such, « Vive le subjonctif ! ». The author has been a member of the French Conseil d'État since 1985, has written more than a dozen works of fiction and nonfiction, is an accomplished sailor and president of the Centre de la mer, and was elected to l'Académie française in 1998. For additional information, visit his beautiful and creatively designed Web site, where you will find a map of the Archipelago of Conjugation and the first chapter of the book in both text and audio editions. Can you spot the perspective error made by the artist on the front cover? (Hint: the same goof occurs in the opening title sequence of Star Trek: Voyager.)
Sunday, April 17, 2005
Entropic Storm: Spring Goes Sproing!
This has been an interesting week. Friday night, April the 8th, the buried electrical cable which feeds power to the house shorted out in a coruscating blaze of vaporised copper and ozone (the latter evident throughout the house, even though the short was about ten metres as the cable runs from the entry to the power panel, and around two right angle bends). Fortunately, the building with the servers has its own independent power substation, so operation of the site wasn't affected, but the house was without power (and, since the oil burner requires electricity, heat and hot water) until Saturday morning, when we ran a "lifeline" cable from an unused 25 amp circuit on the Fourmilab power panel across the driveway to the house. In the picture to the right (click to display an enlargement), that's the grey cable snaking in from the left and disappearing down a grille into the furnace room window, whence it goes down a hallway to connect to the main bus on the power panel. The following Monday, it was time to dig up the driveway again. I say "again" because, if you follow this chronicle, you'll recall that toward the end of January the water pipe burst, requiring an excavation to dig it up and replace the failed section. Since this was done in the dead of winter the excavation, although filled in, could not be paved over until warm, dry weather arrived. As it happens, that occurred two days before the electrical cable shorted out. Now, we didn't have to re-open the water pipe hole--this one was all of two metres to the west of it! Time domain reflectometry had established that the short was about 10 metres from the entry to the house, but there were no drawings anywhere which indicated the precise route of the underground cable. The electricians brought out a gizmo which, I suppose, is kind of a Maxwellian dowsing rod, and after some very ambiguous readings about which I was highly dubious, they made a big X on the pavement and the guy with the jackhammer went at it. 
Amazingly (to me anyway), the electrical cable was indeed found within one metre of the X (which was in the centre of the rectangular hole at the left of the picture). Finding the cable is not the same as finding the short, of course, so the excavation was continued on the arm which goes right to left until a junction box which was about half melted and half charred was discovered near the end of the branch at the right of the picture. We disconnected the cable inside the house and tried to pull the old cable out from either end to no avail whatsoever--it was put into that conduit when the house was built in 1967, and it is determined to stay there until the end of time.
The most straightforward and economical solution, then, was to dig a new arm on the trench (which goes toward the top of the photo) to the exterior wall of the basement room which holds the heating oil cistern, bore a hole through that wall, and route a new cable through that room to the utility panel in the room which adjoins it. In the enlargement of the picture you can see the PVC conduit through which the cable runs into the house. At the left, the dark grey thing shaped like a skinny rugby ball is the junction box in which the new cable is spliced to the existing feed cable, which appeared to be in fine condition. I was amazed to discover that this 25 A 380 V three phase cable is spliced with crimped connections which are insulated by air--the individual wires are cut so the splices are staggered apart and cannot contact one another. But what if water gets into the junction box, you ask? Well, that's probably what brought this circus to town in the first place! Having had the water fail in January and the electricity blow up in April, I remarked to one of the neighbours, "What next? The air?" I didn't have to wait long to learn my guess was wrong. The morning we began to replace the cable, telephone service for the entire village went down! 
Miraculously, this was due to a failure which was not beneath my driveway, and about 36 hours later, telephone service was restored. Fortunately, mobile phones continued to work during the land line outage, and the leased line which provides Fourmilab's Internet connectivity was not affected. I don't know about you, but events like this leave me a little punchy--whenever I'm about to dig into a project which will take substantial time and concentration, I find myself looking up at the sky to see if any anvils or pianos are on incoming trajectories and muttering "What next?" over and over. After devoting most of the past week to electricity and telephone problems, I figured nothing really weird was likely to happen this week-end. Oh well . . . Saturday dawned with weather more like November than April--howling winds, thick fog, and drizzle. Around sunset (not that there was any direct evidence of that event), lightning and thunder arrived in great abundance, along with the usual accompaniment of blinking lights and UPSes reporting momentary outages. Then it started to snow. As I write this, around four on Sunday morning, there's 15 cm of the stuff on the ground (including the roads), and it's continuing to come down as fast as ever. One is tempted to run outside, lift one's countenance to the sky, and shout, "It's the seventeenth of April, for heavens' sake!" While unlikely to make any difference, there may be some satisfaction in registering the protest. More snow is forecast for tomorrow. The photo above was taken at 3:06 local time on April 17th. It is a 30 second exposure at f/4 with a 12 mm lens on a Nikon D70 digital camera illuminated by moonlight scattered through the dense clouds and streetlights diffused by snow and fog. I "painted" the trees in the foreground with an LED flashlight during the exposure to highlight them.
Snow continued through the night and, as I post this update at 15:40 on the 17th, continues to fall. The following pictures were taken around 14:45 local "summer" (Hah!) time.
Friday, April 15, 2005
Reading List: The Jungle
- Sinclair, Upton. The Jungle. Tucson, AZ: See Sharp Press, [1905] 2003. ISBN 1-884365-30-2.
- A century ago, in 1905, the socialist weekly The Appeal to Reason began to run Upton Sinclair's novel The Jungle in serial form. The editors of the paper had commissioned the work, giving the author $500 to investigate the Chicago meat packing industry and conditions of its immigrant workers. After lengthy negotiations, Macmillan rejected the novel, and Sinclair took the book to Doubleday, which published it in 1906. The book became an immediate bestseller, has remained in print ever since, spurred the passage of the federal Pure Food and Drug Act in the very year of its publication, and launched Sinclair's career as the foremost American muckraker. The book edition published in 1906 was cut substantially from the original serial in The Appeal to Reason, which remained out of print until 1988 and the 2003 publication of this slightly different version based upon a subsequent serialisation in another socialist periodical. Five chapters and about one third of the text of the original edition presented here were cut in the 1906 Doubleday version, which is considered the canonical text. This volume contains an introduction written by a professor of American Literature at that august institution of higher learning, the Pittsburg State University of Pittsburg, Kansas, which inarticulately thrashes about trying to gin up a conspiracy theory behind the elisions and changes in the book edition. The only problem with this theory is, as is so often the case with postmodern analyses by Literature professors (even those who are not "anti-corporate, feminist" novelists), the facts. It's hard to make a case for "censorship", when the changes to the text were made by the author himself, who insisted over the rest of his long and hugely successful career that the changes were not significant to the message of the book. 
Given that The Appeal to Reason, which had funded the project, stopped running the novel two thirds of the way through due to reader complaints demanding news instead of fiction, one could argue persuasively that cutting one third was responding to reader feedback from an audience highly receptive to the subject matter. Besides, what does it mean to "censor" a work of fiction, anyway? One often encounters mentions of The Jungle which suggest those making them aren't aware it's a novel as opposed to factual reportage, which probably indicates the writer hasn't read the book, or only encountered excerpts years ago in some college course. While there's no doubt the horrors Sinclair describes are genuine, he uses the story of the protagonist, Jurgis Rudkus, as a Pilgrim's Progress to illustrate them, often with implausible coincidences and other story devices to tell the tale. Chapters 32 through the conclusion are rather jarring. What was up until that point a gritty tale of life on the streets and in the stockyards of Chicago suddenly mutates into a thinly disguised socialist polemic written in highfalutin English which would almost certainly go right past an uneducated immigrant just a few years off the boat; it reminded me of nothing so much as John Galt's speech near the end of Atlas Shrugged. It does, however, provide insight into the utopian socialism of the early 1900s which, notwithstanding many present-day treatments, was directed as much against government corruption as the depredations of big business.
Wednesday, April 13, 2005
Hello, Dubai?
One of the plots I envisioned for a science fiction story in the 1970s and early 80s was an enlightened ruler of a third world country realising that wealth was not the product of natural resources or capital, but rather the consequence of human minds set free to innovate and create what didn't exist before. This never-written story envisioned the king of a small, desperately poor, country declaring the kind of rule of law envisioned in L. Neil Smith's North American Confederacy (comic book edition), and inviting creative people from all over the world to come and do their best, without having 7/8 of the fruits of their labour looted as is the norm in the United States and the so-called European Union. "Destroy it" (the all-powerful kleptocratic state) "and they"--the creators of wealth--"will come". I imagined a small state in sub-Saharan Africa becoming the world's technological hub, with research campuses spreading across the landscape, innovative space launch services undercutting the sclerotic military-legacy systems of the Old World, and free people showing that the human mind need only be unshackled to create unimagined wealth anywhere in the world. I never wrote that story--I had a lot of other things to do in those days, and when I had the free time to scribble, it looked like Singapore had gone a long way to realising the scenario I'd envisioned. Interestingly, it may be happening again in what many people would consider one of the least likely of places, Dubai in the United Arab Emirates. The Emirates are unusual among Gulf states in that they derive only 1/3 of their GDP from fossil fuel exports and, with limited reserves, are expected to be the first to exhaust them. Dubai has been lucky enough to have two enlightened despots in a row, who have invested oil revenue in positioning the country as a world-scale business and trade hub to provide a sustainable economy after the wells run dry. 
They have built (and are expanding) one of the most modern airports in the world, and Emirates airlines contends with Singapore's for the most modern and rapidly growing fleet. Douglas Casey, whose 1979 book The International Man first got me thinking about getting out of the U.S., has visited Dubai and was very impressed indeed with what he saw. Certainly there is no shortage of thinking big. The world's tallest building is scheduled for completion there in 2008, and perhaps the most extravagant real estate development ever, The World Islands. And this is not a public debt funded bubble. Public debt is just 18.1% of GDP (compared to 62.4% for the U.S. and Japan's 154.6%) and, at USD20.7 billion, is well covered by USD15.8 billion in foreign exchange reserves. GDP growth rate was 5.2% per annum as of 2003. Apart from the oil and banking industries, there are no taxes of any kind. An import duty of 4% is charged, but many items including food, medical supplies, and building materials are exempt, as are all transactions in the free trade zone. Constructing what amounts to a libertarian paradise (but limited to economic freedom) within a traditional Islamic monarchy located in one of the regions of the world generally considered one of the least stable and economically under-performing is, if nothing else, an interesting experiment. If it works, it may be a lesson to other developing countries, even those without the huge head start that Dubai's petroleum wealth provides.
Monday, April 11, 2005
HTML/CSS: JavaScript and <textarea> fields
HTML <textarea> form fields are a convenient way for pages with embedded JavaScript calculations to return multiple line text results. For example, my Lunar Perigee and Apogee Calculator uses two <textarea> fields to display the perigee and apogee and new and full Moon tables. The JavaScript program can assemble the text it wishes to display in a string and simply assign it to the value property of the field. The way in which text is wrapped used to be controlled by the nonstandard wrap= attribute originally introduced by Netscape. In HTML 4.01, wrapping is controlled by the Cascading Style Sheet (CSS) "white-space" property, which can be applied to any element (as of CSS 2.1). I had previously used 'wrap="off"' to suppress wrapping of text, and when I made the document XHTML 1.0 compliant, I replaced this with the specification 'style="white-space: nowrap;"', which had the desired effect in Mozilla, Firefox, and Opera. Internet Explorer, however, treats this style in a <textarea> in a rather eccentric manner--it entirely ignores explicit line breaks (whether line feed or carriage return / line feed) in the string and gloms all the lines together into one monster line which appears as the top line in the box. When doing this, it doesn't display a horizontal scroll bar in the box, so unless you happen to click within the line and move the mouse off the right of the box, you'll never know the rest of the content was there. In order to persuade Explorer to honour line breaks in a string assigned to the value property of a <textarea>, you must specify 'style="white-space: pre;"', which specifies pre-formatted text handling as in an HTML <pre> area. Fortunately, this seems to work properly with the other browsers as well.
Sunday, April 10, 2005
Reading List: Go Directly to Jail
- Healy, Gene, ed. Go Directly to Jail. Washington: Cato Institute, 2004. ISBN 1-930865-63-5.
- Once upon a time, when somebody in the U.S. got carried away and started blowing something out of proportion, people would chide them, "Don't make a federal case out of it." For most of U.S. history, "federal cases"--criminal prosecutions by the federal government--were a big deal because they were about big things: treason, piracy, counterfeiting, bribery of federal officials, and offences against the law of nations. With the exception of crimes committed in areas of exclusive federal jurisdiction such as the District of Columbia, Indian reservations, territories, and military bases, all other criminal matters were the concern of the states. Well, times have changed. From the 17 original federal crimes defined by Congress in 1790, the list of federal criminal offences has exploded to more than 4,000 today, occupying 27,000 pages of the U.S. Code, the vast majority added since 1960. But it's worse than that--many of these "crimes" consist of violations of federal regulations, which are promulgated by executive agencies without approval by Congress, constantly changing, often vague and conflicting, and sprawling through three hundred thousand or so pages of the Code of Federal Regulations. This creates a legal environment in which the ordinary citizen or, for that matter, even a professional expert in an area of regulation cannot know for certain what is legal and what is not. And since these are criminal penalties and prosecutors have broad discretion in charging violators, running afoul of an obscure regulation can lead not just to a fine but serious downtime at Club Fed, such as the seafood dealers facing eight years in the pen for selling lobster tails which violated no U.S. law. And don't talk back to the Eagle--a maintenance supervisor who refused to plead guilty to having a work crew bury some waste paint cans found himself indicted on 43 federal criminal counts (United States v. Carr, 880 F.2d 1550 (1989)). 
Stir in enforcement programs which are self-funded by the penalties and asset seizures they generate, and you have a recipe for entrepreneurial prosecution at the expense of liberty. This collection of essays is a frightening look at criminalisation run amok, trampling common law principles such as protection against self-incrimination, unlawful search and seizure, and double jeopardy, plus a watering down of the rules of evidence, standard of proof, and need to prove both criminal intent (mens rea) and a criminal act (actus reus). You may also be amazed and appalled at how the traditional discretion accorded trial judges in sentencing has been replaced by what amounts to a "spreadsheet of damnation" of 258 cells which, for example, ranks possession of 150 grams of crack cocaine a more serious offence than second-degree murder (p. 137). Each essay concludes with a set of suggestions as to how the trend can be turned around and something resembling the rule of law re-established, but that's not the way to bet. Once the ball of tyranny starts to roll, even in the early stage of the soft tyranny of implied intimidation, it gains momentum all by itself. I suppose we should at least be glad they aren't torturing people. Oh, right. . .
Saturday, April 9, 2005
Server Farm Status
I've been writing so much about the server farm recently it's high time I showed a picture of its present state of development. (Click the image for an enlargement in a separate window.) The two boxes at the bottom are the Dell PowerEdge 1850 servers which host the site. Each has dual Intel Xeon 3.6 GHz hyper-threaded processors, which gives each server the equivalent of four CPUs. Each has 8 Gb of ECC RAM, dual 146 Gb 10,000 RPM SCSI drives on an embedded RAID controller, and two Gigabit Ethernet interfaces, which are "bonded" into a single logical interface, with each physical interface connected to one of the two 16 port Dell PowerConnect 2616 Gigabit Ethernet switches at the top of the rack. The interface to switch connections of the two servers are crossed with respect to one another. The two switches are connected together and normally forward packets to one another; each is connected to the DMZ port of one of the two redundant firewalls (which aren't in this rack, but in the communications rack upstairs). Between the servers and switches are two identical Coyote Point Equalizer 350 load balancers run in primary/backup high availability mode. The top load balancer is connected to the top switch and the bottom load balancer to the bottom switch. Hence, they exchange heartbeats through the interconnected switches, so if one switch goes down, whichever load balancer is connected to the remaining switch will become primary, and since each server has an interface connected to both switches, it will continue to be able to communicate to both servers. The rack is 24 units high, and all the components are designed to permit dense packing, but I've spaced them out both because it makes them easier to work on and remove if necessary, and also because additional breathing room can't hurt cooling. 
It isn't obvious from this picture, but the rack is deep--73.5 cm from front to back rails and just a tad less than one metre for the entire cabinet; the Dell servers are 1U high, but they just keep on coming in the depth dimension. The load balancers are less than 50 cm deep, so I've exploited the unused space by mounting two 15 socket outlet strips on the back rails, one behind each load balancer. These are plugged into independent APC SmartUPS 1500 units which sit on the floor behind the rack, fed from separate dedicated 10 A 230 V circuits with slow-blow thermal fuses. (Never plug a UPS or any other equipment with a big iron-core transformer into a circuit with a fast-trip breaker. The inrush current after even a momentary power blip may pop the breaker and bring your holiday to an unexpected end. This has happened to me.) The servers have dual redundant power supplies and each is plugged into both outlet strips, while the other pairs of components have one of each plugged into each strip. The UPS units are not mounted in the rack due to my earlier surreal adventure with a rack-mounted UPS. The UPS monitoring and control port of each UPS is connected to the serial port of one of the two servers; there is presently no broadcast shutdown, but since each UPS can handle the load of both servers and each server can run on one of its two power supplies, this expedient gets the job done, albeit inelegantly. The servers run the Apcupsd monitoring and control software. The load balancers, which are really FreeBSD machines, also would like to be shut down cleanly before the power goes down, but at the moment that's something which remains on my to-do list.
Valley of the Dells: RAID Firmware Upgrade
Ever since the Fourmilab server farm was put into production, the only serious problem has been random outages on servers in which all subsequent writes to filesystems fail. This seems to be provoked by conditions of heavy load which generate large volumes of I/O--the first time I encountered it was before the server was put into production (and hence essentially idle) when I tried to copy a 4 Gb archive from one filesystem to another. In production, these crashes occur at random intervals anywhere from a few days to more than a week apart. It is amazing how well a Linux system will run when all filesystem writes return I/O errors! In fact, it runs sufficiently well to fool the load balancer into thinking the system is up, but not well enough to actually service HTTP, FTP, and SMTP traffic. I added a custom "server agent" the load balancer probes which not only returns an exponentially smoothed moving average of the server load, but also verifies that all server-critical filesystems are writable and reports the system down if one or more isn't. This causes a server in the "can't write to disc" failure mode to be removed from the available pool and reported down, but of course doesn't remedy the actual problem which only seems to be cured by a reboot (which will be messy, since filesystems can't be cleanly unmounted if they can't be written). When you're running with ext3 filesystems, the symptom of this failure is "journal write errors". This caused me to originally suspect the problem lay in the ext3 filesystem itself, as I noted here and here. After converting back to ext2, in a fine piece of misdirection, the problem didn't manifest itself for a full ten days, then it struck again--twice within eight hours. 
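The two checks such a server agent performs can be sketched in a few lines of Python. This is an illustration only, not the agent actually running on the Fourmilab servers: the function names, the smoothing constant, and the list of directories to probe are all hypothetical, and how the result is reported to the load balancer is left out because it depends on the balancer's own protocol.

```python
import os
import tempfile


def smoothed_load(previous, sample, alpha=0.3):
    """Exponentially smoothed moving average of the load.

    alpha (an assumed value) is the weight given to the newest sample;
    the remainder of the weight stays with the running average.
    """
    return alpha * sample + (1.0 - alpha) * previous


def is_writable(directory):
    """Return True if a scratch file can be created, written, and synced
    in the given directory; any OS-level error counts as not writable."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as probe:
            probe.write(b"probe")
            probe.flush()
            os.fsync(probe.fileno())  # push the write through to the device
        return True
    except OSError:
        return False


def agent_report(previous_load, sample, critical_dirs, alpha=0.3):
    """Combine both checks: return (smoothed load, server considered up).

    critical_dirs is a hypothetical list holding one directory on each
    filesystem the server cannot do without.
    """
    load = smoothed_load(previous_load, sample, alpha)
    up = all(is_writable(d) for d in critical_dirs)
    return load, up
```

On a Unix system the sample could come from os.getloadavg()[0], with the previous smoothed value carried over from one probe to the next. The fsync call is there because a write which merely lands in the buffer cache might appear to succeed even though the device underneath has gone offline.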
Since recovery from the error truncates the log file containing the messages reporting it, and usually the original messages have scrolled off the console window, I'd not actually seen the onset of the problem until it happened right in front of my eyes yesterday. Here's what it looks like:
12:37:31: megaraid: aborting-227170 cmd=28 <c=1 t=0 l=0>
12:37:31: megaraid abort: 227170:57[255:0], fw owner
12:37:31: megaraid: aborting-227171 cmd=2a <c=1 t=0 l=0>
12:37:31: megaraid abort: 227171:62[255:0], fw owner
12:37:31: megaraid: aborting-227172 cmd=2a <c=1 t=0 l=0>
12:40:31: megaraid abort: 227172[255:0], driver owner
12:40:32: megaraid: aborting-227173 cmd=28 <c=1 t=0 l=0>
12:40:32: megaraid abort: 227173[255:0], driver owner
12:40:32: megaraid: reseting the host...
12:40:32: megaraid: 2 outstanding commands. Max wait 180 sec
12:40:32: megaraid mbox: Wait for 2 commands to complete:180
12:40:32: megaraid mbox: Wait for 2 commands to complete:175
. . . counts down to zero . . .
12:40:32: megaraid mbox: Wait for 2 commands to complete:5
12:40:32: megaraid mbox: Wait for 2 commands to complete:0
12:40:32: megaraid mbox: critical hardware error!
12:40:32: megaraid: reseting the host...
12:40:32: megaraid: hw error, cannot reset
12:40:32: megaraid: reseting the host...
12:40:32: megaraid: hw error, cannot reset
12:40:32: scsi: Device offlined - not ready after error
recovery: host 0 channel 1 id 0 lun 0
12:40:32: scsi: Device offlined - not ready after error
recovery: host 0 channel 1 id 0 lun 0
12:40:32: SCSI error : <0 1 0 0> return code = 0x6000000
12:40:32: end_request: I/O error, dev sda, sector 57443971
12:40:32: SCSI error : <0 1 0 0> return code = 0x6000000
12:40:32: end_request: I/O error, dev sda, sector 270776074
12:40:32: scsi0 (0:0): rejecting I/O to offline device
12:40:32: end_request: I/O error, dev sda, sector 130580362
12:40:32: scsi0 (0:0): rejecting I/O to offline device
12:40:32: SCSI error : <0 1 0 0> return code = 0x6000000
12:40:32: end_request: I/O error, dev sda, sector 57224203
12:40:32: Buffer I/O error on device sda6, logical block 622595
12:40:32: lost page write due to I/O error on sda6
12:40:32 server0 kernel: scsi0 (0:0): rejecting I/O to offline device
12:40:32 server0 last message repeated 14 times
12:40:32: IO error syncing ext2 inode [sda6:0004a3a2]
I have elided the date, "Apr 8", and the identification "server0:kernel" from these messages to make them fit on the page, and wrapped a couple of long ones. Note that this entire train wreck occurs within the space of about one second. It continues in this vein until you reboot. Once in this state, none of the clean shutdown alternatives work--I have to force a reboot from the Remote Access Controller or else power cycle the server.
This log information heightened my suspicion (already expressed by other Dell system administrators on various Dell/Linux discussion fora) that what we are dealing with is an inherent flaw in either the Dell PERC 4e/Si (PowerEdge RAID Controller) or the SCSI adaptor to which the discs it manages are attached. Since numerous other users of Dell PowerEdge 1850 (and its taller sibling, the 2850) reported these I/O hangs under intense load on a variety of operating systems, I was pretty confident the problem was generic to the hardware and not something particular to my machine or configuration.
Searching further based on the log entries, I came upon an announcement of a firmware update to the RAID controller, listed as "Criticality: Urgent" and described as follows:
Increased memory timing margins to address the following potential symptoms: System hangs during operating system installs or other heavy I/O, Windows Blue Screens referencing "Kernel Data Inpage Errors" or Linux Megaraid timeouts/errors.
You could hardly ask for a clearer description of the symptoms than that! I downloaded the new firmware, and after some twiddling to permit the firmware upgrade to be installed from the RAM filesystem of the Fedora Core 3 Rescue CD, re-flashed the RAID controller in the backup "pathfinder" server with no problems. After it had run for about 10 hours without incident, I installed the new firmware on the front-line server as well. Will this finally fix the I/O hang under heavy load? We'll see . . . . If all goes well for a week or so, I'll convert the non-root filesystems back to ext3, which I consider exonerated (although perhaps a precipitating cause due to increased I/O traffic), but three decades of system administration counsel against changing more than one thing at a time.
Friday, April 8, 2005
Linux: Fedora X11 6.8.2-1.FC3.13 on Multiprocessor Systems
As I'm typing this, Swiss television is airing the James Bond film "A View to a Kill"--how appropriate! I recently installed the Fedora Core 3 X11 version 6.8.2-1.FC3.13 packages on the Dell PowerEdge 1850 servers which run this site. The updated X11 server and libraries are not used until you log out and back in, or reboot and log in from the console for the first time after the reboot. What happens then is not pretty; in the process of painting the desktop the mouse freezes, and a few seconds later the Orange Light of Death appears on the front panel. Inquiring with the Remote Access Controller indicates PROC_1 and PROC_2 "Status processor sensor IERR". No form of clean shutdown will work, and forcing a reboot from the RAC may result in the dismaying consequence of /etc/fstab being deleted in the fsck, forcing recovery from the Rescue CD. (Fortunately, the other server in the farm, which continued to run because I hadn't logged out and back in after installing the fatal update on it, has an identical /etc/fstab, so I was able to start the network interface and copy it over to the damaged machine with sftp. From now on I'll keep a current copy of /etc/fstab in /etc/fstab.backup for future disasters of this sort.)
Apparently, there is a problem in the Fedora Core 3 X11 6.8.2-1.FC3.13 update which causes it to crash multiprocessor machines. The Fourmilab servers have dual Intel Xeon processors, each of which is "Hyper-Threaded" and hence behaves as a dual processor system itself. Reports of this problem so far on the Fedora Forum differ on whether it affects Hyper-Threaded machines with a single physical CPU.
To see if you have this version installed on your system use the command "rpm -aq | grep xorg-x11". If you see lines in the output like "xorg-x11-6.8.2-1.FC3.13" then this version is installed on your machine. This problem can be particularly insidious on multiprocessor servers which are normally run in "headless" mode--with no keyboard, mouse, or monitor.
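For anybody who would rather script the version check described above than read the grep output by eye, here is a minimal sketch in Python. It is an illustration only, not something run on the servers discussed here: the function names affected_packages and installed_packages are hypothetical, and it assumes a reasonably modern Python and an RPM-based system where "rpm -qa" is available.

```python
import subprocess

# The problematic release named in this post.
BAD_VERSION = "6.8.2-1.FC3.13"


def affected_packages(rpm_lines, bad_version=BAD_VERSION):
    """Return, sorted, the xorg-x11 package names that carry bad_version.

    rpm_lines is an iterable of package names as printed by `rpm -qa`,
    e.g. "xorg-x11-libs-6.8.2-1.FC3.13". A substring match is used so
    that a trailing architecture suffix (".i386") does not hide a hit.
    """
    marker = "-" + bad_version
    hits = []
    for line in rpm_lines:
        name = line.strip()
        if name.startswith("xorg-x11") and marker in name:
            hits.append(name)
    return sorted(hits)


def installed_packages():
    """List installed packages by running `rpm -qa` (RPM-based systems only)."""
    result = subprocess.run(
        ["rpm", "-qa"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Print every installed xorg-x11 package from the problematic release.
    for package in affected_packages(installed_packages()):
        print(package)
```

The list it prints is the same set of package names you would note down before fetching the older release of each one.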
Since the X server doesn't start until you log in from the console, the deadly X version can lie dormant until some other problem causes you to roll over the "crash cart" and hook up the console, whereupon your first attempt to log in will immediately hang the server, giving you another, entirely unrelated, inscrutability to unscrew, with a missing /etc/fstab "to boot" if your luck is like mine. If you have a multiprocessor machine and have this version installed, either your configuration dodges the bullet or you're lucky enough not to have logged in since you installed it.
To back out the new version and revert to the previous 6.8.1-12.FC3.21 release, first make a note of all the 6.8.2-1.FC3.13 packages reported by the rpm command above. Go to the Fedora Core 3 Updates archive and download the .rpm files for the 6.8.1-12.FC3.21 version of each package. Once you've downloaded the packages, you can revert your system by running the command "rpm --oldpackage -Uvh xorg-x11-*6.8.1-12.FC3.21.i386.rpm" as super user. Before actually installing the packages, you can check whether you're missing anything or have any conflicts by running the previous command with the "--test" option.
This misadventure illustrates why one should always install updates, however apparently innocuous, on a "pathfinder" system first, and then reboot it to see if anything wicked its way comes. Let the pathfinder run in production for a few days before deploying the update to the rest of your servers.
Wednesday, April 6, 2005
Reading List: Old Man's War
- Scalzi, John. Old Man's War. New York: Tor, 2005. ISBN 0-7653-0940-8.
- I don't read a lot of contemporary science fiction, but the review by Glenn Reynolds and those of other bloggers he cited on Instapundit motivated me to do the almost unthinkable--buy a just-out science fiction first novel in hardback--and I'm glad I did. It's been a long time since I last devoured a three hundred page novel in less than 36 hours in three big gulps, but this is that kind of page-turner. It will inevitably be compared to Heinlein's Starship Troopers. Remarkably, it stands up well beside the work of the Master, and also explores the kinds of questions of human identity which run through much of Heinlein's later work. The story is in no way derivative, however; this is a thoroughly original work, and even more significant for being the author's first novel in print. Here's a writer to watch.
Monday, April 4, 2005
Reading List: What Went Wrong?
- Lewis, Bernard. What Went Wrong? New York: Perennial, 2002. ISBN 0-06-051605-4.
- Bernard Lewis is the preeminent Western historian of Islam and the Middle East. In his long career, he has written more than twenty volumes (the list includes those currently in print) on the subject. In this book he discusses the causes of the centuries-long decline of Islamic civilisation from a once preeminent empire and culture to the present day. The hardcover edition was in press when the September 2001 terrorist attacks took place. So thoroughly does Lewis cover the subject matter that a three page Afterword added in October 2002 suffices to discuss their causes and consequences. This is an excellent place for anybody interested in the "clash of civilisations" to discover the historical context of Islam's confrontation with modernity. Lewis writes with a wit which is so dry you can easily miss it if you aren't looking. For example, "Even when the Ottoman Turks were advancing into southeastern Europe, they were always able to buy much needed equipment for their fleets and armies from Christian European suppliers, to recruit European experts, and even to obtain financial cover from Christian European banks. What is nowadays known as 'constructive engagement' has a long history." (p. 13).