linux.conf.au 2007: Wednesday 17th Jan

I think I’m experiencing some of the same lca disillusionment as I did in Adelaide, as good as Adelaide 2004 was and 2007 is. There’s still nothing like the magic of 2001, when I’d hardly heard about any of this stuff and it was still new and magical.

For the talks described below, I’ve linked video where available already, but it isn’t all encoded. They should all eventually (for values of ‘eventually’ that are something like ‘within the week’) be encoded and linked from the programme.

Andrew S. Tanenbaum’s keynote this morning was introduced by Linus Torvalds, I think possibly because this is lca and they can do that sort of thing, so why not? I liked this keynote a lot. It still wasn’t earth-shattering (perhaps Kathy Sierra on Friday will take it out with a bang?), but he threw around some interesting ideas.

The first half of the talk was a general discussion about computers and their instability. Tanenbaum claims lifetime failure counts of zero on his car, TV and stereo, and hundreds or thousands of lifetime failures of his PC. It was actually a bit unclear what was meant by ‘lifetime failure’ here: program crashes? Complete system crashes? It’s also a bit too kind to the automotive industry: I do remember my father’s car computer losing the ability to change gears once, sticking him in first and leaving him there. The mechanic essentially flashed the computer. But the essential point is true.

Tanenbaum’s point from there is that cars, stereos and TVs all contain a fair bit of software these days, and, given typical industry figures of 10–70 bugs per thousand lines of code, why are they stable? Well, one answer is that they have nothing like the complexity of modern operating systems, especially those written to support thousands of potential devices. Another is that they, like mission critical software, are self-healing. Something vanishes or breaks, it gets restarted or reloaded.

This led into an introduction to Minix 3, which is, in part, an experiment in micro-kernels with self-healing layers on top. In Minix 3, if the kernel crashes, buh-bye, if the reincarnation process that’s doing the watching dies, buh-bye, and if a couple of key device drivers die you might have a hard time. But much of the rest, including a hell of a lot of functionality that would be in, say, the Linux kernel but that is in Minix user-space, can be restarted, and if you’re careful, restarted seamlessly. The average performance hit is on the order of 6%, which is not impressive to my friends who work with extremely highly loaded network devices, but might be of interest for machines that aren’t performance critical.
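(For the curious, the core idea is just a supervisor process that watches the user-space drivers and servers and restarts whatever dies. Here’s a toy sketch of that idea in plain C, emphatically not actual Minix code: the driver path and the restart policy are entirely made up, it’s just the shape of the thing.)

```c
/*
 * Toy sketch of the "reincarnation" idea, not Minix code: a supervisor
 * forks a (hypothetical) user-space driver and simply restarts it
 * whenever it dies, instead of taking the whole system down with it.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        for (;;) {
                pid_t pid = fork();
                if (pid < 0) {
                        perror("fork");
                        exit(EXIT_FAILURE);
                }
                if (pid == 0) {
                        /* Child: run the "driver". /sbin/fake-driver is made up. */
                        execl("/sbin/fake-driver", "fake-driver", (char *)NULL);
                        perror("execl");
                        _exit(EXIT_FAILURE);
                }
                /* Parent: wait for the driver to die, then reincarnate it. */
                int status;
                if (waitpid(pid, &status, 0) < 0) {
                        perror("waitpid");
                        exit(EXIT_FAILURE);
                }
                fprintf(stderr, "driver died (status %d), restarting\n", status);
                sleep(1);       /* crude back-off so a crash loop doesn't spin */
        }
}
```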

Andrew Tridgell asked about malicious or very badly written device drivers, and Tanenbaum replied that Minix 3 doesn’t protect against these. It also doesn’t protect against a cascade of failures. But Tanenbaum said that the majority of released bugs in operating systems are relatively isolated failures brought on by somewhat, but not extremely, rare corner cases. And so if you have a self-healing system that is capable of noticing a benign but buggy device driver go down and can restart it, that would suffice to survive an unspecified but presumably high number of potential crashes.

Personally, I just thought Minix 3 sounded like a good example of what the original apparently also was: a fairly small teaching example of a working operating system. I’m not predicting that I’ll become a Minix hacker, but it’s more likely than me suddenly deciding to start on Linux.

I went to Matthew Garrett on Fixing suspend for fun and profit (video, Ogg Theora, 46MB) after morning tea. There was an assortment of interesting pickings about the details of a device driver sleeping and waking up on suspend and resume, and a discussion of why suspend to disk is, sadly, usually less buggy than suspend to RAM. (When coming back from RAM, devices can be in an often underspecified and sometimes completely random state, but coming back from disk the computer has to do a normal boot as far as the boot manager goes, which puts devices into their known boot state.)

He noted a social factor in the improvements in suspend and resume: people have finally started expecting it to work. Until recently (2004 or so at the earliest), if you had an x86 Linux laptop you just assumed that it wouldn’t suspend, and therefore didn’t report bugs. A tipping point was reached, probably the point where all those damn people with IBM X40 laptops got working power management, and everyone got jealous and started whining. And so suspend and resume is much better now.

He then explained some new debugging, and how it came about. The new debugging is controlled by the CONFIG_PM_TRACE kernel compile option, and activates the TRACE_DEVICE and TRACE_RESUME macros. When activated, you can suspend and then try and resume your machine (which is presumably failing, since you’re doing this). When the resume fails, you can restart your machine and on boot it will report some information about which bit of kernel code last hit a TRACE_ macro (and you can put some more in to help you find the spot more exactly) and which device it was trying to wake up at the time. It stores this info in the one piece of non-volatile memory that’s really reliably addressable: the system clock. Since the clock will then contain complete garbage, conveniently the computer will not try to overwrite it, leaving you an indefinite period of time in which to boot it up and grab this info. That isn’t good exactly, but you don’t have to do it all the time, just when you are trying to debug suspend and resume.
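(To make that a bit more concrete, here’s roughly what the instrumentation looks like from a resume path’s point of view. This is my own sketch from memory rather than anything Garrett showed on screen: the header name and the convention for the macro arguments are assumptions on my part, and the hardware-wakeup helper is made up.)

```c
/*
 * Rough sketch only: how the pm_trace checkpoints might be sprinkled
 * through a resume path when CONFIG_PM_TRACE is enabled. The call sites
 * and argument convention here are my guesses, not copied from the kernel.
 */
#include <linux/device.h>
#include <linux/resume-trace.h>

/* Hypothetical stand-in for whatever this driver actually does on wakeup. */
static int do_the_actual_hardware_wakeup(struct device *dev)
{
        (void)dev;
        return 0;
}

static int my_driver_resume(struct device *dev)
{
        int error;

        TRACE_DEVICE(dev);      /* remember which device we are waking */
        TRACE_RESUME(0);        /* checkpoint: we got into this resume path */

        error = do_the_actual_hardware_wakeup(dev);

        TRACE_RESUME(error);    /* checkpoint: we got to the end, and how it went */
        return error;
}
```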

This has all been added, Garrett explained, because Linus [Torvalds] had a laptop he wanted to suspend and resume… and it didn’t work! Before that, fixing a bad driver involved a lot of voodoo and, beyond that, guesswork about which of twenty different drivers was actually at fault.

After Garrett’s talk I went to Michael Carden on Digital Preservation – The National Archives of Australia, Open Standards and Open Source which was mostly about Xena, the National Archives’ digital preservation tool, already in testing by the University of Sydney library in conjunction with DSpace, and of course fully in production at the National Archives.

I went to Keith Packard’s X Monitor Hotplugging Sweetness at the end of the day. I didn’t really follow the whys and wherefores, which is a shame, but I think he was assuming more knowledge of the X protocol and X.org extensions than I have. (It would be impossible to assume less; I’ll never be in Daniel Stone’s posse.) The upshot is that soon, when Packard finishes up some things and merges his branch, mainline X.org will be able to notice that an external monitor has different resolutions and mode lines from the laptop screen, take advantage of that, construct a giant virtual screen, and place the external monitor next to, above or below the laptop screen as view ports into that virtual screen, which is something like what Xinerama does except more flexible. This will be the RandR extension, version 1.2. Still to come are things like not letting the mouse escape the view ports, which is important for users, and support in a bunch of video card drivers (Intel is nearly there because Packard works for Intel, and Dave Airlie is on his way with Radeon), but it’s mostly there, and if you like cutting edge X you can run the RandR v1.2 branch and see it in action.
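(If you want a feel for what the new extension exposes to clients, here’s a small sketch of a program listing outputs through the Xrandr library. This is my best guess at the RandR 1.2 client calls rather than anything Packard showed, so names and details may not match what finally gets merged.)

```c
/*
 * Sketch only: list the outputs RandR 1.2 knows about and whether a
 * monitor is plugged into each. Build with: cc randr-list.c -lXrandr -lX11
 * Function names are my best guess at the 1.2 client API.
 */
#include <stdio.h>
#include <X11/Xlib.h>
#include <X11/extensions/Xrandr.h>

int main(void)
{
        Display *dpy = XOpenDisplay(NULL);
        if (!dpy) {
                fprintf(stderr, "cannot open display\n");
                return 1;
        }

        /* Per-output resources are new in RandR 1.2; older servers won't have them. */
        XRRScreenResources *res =
                XRRGetScreenResources(dpy, DefaultRootWindow(dpy));
        if (!res) {
                fprintf(stderr, "no RandR 1.2 resources (old server?)\n");
                return 1;
        }

        for (int i = 0; i < res->noutput; i++) {
                XRROutputInfo *out = XRRGetOutputInfo(dpy, res, res->outputs[i]);
                printf("%s: %s\n", out->name,
                       out->connection == RR_Connected
                               ? "connected" : "disconnected");
                XRRFreeOutputInfo(out);
        }

        XRRFreeScreenResources(res);
        XCloseDisplay(dpy);
        return 0;
}
```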

I got a Professional registration as thanks for being on the programme committee. This means two things: I get my name in a big index of professional delegates, and I got to go to a networking session last night. It was a true networking session too, as in a bunch of drinks and nibblies were supplied, but very few chairs (and those probably by accident) so that people could bounce around and optimise their personal networks. I am not in an ideal position at this conference to do this (I am a computational linguistics student researcher, and this conference doesn’t contain my professional network) and also, my feet were horribly horribly sore. I chatted with some people I already know, met a few Sydneysiders I hadn’t spoken to, and went home, because sleep is important (we live an hour away from the conference by the most optimistic of public transport routes).