I think I'm experiencing some of the same lca disillusionment as I did in
Adelaide, as good as Adelaide 2004 was and 2007 is. There's still nothing like
the magic of 2001, when I'd hardly heard about any of this stuff and it was
all still new.
For the talks described below, I've linked video where available already,
but it isn't all encoded. They should all eventually (for values of
'eventually' that are something like 'within the week') be encoded and linked
from the programme.
Andrew S. Tanenbaum's keynote this morning was introduced by Linus
Torvalds, I think possibly because this is lca and they can do that sort of
thing, so why not? I liked this keynote a lot. It still wasn't earth-shattering
(perhaps Kathy Sierra on Friday will take it out with a bang?), but he threw
around some interesting ideas.
The first half of the talk was a general discussion about computers and
their instability. Tanenbaum claims lifetime failure counts of zero on his car,
TV and stereo, and hundreds or thousands of lifetime failures of his PC. It was
actually a bit unclear what was meant by lifetime failure here: program
crashes? Complete system crashes? It's also a bit too kind to the automotive
industry: I do remember my father's car computer losing the ability to change
gears once, sticking him in first and leaving him there. The mechanic
essentially flashed the computer. But the essential point is true.
Tanenbaum's point from there is that cars, stereos and TVs all contain a
fair bit of software these days, and, given typical industry figures of 10–70
bugs per thousand lines of code, why are they stable? Well, one answer is that
they have nothing like the complexity of modern operating systems, especially
those written to support thousands of potential devices. Another is that they,
like mission critical software, are self-healing. Something vanishes or
breaks, it gets restarted or reloaded.
This led into an introduction to Minix 3, which is, in part, an experiment in
micro-kernels with self-healing layers on top. In Minix 3, if the kernel
crashes, buh-bye; if the reincarnation process that's doing the watching dies,
buh-bye; and if a couple of key device drivers die you might have a hard time.
But much of the rest, including a hell of a lot of functionality that would be
in, say, the Linux kernel but that is in user-space in Minix, can be restarted,
and if you're careful, restarted seamlessly. The average performance hit is on
the order of 6%, which is not impressive to my friends who work with extremely
highly loaded network devices, but might be of interest for machines that
aren't performance-critical.
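To make the restart idea concrete, here is a toy sketch in Python. It is not
Minix code: the Driver and Reincarnation classes and their methods are names
I've made up for illustration, and real Minix 3 drivers are separate
user-space processes rather than objects in one program. It only shows the
shape of the idea: a watcher notices that a driver has died and starts a fresh
copy instead of letting the whole system fall over.

```python
class Driver:
    """A toy user-space driver that crashes on certain requests (hypothetical)."""

    def __init__(self, name, bad_requests=()):
        self.name = name
        self.bad_requests = set(bad_requests)

    def handle(self, request):
        if request in self.bad_requests:
            raise RuntimeError(f"{self.name} crashed on {request!r}")
        return f"{self.name} handled {request!r}"


class Reincarnation:
    """Watches drivers and replaces any that die (hypothetical stand-in)."""

    def __init__(self, factories):
        # factories maps a driver name to a zero-argument callable that
        # builds a fresh copy of that driver.
        self.factories = factories
        self.drivers = {name: make() for name, make in factories.items()}
        self.restarts = {name: 0 for name in factories}

    def send(self, name, request):
        try:
            return self.drivers[name].handle(request)
        except RuntimeError:
            # The driver died: start a fresh copy and report the failed
            # request as None rather than taking everything else down.
            self.drivers[name] = self.factories[name]()
            self.restarts[name] += 1
            return None


def demo():
    supervisor = Reincarnation({
        "disk": lambda: Driver("disk"),
        "net": lambda: Driver("net", bad_requests={"odd packet"}),
    })
    results = [
        supervisor.send("net", "ping"),
        supervisor.send("net", "odd packet"),  # crashes the net driver
        supervisor.send("net", "ping"),        # served by the restarted copy
        supervisor.send("disk", "read"),
    ]
    return results, supervisor.restarts
```

Calling demo() shows one lost request and one restart for the net driver, while
the disk driver is untouched. This matches the caveats in the talk: a buggy but
benign driver can be restarted, but nothing here defends against a malicious
driver or a cascade of failures.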
Andrew Tridgell asked about malicious or very badly written device drivers,
and Tanenbaum replied that Minix 3 doesn't protect against these. It also
doesn't protect against a cascade of failures. But Tanenbaum said that the
majority of released bugs in operating systems are relatively isolated failures
brought on by somewhat, but not extremely, rare corner cases. And so if you
have a self-healing system that is capable of noticing a benign but buggy
device driver go down and can restart it, that would suffice to survive an
unspecified but presumably high number of potential crashes.
Personally, I just thought Minix 3 sounded like a good example of what the
original apparently also was: a fairly small teaching example of a working
operating system. I'm not predicting that I'll become a Minix hacker, but it's
more likely than me suddenly deciding to start on Linux.
I went to Matthew Garrett on Fixing suspend for fun and profit (video, Ogg
Theora, 46MB) after morning tea. There was an assortment of
interesting pickings about the details of a device driver sleeping and waking
up on suspend and resume, and a discussion of why suspend to disk is, sadly,
usually less buggy than suspend to RAM. (When coming back from RAM, devices can
be in an often underspecified and sometimes completely random state, but coming
back from disk the computer has to do a normal boot as far as the boot manager
goes, which puts devices into their known boot state.)
He noted a social factor in the improvements in suspend and resume: people
have finally started expecting it to work. Until recently (2004 or so at the
earliest), if you had an x86 Linux laptop you just assumed that it wouldn't
suspend, and therefore didn't report bugs. A tipping point was reached,
probably the point where all those damn people with IBM X40 laptops got working
power management, and everyone got jealous and started whining. And so suspend
and resume is much better now.
He then explained some new debugging, and how it came about. The new
debugging is controlled by the CONFIG_PM_TRACE kernel compile option,
which activates the TRACE_DEVICE and TRACE_RESUME macros. With it
enabled, you suspend and then try to resume your machine (which presumably
fails, since you're doing this). When the resume fails, you restart the
machine, and on boot it reports which bit of kernel code last hit a TRACE_
macro (you can add more of them to narrow down the spot) and which device it
was trying to wake up at the time. It stores this info in the one piece of
non-volatile memory that's really reliably addressable: the system clock.
Since the clock will then contain complete garbage, conveniently the computer
will not try to overwrite it, leaving you an indefinite period of time before
you boot it up and grab this info. Trashing your clock isn't exactly good, but
you don't have to do it all the time, just when you are trying to debug suspend
and resume.
This has all been added, Garrett explained, because Linus [Torvalds] had
a laptop he wanted to suspend and resume... and it didn't work! Before that,
fixing a bad driver involved a lot of voodoo, and beyond that, guesswork,
about which of twenty different drivers was actually at fault.
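The clock trick is easier to see with a toy model. The Python sketch below is
my own illustration, not the kernel's code: FakeClock, small_hash, the CRC-based
hash and the modulus are all assumptions made for the example. The idea it
shows is the one described in the talk: before waking each device, squeeze a
marker for it into the few bits the clock can hold, and after a failed resume
and a reboot, compare that marker against every known device to find suspects.

```python
import zlib

DEVICE_HASH_SIZE = 1009  # small prime, chosen arbitrarily for this sketch


def small_hash(text, modulus):
    """Squeeze a string into a small integer that would fit a clock field."""
    return zlib.crc32(text.encode("utf-8")) % modulus


class FakeClock:
    """Stands in for the handful of bits the hardware clock can hold."""

    def __init__(self):
        self.value = None

    def write(self, value):
        self.value = value

    def read(self):
        return self.value


def trace_device(clock, device_name):
    """Record, before touching a device, which one is about to be resumed."""
    clock.write(small_hash(device_name, DEVICE_HASH_SIZE))


def resume_all(clock, devices, hangs_on=None):
    """Resume devices in order; return False if the machine 'hangs'."""
    for name in devices:
        trace_device(clock, name)
        if name == hangs_on:
            return False  # simulated hang: the clock keeps the last marker
    return True


def suspects_after_reboot(clock, devices):
    """On the next boot, list every device whose hash matches the marker."""
    marker = clock.read()
    return [d for d in devices if small_hash(d, DEVICE_HASH_SIZE) == marker]
```

Because the marker is only a small hash, more than one device can match, so
the result is a list of suspects rather than a single culprit. That is still a
big improvement on guessing among twenty drivers.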
After Garrett's talk I went to Michael Carden on Digital
Preservation - The National Archives of Australia, Open Standards and Open
Source, which was mostly about Xena, the National Archives' digital
preservation tool, already in testing by the University of Sydney library in
conjunction with DSpace, and of course fully in production in the National
Archives.
I went to Keith Packard's X Monitor Hotplugging Sweetness at
the end of the day. I didn't really follow the whys and wherefores, which is a
shame, but I think he was assuming more knowledge of the X protocol and X.org
extensions than I have. (It would be impossible to assume less; I'll never be
in Daniel Stone's posse.) The upshot is that soon, when Packard finishes up
some things and merges his branch, mainline X.org X will be able to notice
that an external monitor has different possible resolutions and mode lines from
the laptop screen and take advantage of that. It can then construct a giant
virtual screen and place the external monitor next to, above or below the
laptop monitor, with each acting as a view port into that virtual screen. This
is something like what Xinerama does, except more flexible. This will be the
RandR extension, version 1.2. Still to come are things like not letting the
mouse escape the view ports, which is important for users, and support in a
bunch of video card drivers (Intel is nearly there because Packard works for
Intel, and Dave Airlie is on his way with Radeon). But it's mostly there, and
if you like cutting edge X you can run the RandR v1.2 branch and see it in
action.
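The virtual screen arithmetic is simple enough to sketch. This Python function
is my own illustration and has nothing to do with the actual RandR protocol or
API: the layout name and the position strings are made up. It just works out
how big the virtual screen needs to be, and where each monitor's view port
starts, when the external monitor goes next to, above or below the laptop
panel.

```python
def layout(laptop, external, position):
    """Place `external` relative to `laptop` inside one virtual screen.

    laptop, external: (width, height) in pixels.
    position: "right-of", "above" or "below" (names invented for this sketch).
    Returns (virtual_size, laptop_origin, external_origin).
    """
    lw, lh = laptop
    ew, eh = external
    if position == "right-of":
        # Side by side: widths add up, height is the taller of the two.
        virtual = (lw + ew, max(lh, eh))
        laptop_origin, external_origin = (0, 0), (lw, 0)
    elif position == "above":
        # External on top: heights add up, laptop view port is pushed down.
        virtual = (max(lw, ew), lh + eh)
        laptop_origin, external_origin = (0, eh), (0, 0)
    elif position == "below":
        # External underneath: heights add up, external is pushed down.
        virtual = (max(lw, ew), lh + eh)
        laptop_origin, external_origin = (0, 0), (0, lh)
    else:
        raise ValueError(f"unknown position: {position!r}")
    return virtual, laptop_origin, external_origin
```

For example, layout((1024, 768), (1600, 1200), "right-of") gives a virtual
screen of (2624, 1200) with the external monitor's view port starting at
(1024, 0). The leftover area below the shorter laptop panel belongs to no view
port, which is presumably why keeping the mouse inside the view ports is on
the to-do list.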
I got a Professional registration as thanks for being on the programme
committee. This means two things: I get my name in a big index of professional
delegates, and I got to go to a networking session last night. It was a true
networking session too, as in a bunch of drinks and nibblies were supplied, but
very few chairs (and those probably by accident) so that people could bounce
around and optimise their personal networks. I am not in an ideal position at
this conference to do this (I am a computational linguistics student
researcher, and this conference doesn't contain my professional network) and
also, my feet were horribly horribly sore. I chatted with some people I already
know, met a few Sydneysiders I hadn't spoken to, and went home, because sleep
is important (we live an hour away from the conference by the most optimistic
of public transport routes).