Bugs

I’m averaging about two bug reports every time I try and use Nautilus at the moment: in particular, I have never successfully gotten it to talk SSH to a remote server for more than about a minute. I get a lot of ‘seems perfectly stable to me’ replies from upstream, and a lot of ‘yeah, doesn’t everyone know that Nautilus/gnome-vfs don’t really do SSH?’ from everyone else.

But it doesn’t matter anyway, because apparently every time I fill in one of those helpful "your application has crashed, inform developers?" dialog boxes, the bug reporting tool just eats the report. And this is a good thing, because it saves them from ‘otherwise useless’ bug reports.

So I think I’ll save my breath to cool my porridge from now on. In particular, the next time I see someone talking about how Nautilus is amazing if only all these CLI-bound nerds would consider using the mouse, I’ll have a quiet laugh to myself and continue using scp, rsync and unison, which I have never ever ever seen crash, and file an otherwise useless bug report in my head.

Wednesday 10 November 2004

Using my previous entry as a jumping off point, I’m here to tell you a long and convoluted story about why, as a bug reporter (among other things), I will with but the flimsiest justification tend to regard myself as either put-upon or an idiot. In the spirit of Mark Pilgrim’s morons and assholes division, I bring you two user archetypes: the Thug and the Wimp.

I’d present Thea the Thug to you, but I think you’ve already met her. She emailed you one time while you were sleeping. The subject of the email was ‘PLEASE HELP — URGENT’. It’s possible the body described a genuine bug in your project. It’s just as possible that she was asking something unanswerable like "how can I get a list of email addresses?" It doesn’t matter anyway, because an hour later she sent another mail or two denouncing you for not answering her first. She’s got a range of tactics. She might have claimed to be representing the silent majority of users against the tyranny of the developers. She might have asked you how you expect to make your operating system work for ordinary people if you don’t even answer their emails. She might have just randomly continued asserting vague things about how your program should work, or about design principles, or security. She feels very put upon that you didn’t reply, especially since she’s smarter than you, and also she’s doing you a favour by using your program. You wouldn’t hurt a user now would you?

In fact, you’ve met Thea and other Thugs many times. On mailing lists. In bug reports. On IRC. If you’re particularly unlucky, in the street. You know her and her ilk backwards and forwards. One day, or many days, you snapped, and wrote anything from a blog entry to a howto lambasting users for their ill-placed sense of entitlement, their rudeness, and their over-use of capital letters. In some cases you edited it down into a neutral piece on how to get replies on mailing lists. In other cases, you posted the rant.

Now, meet Willy the Wimp. Willy’s just read your piece on how to get replies on mailing lists and he thinks you’re talking to him. And what’s more, you’re saying the same thing that everyone says to him. Well, you’re mainly saying what he says to himself. Bug reports, he reads in your piece, are hard to get right. If you get them wrong, which you probably will since they’re hard, you will annoy very smart people who have a lot of demands on their time. And if he doesn’t get it right he will look like an idiot. And you’ll never see him again, because he won’t do the report now.

You’ve probably met Willy too, but a bit less often. In his most common variety, the Wimp is your granddad, the one who’s just bought a very expensive computer in the fear that he’s getting behind. And further, his next-door neighbour has a lot of computer trouble. One time a letter to her niece just disappeared. And she was doing what she always did, she swears! And sure enough, when Granddad Willy uses the computer, he finds out that his very expensive new possession moves things around (possibly because his mousing skills aren’t good, but he doesn’t know that). Computers are tricksy. They are uncanny. They are not for the likes of him.

However, that’s only one variety of Wimp, the archetypal archetype if you will. And he’s probably not reading your piece. But there’s a Wimp who might be.

This new Wimp is a contradiction in terms. This Wimp actually uses computers a lot. He not only holds a computer science degree, he holds one with quite good results. He has a computer related job. He’s got geeky interests. And he has a dirty little secret: when the computer doesn’t work it’s his fault. It could walk like a bug, talk like a bug, quack like a bug, and carry a big "Hello I’m Buggy the Buggy Bug!" sign, and his first thought on meeting it would be "I suck at computers. I’m a dumb person."

It’s quite likely this Wimp can even program, although three things are likely to be true. The first is that programming isn’t his hobby, because if you think other people’s programs prove you’re a dumdum, your own programs square the effect. The second is that he learnt to program only at university or on the job. The third is that he’s spent a lot of time with people who are better programmers than he is.

These Wimps are shy. They’re often in the closet, confessing their hopeless incompetence only to their bewildered partners (who generally perceive them as perfectly competent and often try and explain this), who are the only people you can say things like "but I’m stupid! It’s my fault!" to more than once (and not too many more times than once, mind you, either).

Usability testers are the general exception. They see Wimps all the time. They have tapes of them crying and saying things like "Please, don’t worry about me. Your program is great. I’m just dumb. I’m so sorry." (Really, they do.) Every so often, they write textbooks and try and jerk Wimps into the light by saying things like "users will tend to blame software failures on lack of skills on their part," but their gentle warnings scatter like ashes in the face of the Thugs who blame software failures on the world’s failure to listen to their prophetic tones.

Even so, Wimpishness is more common than you think. Academia, for example, is absolutely infested with people who are just waiting for someone to knock and say “Oh hello, did I mention? We’ve noticed that you’re the dumbest person in the department. What on earth did you think you were doing? Pack up your stuff and be gone by noon.” And there are an awful lot of computer users who think exactly like this, all the time.

Which is not to say that Wimps are great people, mind you. There are some places where the cultural norm is to lead with "I, a lowly worm, grovel at the feet of my betters and humbly present them with this small critique of their program and beg them to use the whip sparingly," but in most of the circles I move in this kind of thing annoys people and it’s seen as a roundabout way of asking for compliments in the reply and an especially tacky one at that. (And let’s face it, compliments are exactly what the Wimps want, again and again and again. Well that, and for their computers to work.)

No, the trouble with Wimps is that it’s almost impossible to speak softly to them and fend off the Thugs at the same time. I understand this. I really don’t expect people to reply "yes, this is definitely a bug, don’t worry, I think you’re very smart" to bugs I file. I really don’t. But at the same time, every time the program crashes, I think "I’m a dumb person, what am I?" automatically. And in situations like that, hearing about how you don’t want ‘useless’ bug reports is so easy for me to translate into ‘your bug reports are useless’.

OK, so I just have to buck up. I know this. (When I forget, I ask my Dad.) But in the general case, this is a real dilemma. When you write something, or say something, and have a Thug in mind as your audience, or even yourself the perfectly competent user who can flame with the best of them but only when there’s need, there are these people in the background, the ones who are thinking "I’m a dumb person, what am I?" and hearing you say that too.

I would love to know why it is that computers in particular inspire this reaction in people, especially people who’ve had all kinds of external validation about their abilities, and who are being presented with a crash dialog that reads "the program has quit unexpectedly," as opposed to "you broke it, moron!" Maybe I’m just an outsider, but it seems to me that this is something more pronounced in computing (and academia) than elsewhere. How do people’s feedback loops get so skewed? Is there a magical time to grab a Wimp and give them the good swift kick in the bum that you secretly want to give them? If not, what’s the best way to improve the feedback quality?

Planet Free Software

Article originally posted at the IT Kitchen, a now defunct project founded by Shelley Powers.

Free Software developers, who had strong mailing-list and IRC-based online communities before the advent of weblogging, have nevertheless found their way into it. This post is a summary of how the Free Software world is using blogs for collaboration: it has largely preferred aggregating community members’ blogs over setting up single group blogs, using them as a community-building tool rather than as a software development collaboration tool.

One of the big developments was Advogato, which started in late 1999. Its creator, Raph Levien, appears to have been trying to start a kind of semi-formal guild system for Free Software developers, allowing them to rank each other as Master, Journeyer or Apprentice. As a small feature, he added the ability for users to make “diary entries”, the most recent of which were listed at the side of the front page.

While the other features of Advogato proved only an intermittent success — the quality of the articles on the front page is widely lamented, and the certification system has been subject to a lot of debate and has not resulted in the development of formal mentoring — the diary feature was a smash hit. Waves of Free Software developers hit Advogato in 2000 and 2001 as they started reading their co-developers’ diaries. The buzz even generated a Salon article in mid-2000.

The initial buzz surrounding Advogato occasionally caused users to publicly renounce their former bad opinions of “online journals”: rather than being ‘useless’ things full of stories about children and cats, they were a new space to talk about your code and find out more about your fellow developers. Advogato was known as a friendly place, in contrast sometimes with the development mailing lists themselves.

Eventually the worlds of Advogato and of blogs began to meet. In mid-2002 Levien was discovering the wider blogosphere and started exploring using his Advogato diary as a primary means of communication with other interesting people. By that time RSS feeds of individual entries and of the entire recent diary entries page were probably the single most requested feature: people no longer wanted to drop in on the site to skim through the new entries, they wanted to poll them like they were beginning to do with other websites. (RSS feeds of individuals’ diaries were added in April 2003.)

At around the same time, some people started to express serious dissatisfaction with the Advogato community as political debates became more common and it attracted a few diary trolls. Levien added a diary rating feature as requests grew to be able to keep some users off the recent entries page. Others used the Advogato article feature to deplore the decline of the community.

As various blogging tools became more popular around this time, it became increasingly common to see diary entries from an Advogato regular announcing that their diary was moving elsewhere.

As RSS feeds became fairly ubiquitous, the Free Software community started to revert to a more typical blogging community model: you read blogs of people whose names you knew, and you found other people you knew or knew of through sidebars and comments.

However, in mid-2003 Jeff Waugh of the GNOME desktop project decided to create his own version of the Advogato front page: an HTML page with recent blog entries from GNOME developers all over the web (including several on Advogato). He used the Spycroll aggregator software to pull in RSS feeds and made them all available on a single webpage, with the cute addition of disembodied "hackergotchi heads" personalising each name.

He was stunned by the popularity of the page, which he linked from his own sidebar as Planet GNOME, and started to field all kinds of questions about it. The three most popular were “why isn’t this at planet.gnome.org?”, “why aren’t I on it?” and (to his surprise) “why isn’t there an RSS feed?”

The Planet idea took off rapidly over the next six months. Scott James Remnant was the next off the mark, creating Planet Debian. Remnant and Waugh forked Spycroll soon after that to create the Planet aggregator script. In fairly short order, a lot of large Free Software projects decided they needed their own planet: the Planet homepage now lists nearly 40 separate planets.

The planets have evolved a loose set of customs based on the ones in place at Planets GNOME and Debian. They do not require that syndicated blogs talk about Free Software or software development all the time: they encourage getting to know your fellow developers as people as well as techs. (John Fleck, a GNOME documenter who is not only a frequent poster but also a frequent non-tech blogger, has been a kind of acid test for this editorial policy: see the John Malkovich post and a later complaint.) The larger planets are starting to have to deal with line-ball calls about who should and should not be on the planet pages: Waugh apparently finds requiring that contributors use real photos of themselves somewhat helpful on Planet GNOME.

The planets have proved to be amazingly good at spreading blogging among Free Software communities. The two planets I host, LinuxChix Live and Planet Twisted, are close to being my most popular hosted sites. They also fill an important gap in the usual Free Software communication tools: they don’t need to be as on-topic as mailing list posts, and they are more expressive than IRC. They’ve also had some influence on corporate group blogging: Richard Giles reported that the creation of Planet Sun was part of the explorations that led Sun employees to promote blogging internally, eventually leading to the creation of blogs.sun.com.

Planet Free Software by Mary Gardiner is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Open Source Web Design

It’s not like I’m in the most beautiful city in the world or anything, so some of the exciting things I’ve been doing include pulling a bunch of people’s weblogs into WordPress. (OK, OK, and I did take the night mode on Andrew’s digicam for a spin.)

Then this evening I was looking at my website and thinking “time for a change” and decided to head over to Open Source Web Design with the idea that if I ran out of inspiration I could pinch a design and remodel it.

It turns out OSWD’s designs are… really unspeakably awful.

They have a ‘hot or not’ rating system. Here are my rating criteria:

  1. I would close the browser window immediately if presented with this design.
  2. I would wince and enlarge the font several times, as well as upping the contrast on my monitor, when presented with this design.
  3. This design looks kind of like the rest of the web, except all the really bad bits, which it only resembles in part.
  4. This design is quite good.
  5. Upon seeing this design I would immediately plot to steal it for my own site, held back only by the fear of looking like a big thief, and also someone who can’t do their own web design. (Not that I can do my own web design, it’s just that at the moment I do it anyway.)

I gave nothing a score above three. Andrew was sitting next to me, and he gave nothing above a three either, despite several designs making extensive use of techniques designed to appeal to his demographic. (A demographic consisting of people who love the colour purple.)

If there really are people out there doing free high quality web designs for the fun of it, I’d appreciate some Google terms.

Thursday 21 October 2004

It’s well past time that I stopped using Movable Type 2.6.x on my servers. Various options present themselves.

The first is upgrading to Movable Type 3.x. This is undesirable for two reasons. First, I’d be hitching my wagon to proprietary software that may have arbitrary changes in price and conditions whenever a new version is released. Second, the 3.x version would cost US$99, and while I can afford this, I’m hosting other people’s weblogs on this installation and, with one exception, none of them are paying me. So I don’t feel generous enough to do this.

The next is using a Free Software solution. This is appealing; it’s my default choice in all other kinds of software. However, in order to do this, the software needs to have all the features of Movable Type that the users, or I, their administrator (captor), need:

Multiple weblog support.
I have ten weblogs. I don’t want to make ten copies of the same piece of (PHP, because everything is in PHP) code in ten different directories; make either ten different databases or fifty different database tables; and give the same users varying permissions over ten different weblogs. This one is a surprisingly rare feature.
User-based security.
This is not so much because I think the users are EVIL as because they will make mistakes. If the software gives them edit access to all the blogs, they will regularly make posts in the wrong place.
Web interface.
Most of these people don’t have their own computer, they blog from labs or the library. Uploading text files doesn’t cut it.
User editable templates.
Julia wants this, at least, and probably Mos does too.

Now, the first requirement alone knocks out a huge number of the candidates. There are a few that remain. Pivot seems buggy and … odd (probably unusable by users weaned on MT). b2evolution really seems like the only other candidate.

The last alternative is writing my own, because there does rather seem to be a hole in the (quite considerable) market. However, I’ve already done this twice, once for puzzling.org and once ages ago for eyes, and it’s getting dull. And in neither of those cases have I gotten anywhere near things like comments, trackbacks, or other basically necessary features. And while I have no doubt that the basic features of multiple weblogs, multiple users, and web editing would be developed fairly rapidly, the list of stuff I’d need to do started looking a little nasty:

  • I have to decide on a backend. Ew. And every time I change the code I have to have upgrade scripts. More ew.
  • In order to have more than one other user and any co-developers at all, there is but one choice of language. (Unless it turned out to be such a killer app that web hosts around the world started installing twisted.web — unlikely.)
  • Input validation and web based authentication are which of: horrible, easy to get wrong and boring? Answer: all of the above.
  • Then there’s that old horror: prevention of evil. Even if I trust my users to avoid malicious markup the question of fending off comment spammers, referrer spammers, trackback spammers and other jerks will arise eventually.
  • Letting users edit the templates is a huge input validation problem all on its own: how do you dig them out when they write an invalid template? (Getting them to write Nevow style templates? Well, anything’s possible.)
  • In order to handle both my own and other people’s bizarro weblog setups, it would need to work with arbitrary file systems, arbitrary vhosts, and quite likely generate URLs specified by the user.
  • In order to work with both my own and other people’s bizarro weblog setups, it would need to parse about twenty different types of export format.

Irritants; Shiny things

Irritants

  1. Our host’s wireless router, whose default DHCP lease is 30 seconds. That’s sure to fill one’s logs, asking for a new IP address every 30 seconds. I discovered recently though that if you specify a lease time, any lease time, it gives you about 40% of that time. There’s some amusement to be had from that if you aren’t nicely sending DHCPRELEASE, but at the moment I am settling for a lease of 852 million seconds (see the dhclient.conf sketch after this list). And counting.
  2. Nautilus, which I’m trying to use regularly, not just for reasons of sympathetic magic, but because it is indeed useful to manipulate pictures by dragging and dropping thumbnails. (A future project is a fairly hard-core script for resizing pictures en masse because the g-scripts ones don’t talk gnome-vfs. Wait on this one, I need to learn pygtk.) However, the usefulness is being offset by the incredible number of bugs, mainly related to frozen redraws or uninformative and occasionally wrong error messages, that manifest themselves every time I try and use it for non-local file access.
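
The lease trick from item 1 is one line of dhclient.conf; the file’s location varies by distribution, and the figure is whatever your patience dictates:

    # /etc/dhcp/dhclient.conf (location varies by distribution)
    # Ask for a very long lease: at 40% of the requested time, 852
    # million seconds still comes out to about a decade.
    send dhcp-lease-time 852000000;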

Shiny things

  1. Ubuntu’s "Human" icon theme (you may have to explicitly choose this). So many happy talking faces in one X-Chat icon, I can hardly believe it.

Hosting

About this time last year I was unhappy with my website host, and happened to be Googling for new hosts (Google Ads can be pretty useful as long as you were intending to part with money anyway) when I came across the idea of virtual servers: that is, paying someone to run a process for you that behaves just like a little Linux machine.

The concept was just great for me, because I had this enormous list of specialised hosting requirements that started accruing way back when I was hosting for free on a server tucked away at Andrew’s workplace. These requirements include something that no shared hosting provider gives out (multiple shell accounts) and stupid requirements like the ability to construct the effectively infinite number of addresses with ‘-’ signs in them that Andrew and I use for various purposes, mainly for sorting mail from online companies into different folders.

Anyway, the virtual servers appeared to have all the advantages of dedicated servers that I needed (root access with the usual powers, minus kernel upgrades and driver fiddling) without the hassles of dedicated servers, which basically come down to price and hardware maintenance.

Since then though, I’ve embarked on a round of server hopping the likes of which would make any Australian ADSL junkie proud.

A year in review:

Bytemark. These guys are pretty good. They have a simple little shell app where you can log in to the host server, poke at your parasite server, reboot it, access the consoles and so on. You can also overwrite the whole thing with a new clean server. These applications are really useful, and many virtual server hosts don’t have them; when they don’t, if your server disappears from the ’net, you’re at the mercy of tech support, even if the host server is up. I switched away from them because they were in the UK and the delay when typing from Australia was annoying me. In retrospect: dumb.

JVDS. Random web pundit consensus seems to be that these guys have a pretty good deal on RAM, disk space, and whatnot. They seem to have the most recommendations. They had reasonably prompt user support. Unfortunately, we really needed it, because they messed up our server’s setup. Every time the host machine rebooted, someone else’s server would come up in place of ours. And this server, being where ours should be, would start receiving all our mail and, not recognising the addresses, rejecting it all. Delayed mail not good, bounced mail bad. Further, we had no access to the host server and couldn’t check our machine’s status. So after the fourth time they promised and failed to fix the phantom server problem, we moved hosts again.

Redwood Virtual. These guys have an amazing deal on RAM, which is why we went to them. Unfortunately they’ve had two major problems: consistent ongoing performance problems, probably disk-related, and massive downtime. Like JVDS they don’t give you any access to the host server that’s useful when your parasite goes down, and unlike JVDS, they don’t have 24-hour support. It turns out they grew out of a bunch of friends who got a dedicated machine and some IP addresses and started playing around with UML.

Linode. I’m testing these people at the moment. While not a strikingly good deal on RAM or disk space, these guys have the most sophisticated host server management facilities I’ve seen. They’re the only host so far with an easy way to find out your bandwidth use. You can reboot and stop your parasite server. You can subdivide your disk space, reformat it, and reinstall at will. You can maintain different installations and switch between them. You can purchase extra RAM and disk space and have it added automatically. You can access your parasite host’s consoles. You can configure reverse DNS. And then, only if there’s something wrong with all of that, you can hassle tech support. Finally, although it’s possible my parasite server is just on a new machine, they seem to have good performance.

[Side note to Twisted people eager to promote one of their helpers: thanks, I’ve heard of tummy.com. However they’re relatively expensive and offer less disk space than I need.]

Linode isn’t all roses though.

First of all, they’re draconian about spam. I’m fine with “thou shalt not spam.” I’m less happy with “if you ever get us blacklisted we will charge you $250 an hour for the time it takes us to get un-blacklisted.” (Background story: I used to run a secondary mail server for twistedmatrix.com. Spammers, for various reasons, like to send spam via the secondary mail server. Hence, I was handling all of twistedmatrix.com’s spam and forwarding it to them, as secondary servers are meant to do. One of their users noticed my server’s name in all his spams, and promptly got me in trouble with my provider, who was JVDS at the time. Moral of the story: it isn’t hard to falsely look like an open relay, and never secondary for someone who may have users who can read email headers but don’t know DNS.)

Second, as mentioned, they’re not the best deal on RAM and disk space. In particular, I’m probably really pushing it trying to run a server under my current demand with 64MB RAM, especially as either Nevow or my Nevow app is a real memory hog. And, goddamn, memory usage needs to be a priority for virus and spam checkers. Amavis doesn’t even do any actual matching for me, it just hands off to clamav, and it still eats 6-10MB of memory.

Finally, nitpicking, their default Debian images have some weird problems, most noticeably not having 127.0.0.1 localhost in /etc/hosts. I hope I’ve come across the majority of these now.
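
For anyone else bitten by this, the missing line is just the standard loopback entry:

    # /etc/hosts: the loopback line the default image lacked
    127.0.0.1       localhost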

However, I am hoping that a week or two of testing (they’re already handling incoming mail for Andrew and myself) will show them to be sufficiently stable and agile to look at settling there for a while.

Nifty; Job

Nifty

I was trying to use LiveJournal’s XML-RPC interface to transmit some UTF-8 encoded text. It wasn’t working so well, so Andrew introduced me to the following Python snippet (which will test your installed fonts nicely):

print u'abcdefg€ñçخઘᚨ'.encode('ascii', 'xmlcharrefreplace')

The output is:

abcdefg&#8364;&#241;&#231;&#1582;&#2712;&#5800;

Note: you should actually save this into a file and run it, rather than just whacking the script into your Python command-line interpreter, because my console or interpreter didn’t like Unicode input (and mine was set up by those whacky Python nuts at Canonical).

Get that? (Pfft, don’t look at the HTML source, I had to change all of the & signs to &amp;.) It takes the nasty Unicode string "abcdefg€ñçخઘᚨ" and re-encodes it in ASCII, replacing all the non-ASCII characters (everything except ‘abcdefg’) with XML character references to their Unicode values. The upshot is an XML snippet that you can transmit in ASCII if you’re ever dealing with an interface that doesn’t seem to like your UTF-8 encoded strings.
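
If you want to convince yourself the references survive the trip, here’s a quick sketch (using a shortened version of the string above) showing that an XML parser resolves them straight back into the original Unicode:

    # Round trip: encode with xmlcharrefreplace, then let an XML parser
    # resolve the character references back into the original text.
    from xml.dom.minidom import parseString

    original = u'abcdefg\u20ac\xf1\xe7'
    safe = original.encode('ascii', 'xmlcharrefreplace')
    doc = parseString('<t>%s</t>' % safe)
    assert doc.documentElement.firstChild.data == original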

Job

Not that I have a good resume online, but with half my holiday over, I’m looking for six months or so of work in Sydney when I get back. I’m available from early December. Python, Perl or Java programming, or possibly tech support, tech writing or office admin, but I’ve got a better resume for the junior programming positions. Leads appreciated!

Pretty pictures


Ubuntu changed their default theme to include a harmonious humanity image featuring three pretty young things, which is causing considerable controversy, mainly because the models used in the pictures are in various states of (well and truly legal in Australia) partial nudity. Screenshots linked here unless the poster takes them down. (PNGs, I ask you?)

A lot of people are making the argument that those images may be inappropriate if displayed in a corporate environment or alternatively to conservative friends or family members. I don’t think anyone’s admitted to being too conservative themself to like the image, so I’ll start.

I like portraiture and good photographs, as it happens, and it can get as naked as can be. Fetish shots are fine as long as I know roughly what to expect. These shots are good photographs and reasonable portraiture, although they’re a bit more glossy/pretty-pretty than I like to see in galleries.

But for some reason, which must be unpopular judging from every theme site I’ve ever seen, I really dislike having people prettier than me on my computer’s desktop. I don’t think I’ve ever had portraits on it at all in fact, but if I did, I would never start with models. Something in the idea leaves me very cold: I’d much rather teh-boring than teh-pretty-people. (In actual fact though, I have a pretty castle shot: not the most amazing shot ever, but a favourite amongst my own.)

(I wonder what is psychologically at the root of this? Perhaps people roughly divide into two: people who’d love to strip off a bit and be happy and playful for a camera, and the other half of people — or maybe that’s just me — whose instinctive reaction to the idea has a little bit of ew in it. It certainly messes with the intended vibe.)

Update: Andrew showed me the proposed CD cover which has similar artwork, and for some reason I have considerably less squick. Maybe I’m acclimatised to teh-pretty when shopping. On the other hand, since partially naked people are usually selling things I don’t want, I think I’d pass it by on the sales rack without a second glance.

Syndication, aggregation, and HTTP caching headers


I’ve seen various people in various places lately who were very unhappy about someone requesting their RSS feed every 30 seconds, or minute, or half hour, or whatever, and re-downloading it every time at a cost of megabytes in bandwidth. I’ve also seen people growing unhappy with the Googlebot for re-downloading their entire site every day.

So, a quick heads-up: there is a way for a client to say “hey, I have an old copy of your page; do you have anything newer, or can I use this one?” and for the server to say “hey, I haven’t changed since the last time you viewed me! Use the copy you downloaded then!” Total bandwidth cost: about 300 bytes per request. That’s still a bit nasty for an every-30-seconds request, but it means you won’t get cranky at the 10-minute people anymore. Introducing Caching in HTTP (1.1)!
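
To make that concrete, here’s roughly what the exchange looks like on the wire; the URL, dates and ETag value are all invented for the example:

    GET /index.rdf HTTP/1.1
    Host: www.example.org
    If-Modified-Since: Wed, 10 Nov 2004 06:25:24 GMT
    If-None-Match: "3e86-410-3596fbbc"

    HTTP/1.1 304 Not Modified
    Date: Wed, 10 Nov 2004 08:12:31 GMT
    ETag: "3e86-410-3596fbbc"

Those last three lines are the server’s entire answer: no body, just “use what you’ve got”.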

The good news! Google’s crawler already does the client half of this. Many of the major RSS aggregators do the client half of this (but alas, not all: there’s a version of Feed on Feeds that re-downloads my complete feed every half hour or so). And major servers already implement this… for static pages (files on disk).

The bad news! Since dynamic pages are generated on the fly, there’s no way for the server software to tell if they’ve changed. Only the generating scripts (the PHP or Perl or ASP or whatever) have the right knowledge, so dynamic pages need to implement the appropriate headers themselves. And because this is HTTP-level (the level at which client and server talk their handshake protocol to each other prior to page transmission), not HTML-level (the marked-up content of the page itself), I can’t show you any magical HTML tags to put in your template. The magic has to be added to the scripts by programmers.
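
If you’re one of those programmers, the logic is pleasantly small. Here’s a hedged sketch in CGI-flavoured Python (to match the one snippet already on this blog); last_changed() is a stand-in for however your weblog actually knows when its content, templates included, last changed:

    # Sketch: a dynamic page doing its own cache validation (Python CGI).
    import os, time

    def last_changed():
        # Hypothetical: e.g. the latest mtime of entries and templates.
        return os.path.getmtime('entries.db')

    mtime = last_changed()
    last_mod = time.strftime('%a, %d %b %Y %H:%M:%S GMT', time.gmtime(mtime))
    etag = '"%d"' % int(mtime)

    # Lazy string comparison; a thorough version would parse the date.
    if (os.environ.get('HTTP_IF_NONE_MATCH') == etag or
            os.environ.get('HTTP_IF_MODIFIED_SINCE') == last_mod):
        print 'Status: 304 Not Modified'
        print
    else:
        print 'Content-Type: text/html'
        print 'Last-Modified: ' + last_mod
        print 'ETag: ' + etag
        print
        print '...page body here...'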

End users of blogging tools, here’s the lesson to take away: find out if your blogging software does this. If you have logs that show the status code (200 and 404 are the big ones), check for occurrences of 304 (this code means “not modified”) in your logs. If it’s there, your script is setting the right headers and negotiating with clients correctly. Whenever you see a 304, that was a page transmission saved. If you see 200, 200, 200, 200 … for requests from the same client on a page you know you weren’t changing (template changes count as changes), then you don’t have this. Nag your software developers to add it. (If you see it only for particular clients, then unfortunately it’s probably the client’s fault. The Googlebot is a good test, since it has the client side right.) An appropriate bug title would be “I don’t think your software sets the HTTP cache validator headers”; explain that the Googlebot keeps hitting unchanged pages and is getting 200 in response each time.
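
If eyeballing logs is too slow, a few lines of Python over a common-log-format access log will do the tallying. The feed path here is made up; substitute your own:

    # Tally status codes returned for one page in a common-log-format
    # access log: field 6 is the request path, field 8 the status code.
    counts = {}
    for line in open('access.log'):
        parts = line.split()
        if len(parts) > 8 and parts[6] == '/index.rdf':
            counts[parts[8]] = counts.get(parts[8], 0) + 1
    print counts  # plenty of 304s good; unbroken runs of 200s bad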

RSS aggregator implementers, and double for robot implementers: if you’ve never heard of the If-None-Match and If-Modified-Since headers, then you’re probably slogging any page you repeatedly request. Your users on slow or expensive connections hate you, or would if they knew the nature of your evil. Publishers of popular feeds hate you. Have a read of the appropriate bits of the spec and start actually storing pages you download and not re-downloading them! Triple for images!
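
The client half is barely any work either. A sketch using Python’s standard library, with an invented feed URL; in real life you’d have saved the validators from the previous response alongside your cached copy:

    # Conditional GET: present last time's validators, and treat a 304
    # as 'your cached copy is still good'.
    import httplib

    etag = '"3e86-410-3596fbbc"'                # from the last response
    last_mod = 'Wed, 10 Nov 2004 06:25:24 GMT'  # ditto

    conn = httplib.HTTPConnection('www.example.org')
    conn.request('GET', '/index.rdf',
                 headers={'If-None-Match': etag,
                          'If-Modified-Since': last_mod})
    resp = conn.getresponse()
    if resp.status == 304:
        pass  # nothing new: reuse the stored copy
    else:
        body = resp.read()  # new content: store it, along with the new
                            # ETag and Last-Modified, for next time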

Weblog and CMS software implementers: if you’ve never heard of the Last-Modified and/or ETag headers, learn about them, and add the ability to generate them to your software.