Internet technologies – Page 2

Background knowledge: this post requires some knowledge of networking, at least to the point of knowing what IPv4 and IPv6 are, and what is meant by subnet notation like “/60” and “/64”.

I’ve just changed ISPs, because I wasn’t much of a fan of my old ISP’s demand that we either enter into a new 12-month contract before 27 November or be considered re-contracted as of that date. My new ISP is Internode, Australia’s favourite geek ISP, in part because they offer native IPv6 and it’s even supported by customer service. It took me an entire 24 hours to succumb to the temptation of wrecking my perfectly good home network by attempting to make it IPv4/IPv6 dual stack, partly motivated by Geoff Huston’s “the sky is falling” keynote at linux.conf.au 2011. I like doing my bit to hold up the sky.

I use a Linux machine as our router rather than a consumer router device. That is, my ADSL modem is set to bridge mode and we use our wireless router just as a switch; neither of them does any routing. (Or shouldn’t, but we’ll get to that.) In terms of resources for doing this with Internode, or any other ISP that will hand out your IPv6 prefix via DHCPv6, here’s some useful material:

Internode’s own guide, which is a touch sketchy but gets the basics across
Shane Short’s more extensive write-up of Internode and IPv6

The main problem I had is that for as yet unexplained reasons, while this radvd.conf stanza worked fine when my Linux server ran Ubuntu 11.04 with radvd 1.7, it doesn’t work on Ubuntu 11.10 with radvd 1.8:

prefix ::/64 {
    AdvOnLink on;
    AdvAutonomous on;
    AdvRouterAddr on;
};

radvd 1.8 was advertising this in such a way as to get my Linux client to give this error (in /var/log/syslog):

IPv6 addrconf: prefix with wrong length 60

That is, it seems to have been advertising the entire /60 that Internode routes to each customer rather than a single /64. We ended up having to do something like this:

prefix 2001:db8:aaaa:bbbb::/64 {
    AdvOnLink on;
    AdvAutonomous on;
    AdvRouterAddr on;
};

That is, because Internode’s IPv6 allocations are static, we just manually picked a /64 out of the /60 allocated to us, and advertised that. I’m not clear if this is a bug, a change in the way radvd works, or a mistake of mine. We never got a chance to find out, because of a showstopper which you’ll see in the next, and at this stage final, post in my adventures in IPv6.
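For anyone wanting to check the arithmetic of picking a /64 out of a /60, here is a small Python sketch using the standard ipaddress module. The /60 in it is an assumption: it is just the documentation-range block that would contain the example /64 from the stanza above, not a real allocation.

```python
import ipaddress

# Assumed /60: the documentation-range block that contains the example /64.
delegated = ipaddress.ip_network("2001:db8:aaaa:bbb0::/60")

# A /60 leaves 4 bits of subnet ID before the /64 boundary,
# so it splits into 2**4 = 16 possible /64 networks.
candidates = list(delegated.subnets(new_prefix=64))

# The /64 hard-coded in radvd.conf has to be one of those sixteen.
chosen = ipaddress.ip_network("2001:db8:aaaa:bbbb::/64")

if __name__ == "__main__":
    for net in candidates:
        marker = "  <- advertised by radvd" if net == chosen else ""
        print(f"{net}{marker}")
```

Any of the sixteen would do equally well; the only requirement is that the advertised prefix is exactly 64 bits long, which is what the client was complaining about.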

Syndication, aggregation, and HTTP caching headers

I’ve seen various people in various places lately who were very unhappy about someone requesting their RSS feed every 30 seconds, or minute, or half hour, or whatever, and re-downloading it every time at a cost of megabytes in bandwidth. I’ve also seen people growing unhappy with the Googlebot for re-downloading their entire site every day.

So, a quick heads-up: there is a way for a client to say “hey, I have an old copy of your page, do you have anything newer, or can I use this one?” and for the server to say “hey, I haven’t changed since the last time you viewed me! use the copy you downloaded then!” Total bandwidth cost: about 300 bytes per request. That’s still a bit nasty for an ‘every 30 seconds’ request, but it means you won’t get cranky at the 10 minute people anymore. Introducing Caching in HTTP (1.1)!

The good news! Google’s client already does the client half of this. Many of the major RSS aggregators do the client half of this (but alas, not all: there’s a version of Feed on Feeds that re-downloads my complete feed every half hour or so). And major servers already implement this… for static pages (files on disk).

The bad news! Since dynamic pages are generated on the fly, there’s no way for the server software to tell if they’ve changed. Only the generating scripts (the PHP or Perl or ASP or whatever) have the right knowledge. Dynamic pages need to implement the appropriate headers themselves. And because this is HTTP-level (the level of client and server talking their handshake protocol to each other prior to page transmission) not HTML level (the marked-up content of the page itself), I can’t show you any magical HTML tags to put in your template. The magic has to be added to the scripts by programmers.

End users of blogging tools, here’s the lesson to take away: find out if your blogging software does this. If you have logs that show the HTTP status code of each response (200 and 404 are the big ones), check them for occurrences of 304 (this code means “not modified”). If it’s there, your script is setting the right headers and negotiating with clients correctly. Whenever you see a 304, that was a page transmission saved. If you see 200, 200, 200, 200 … for requests from the same client on a page you know you weren’t changing (template changes count as changes too), then you don’t have this. Nag your software developers to add it. (If you see the repeated 200s only for particular clients, then unfortunately it’s probably the client’s fault. The Googlebot is a good test, since it has the client side right.) An appropriate bug title would be “I don’t think your software sets the HTTP cache validator headers”; in the report, explain that the Googlebot keeps hitting unchanged pages and is getting 200 in response each time.
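If your logs are too long to eyeball, the check can be scripted. This is a rough Python sketch that assumes Apache-style Common or Combined Log Format; the function name, the feed path and the sample lines are all made up for illustration, so adjust the pattern to whatever your server actually writes.

```python
import re
from collections import Counter, defaultdict

# Assumption: Apache-style Common/Combined Log Format, e.g.
#   host ident user [date] "GET /path HTTP/1.1" status bytes ...
LOG_LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def status_counts(lines, path):
    """Count response status codes per client for requests to one path."""
    counts = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path") == path:
            counts[match.group("client")][match.group("status")] += 1
    return counts


# Invented sample data: one well-behaved client, one that never gets a 304.
SAMPLE = [
    '192.0.2.1 - - [01/Jan/2012:00:00:00 +0000] "GET /feed.xml HTTP/1.1" 200 51234',
    '192.0.2.1 - - [01/Jan/2012:00:30:00 +0000] "GET /feed.xml HTTP/1.1" 304 -',
    '192.0.2.1 - - [01/Jan/2012:01:00:00 +0000] "GET /feed.xml HTTP/1.1" 304 -',
    '198.51.100.7 - - [01/Jan/2012:00:00:00 +0000] "GET /feed.xml HTTP/1.1" 200 51234',
    '198.51.100.7 - - [01/Jan/2012:00:30:00 +0000] "GET /feed.xml HTTP/1.1" 200 51234',
    '198.51.100.7 - - [01/Jan/2012:01:00:00 +0000] "GET /feed.xml HTTP/1.1" 200 51234',
]

if __name__ == "__main__":
    for client, codes in status_counts(SAMPLE, "/feed.xml").items():
        print(client, dict(codes))
```

A client that shows some 304s means your software has the server side right. If nobody ever gets a 304, not even the Googlebot, the software is the likely culprit.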

RSS aggregator implementers, and double for robot implementers: if you’ve never heard of the If-None-Match and If-Modified-Since headers, then you’re probably slogging any page you repeatedly request. Your users on slow or expensive connections hate you, or would if they knew the nature of your evil. Publishers of popular feeds hate you. Have a read of the appropriate bits of the spec and start actually storing pages you download and not re-downloading them! Triple for images!
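To make the client half concrete, here is a minimal Python sketch of the bookkeeping involved, with the network part left out. The CachedCopy class and both function names are hypothetical, and it assumes response headers arrive in a plain dict keyed by their canonical names; a real HTTP library will differ in the details.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedCopy:
    """What the client remembers about the last full download."""
    body: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(cached: Optional[CachedCopy]) -> Dict[str, str]:
    """Build the request headers that ask 'has this changed since my copy?'."""
    headers: Dict[str, str] = {}
    if cached is None:
        return headers  # nothing stored yet, so make a plain request
    if cached.etag:
        headers["If-None-Match"] = cached.etag
    if cached.last_modified:
        # Send back the server's own Last-Modified string, unmodified.
        headers["If-Modified-Since"] = cached.last_modified
    return headers


def handle_response(cached: Optional[CachedCopy], status: int,
                    headers: Dict[str, str], body: bytes) -> CachedCopy:
    """Return the copy to use after a response: old on 304, new on 200."""
    if status == 304:
        if cached is None:
            raise ValueError("got 304 without a stored copy")
        return cached  # not modified: reuse what we already have
    if status == 200:
        return CachedCopy(body=body,
                          etag=headers.get("ETag"),
                          last_modified=headers.get("Last-Modified"))
    raise ValueError(f"unhandled status {status}")
```

The important design point is that the client stores the validators alongside the body and echoes them back verbatim on the next request, rather than inventing its own timestamp.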

Weblog and CMS software implementers: if you’ve never heard of the Last-Modified and/or ETag headers, learn about them, and add the ability to generate them to your software.
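For the server half, something along these lines is what the generating script needs to do. This is a hedged Python sketch rather than a complete implementation: the function names are hypothetical, the ETag is just a hash of the rendered page (one possible choice among many), request headers are assumed to be in a plain dict, and weak validators are ignored.

```python
import hashlib
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def make_validators(body: bytes, modified_ts: float):
    """Return (ETag, Last-Modified) header values for a rendered page."""
    # Assumption: a truncated content hash is an acceptable strong ETag.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    last_modified = formatdate(modified_ts, usegmt=True)
    return etag, last_modified


def is_not_modified(request_headers, etag: str, modified_ts: float) -> bool:
    """True if the client's stored copy is still good, so 304 is the answer."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present it takes precedence over the date.
        tags = [tag.strip() for tag in if_none_match.split(",")]
        return "*" in tags or etag in tags
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            since_dt = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            return False  # unparseable date: send the full page
        if since_dt.tzinfo is None:
            since_dt = since_dt.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop fractions.
        return int(modified_ts) <= since_dt.timestamp()
    return False  # unconditional request: always send the full page
```

When is_not_modified returns True, the script should send a 304 with no body; otherwise it sends the page as usual with both validator headers attached. The modification time needs to cover template changes as well as content changes, or clients will be told their copy is fresh when it isn’t.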

Syndication, aggregation, and HTTP caching headers by Mary Gardiner is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Category: Internet technologies
