Brickbats; Mail

Brickbats

Two programs that have hindered me lately through design decisions that might well have been right on the money, but not my money:

Epiphany has everything I want in a browser except for one thing. (Well, two things: I don’t know how to make it pretend to be Mozilla for Windows in order to fool my bank’s dodgy browser detection software, but for some reason I’m inclined to be forgiving here.) When I go to the location bar and start typing a URL, say http://example.com/foo, and I see http://example.com/ in the auto-completion list, I would like to select http://example.com/ and then just have to type foo. Epiphany, I love you but you have betrayed me.

workrave, I think you’ve saved my hands, but I wish you wouldn’t be so punitive when someone types during a break. Sure, block the input, but let the break run. Without exception, anyone who needs to quickly grab my keyboard for something is soon found in the middle of a break ritually pounding on the keyboard and cursing at you while you obstinately sit at "time left: 13 seconds". Stopping typing seems to be something that just doesn’t come naturally. And all those curses will add up eventually.

Mail

I’ve been working one day a week as an administrative assistant for the last year (it’s hard to get someone with LaTeX knowledge unless you’re willing to pay for someone with typesetting/publishing experience or… you know some computing students). I’ve just finished training my replacement, and if there’s one thing I’ve realised, watching her, it’s what a mail-hound I am.

My entire computing routine is focused around email, and a bunch of my computing knowledge, especially Linux knowledge, came directly from trying to ensure that I have round the clock access to email. I’ve read much of the mutt manual. I taught myself regular expressions so that I could use procmail. About half of what I know about my editor comes from a desire to output email as fast as possible. I’ve spent hours on an elaborate setup involving multiple copies of my mail, automatic spam filter training, various From addresses and other such things.

Thanks to an alignment of the planets, that’s exactly what I needed for this position, which involved, among other things, accepting tarballs as uuencoded data smacked into the body of mail messages. (I could automate this, and mutt should, but it turns out that Evolution can get it out automatically, and then I use mutt to delete the attachments. Oh well, no one’s perfect.)
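For the curious, the automation I keep not writing might look something like the following. It is only a sketch: the function name is made up, it assumes the message body is already decoded to text, and it assumes the tarball sits between the classic "begin <mode> <filename>" and "end" lines. A mangled body will still raise an error rather than be repaired.

```python
import binascii


def extract_uuencoded(body):
    """Yield (filename, data) for each uuencoded block found in a message body."""
    name, chunks = None, []
    for line in body.splitlines():
        if name is None:
            parts = line.split(None, 2)
            # A block opens with "begin <octal mode> <filename>".
            if len(parts) == 3 and parts[0] == "begin" and parts[1].isdigit():
                name, chunks = parts[2], []
        elif line.strip() == "end":
            # The closing line: hand back everything decoded so far.
            yield name, b"".join(chunks)
            name = None
        elif line:
            # Each data line decodes on its own (at most 45 bytes per line).
            chunks.append(binascii.a2b_uu(line.encode("ascii")))
```

Writing each result out with open(filename, "wb") would finish the job, although a filename taken from someone else's mail deserves a sanity check first.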

My custom setup for handling medium to large amounts of mail is by no means unique among people I know, but it certainly seems pretty rare among people as a whole. I wonder what processes admin staff have evolved in parallel?

Monday 19 July 2004

I’ve been working on Backwards again: I’m trying to pull all the stuff out of it that’s specific to my site so that other people can use it. I’m getting closer: Andrew is trying to deploy it now, which is helpful. I’ve finally provided site creators with the ability to drop in their own DocFactory easily, which is a nice touch because you can choose to do the DocFactories with Stan, which is ever so much nicer than typing HTML tags, or you can do it with HTML/XML if you like. It means that it’s still a programmer’s web content tool, but I don’t intend to change that.

I’ve also finally stuck in some code that adds the cache validation Last-Modified and ETag HTTP headers to every page request. I added it to the RSS feeds a while back after I noticed that Jeff’s aggregator and Planet SLUG both poll for updates every ten minutes. I didn’t think it would be so useful for the rest of puzzling.org, because most browser visitors come in from Google to look at my summaries of my high school texts.

I’d forgotten about the Googlebot itself though. It’s a pretty regular visitor to my site now — it seems to go through every few days. Other search robots are less frequent visitors. The cache validation headers are preventing the full transmission of an awful lot of content to robots now. Something for a lot of dynamic blog tools to consider doing — perhaps many do it already.
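For anyone wondering what this amounts to, here is a rough sketch of the idea in plain Python. It is not the Backwards code: the function names are invented for illustration, the headers are assumed to arrive as a simple dict, and the ETag comparison is a plain string match rather than a full implementation of the HTTP rules.

```python
from email.utils import formatdate, parsedate_to_datetime


def validators(mtime):
    """Build cache validation headers from a modification time (epoch seconds)."""
    mtime = int(mtime)  # HTTP dates only have one-second resolution
    return {
        "Last-Modified": formatdate(mtime, usegmt=True),
        "ETag": 'W/"%d"' % mtime,  # weak ETag derived from the timestamp
    }


def is_not_modified(request_headers, mtime):
    """Return True if the client's cached copy is current (answer with a 304)."""
    current = validators(mtime)
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        tags = [tag.strip() for tag in if_none_match.split(",")]
        return "*" in tags or current["ETag"] in tags
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            return int(mtime) <= parsedate_to_datetime(since).timestamp()
        except (TypeError, ValueError):
            return False  # unparseable date: send the full page
    return False
```

When is_not_modified returns True the server can answer 304 with no body, which is where the savings on robot traffic come from.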

The only problem I’m having is working out what to do when the templates change because really, the validation headers should change then too. The content of the page won’t have changed, but the layout will have. There’s a couple of possibilities:

  • drop the Last-Modified header and base the ETag header on some kind of hash of the page content (I’m currently setting weak ETags based on the timestamp, actually), which means turning off Nevow’s incremental render; or
  • extend the existing “change detection” mechanisms to detect changes in the template as well as changes in the content.

Changes in the content are currently detected in a variety of ways, but they’re all based on file timestamps. I haven’t come up with a way to detect a template change yet that doesn’t place burden on the site maintainer to record the fact or date of the change manually. I could insist that the templates be files so that I could check the datestamp but having them as nevow.loaders.DocFactory Python objects is desirable for other reasons. I guess I could also stick the template in some kind of database and timestamp it there. (Actually, doing the latter might be one way to avoid the “need to restart the process when the templates change” problem too. Maybe I have a winner here.)
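To make the two possibilities concrete, here is a minimal sketch of each. Both function names are hypothetical, and the second assumes that a template timestamp is available from somewhere, which is exactly the part I haven’t solved.

```python
import hashlib


def content_etag(rendered_page: bytes) -> str:
    """Option one: a strong ETag from a hash of the fully rendered page.

    The whole page must exist before the headers go out, which is why this
    rules out incremental rendering.
    """
    return '"%s"' % hashlib.sha1(rendered_page).hexdigest()


def effective_mtime(content_mtime: float, template_mtime: float) -> float:
    """Option two: a page is as new as the newer of its content and its template."""
    return max(content_mtime, template_mtime)
```

The hash approach notices a layout change for free, because the rendered bytes differ. The timestamp approach only notices it if something records when the template changed.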

There’s a few other things I want to sort out before resting for a while (by which I mean “making a numbered release which probably no one will use anyway”):

  1. Documentation. I actually loathe documenting my own projects as much as anyone; it’s only documenting other people’s that I don’t mind.
  2. URL generation. Unfortunately, the fact that I’m using old twisted.web with my personal Backwards site means that the URL generation code is an immense mess. Because old web only talks to it through a proxy, and the proxy code doesn’t set the forwarding headers, I can’t use Nevow’s URL generation mechanisms without ending up with a bunch of http://localhost:8080/ URLs. So I have my own clunky hard-coded base URLs because there’s no way to get them from the request object when it’s behind a twisted.web.proxy. I keep wanting to have a weekend-long hackfest on new twisted.web to get it deployable, but I’m not a “twisted developer” in that sense, and the way they use sandboxes has always said “my rewrite, no you touchie!” to me.
  3. Persistence. The amount of data stored in memory is really too large and should be much lower. So I need to stick it somewhere. Which means choosing between persistence systems on what I currently feel is way too little information. I currently have a bastardised mixture of shelves which really should be one file.
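On the URL generation item: if the proxy did set forwarding headers, recovering the public base URL would be roughly this simple. This is a sketch only. X-Forwarded-Host and X-Forwarded-Proto are the conventional header names that many reverse proxies send, the function name is made up, and the headers are assumed to be a plain dict with exactly these keys. My current proxy sends neither header, hence the hard-coded fallback.

```python
def base_url(headers, fallback="http://localhost:8080/"):
    """Guess the public base URL of a request that arrived via a reverse proxy."""
    # Chained proxies may send comma-separated lists; the first entry is the
    # value seen by the outermost proxy.
    host = headers.get("X-Forwarded-Host", "").split(",")[0].strip()
    if not host:
        return fallback  # no forwarding information: use the configured URL
    scheme = headers.get("X-Forwarded-Proto", "http").split(",")[0].strip() or "http"
    return "%s://%s/" % (scheme, host)
```

The fallback argument is where a hard-coded base URL like mine would go until the proxy learns to send the headers.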

Trip planning: finale

You know, this really ought to be called my "misc log" not my "tech log". But I’m sure you all understand that that would mean I might start posting quizzes. Hence, I can only continue to apologise for being off-topic on my own damn website.

Lookit! Look! We have a final itinerary, at least as far as Asia. I’m likely to move about a bit using this as a base; Andrew will be tied down by his need for an umbilical cor… Internet access.

8th August
Andrew in London
20th August, 21st August
I arrive in San Francisco from Sydney on the 20th, Andrew arrives from London on the 21st. (Something to note here is that the 20th is our fifth anniversary. Observe that not only will I have a lonely anniversary, I will also have a rather long one, thanks to crossing the date line that day.) At this stage we’re intending to spend a few nights in San Francisco and then a few more nights at Stanford, but… this depends on people at Stanford.
26th August
Boston
30th August
New York
2nd September
Washington
7th September
London. Well, London for Andrew, I’m currently planning on visiting Scotland and France as well as whatever bits of England sound like a good idea at the time. Don’t expect a firm itinerary for that, I only do these things for other people’s benefit. For the good of my own soul, I intend to travel at a whim, or at least the whim of the Pound.
1st October
Palma de Mallorca. Yes, this is a somewhat odd choice of location in Spain, but we found somewhere to stay with wireless ‘net access. What can you do?
23rd October
Prague. The thing in the Austrian mountains fell through. I maintain my sanity by reassuring myself that there are plenty of other fish in the sea and mountains to be seen. But opinions of Prague are uniformly good, if you ignore the travel advice about crime — hey it’s not their job to sell the place. And Andrew likes the cold…
13th November
Bangkok. Yes, it’s terrible isn’t it? Hopefully we can get out of Bangkok as soon as possible and spend a few weeks elsewhere in Thailand and possibly neighbouring countries.
28th November
Sydney — although Andrew will, I guess, have another overseas meeting in December, so goodness knows how this will work for him.

While the time has now passed to say things like "hey, you know, Russia is an excellent destination" we’d still appreciate hearing from anyone in or near these areas who’d like to meet up, and also of interesting things to see (natural beauty is great, I like historical sites and Andrew wouldn’t say no to art galleries…) in areas near those listed.

This is, I hope, the last time I’ll post this publicly. At some point in the future you can look forward to me having something meaningful to say about these places.

Trip planning update: now with fresh new plans

In the last season, our intrepid heroine’s plans in Europe involved doing "stuff" in Europe, where "stuff" involved quintessential holiday-in-Europe things like thinking ‘hmmm, Prague sounds nice, I wonder when the train leaves’, and ‘goodness, when I went to sleep everyone was speaking French, but now they’re all speaking German!’

Unfortunately, various constraints have subsequently emerged, the most important of which is that Andrew would like to spend a lot of quiet time with an Internet connection during our stay. I originally considered a longer working holiday, but eventually decided to stick with our touristy plans with longer stays at a few places with Internet connections rather than a nomadic existence. Alas, such is the nature of compromise. Unfortunately it turns out that the combination of "Internet" and "major city in Europe" pushes your accommodation right into the 100 euro a night range, which is well outside my budget. Or at least so it appears from my attempts to find accommodation online. At the moment I have a phone call to make to Spain regarding a bed and breakfast, and another to Austria about a little studio in the mountains. I’m a little attached to this mountain idea, I hope they didn’t leave a zero off the price.

Now, for a rough itinerary, pending some interesting travel agent manipulations which I’ll return to soon:

San Francisco/Stanford
Arriving Friday 20th August, leaving Friday 27th August
Boston/New York/Washington
Arriving Boston Friday 27th August (I may try to move this back a few days), leaving Washington Monday 6th September. The exact split of this trip down the coast I haven’t worked out yet.
United Kingdom and France
Arriving London Tuesday 7th September. Andrew will spend the rest of September based in London, I will likely roam more widely. I will visit France, the north of England and possibly Scotland or Wales during this time.
Spain
Assuming that accommodation works out (as best I can tell from looking, cheap accommodation in or near Barcelona and Madrid exists only in myths dating from 1970), we will be in Spain in the first three weeks of October.
Austria or the Czech Republic
Again, depending on accommodation and proximity to our old friend the ‘net, we’ll be spending three weeks in Austria from about the 23rd of October; alternatively, I’ll look for something nearer Prague.
Hanoi or… err… Bangkok
This is where real travel agent magic begins, but we’ll be somewhere in south east Asia from about the 13th November on.
Sydney
My present bookings have me in Sydney on the 30th November.

The magic needed revolves around trying to visit Hanoi rather than Bangkok. Our round the world ticket includes up to 26 000 miles of travel, including, unfortunately, overland trips. We were hoping to fly from Frankfurt to Hanoi, but the best route the agent has been able to find involves flying via Bangkok. That, and the fact that the distance is counted from our last landing (London), bring the trip total to twenty six thousand and sixty miles. That’s right: sixty miles over the limit, pushing the total price up by five hundred dollars.

The agent is going to try and get their product manager, which is apparently a title equivalent to "high ranking wizard," to reroute the trip. At present the cleanest solution involves subtracting either Boston or Hanoi from the stops but that would leave miles to spare, not to mention making me unhappy. I want to see them bring it in at twenty five thousand, nine hundred and ninety nine.

My apologies to everyone in Washington hanging on until I pull some precise dates out of the air. The next round should get it.

Documentation bugs

I tried to find useful third party guidelines for filing bugs against documentation, and wasn’t encouraged to see something I wrote on the second page of Google’s results for “filing documentation bugs”. So I wrote something better.

I’ve been notionally working on Twisted’s documentation for about six months and I’ve been disappointed to find it’s more something to be avoided rather than something to avoid with. It is at least true that I’ve found bug reports useful as a way in: “need to fix bug” is a much more useful starting point for me than “need to improve documentation.” I’m not absolutely sure about this, but I seem to be one of the only Twisted developers (hah! I develop not!) who files bugs against themself. It’s my only consistent adoption of todo lists in my entire life to date.

However, I’ve written an entire CMS and now a series of articles (which I’ve decided are so detailed that no one who’s read them would ever think they were remotely competent enough to buy and use a domain name) rather than write or edit Twisted docs. This is very disappointing, but I’m not contemplating giving up at the moment, despite the occasional nagging fear that my presence as docs editor is holding back someone vastly more committed waiting in the wings. (I suspect there is no such person, but if there is, do get in touch.)

It’s been interesting when I do work on docs though to discover exactly what parts of it I like doing: it wasn’t what I suspected when I began. Writing text for them is very hard. My output is about a paragraph an hour except in rare cases where I understand the code sufficiently well that I don’t have to read or write 100 lines of code to convince myself I understand what’s going on. I’m ambivalent about editing other people’s text for clarity or style — in many ways I love this, but it feels too easy to slip into the “make it as if I wrote it” rather than “improve it” trap. I quite enjoy producing example code, possibly more than I enjoy producing real code. This is probably the right way around — unless I’m a very atypical documentation user, example code is worth far more than text except in the case of design discussions.

I think the main problem I really have is that I’m terrible at ongoing tasks. My coding is the same: I like to have something working at the end of every coding “run” (which for me is about three hours of solid concentration, of which I can currently do two in a day if I have a good day). I like to have a passable document, or at least section, at the end of every docs run (same length as a code run, three hours must be my natural concentration cycle). I’m discouraged from starting anything that can’t be broken up into three hour chunks. In the case of documentation, in which it’s very hard to tell when I’ve finished, this is occasionally paralysing.

Trip planning update — fresh new info!

I’ve finally got enough money so that this is absolutely, unquestionably going ahead. I also have a schedule of sorts… well, I want to be in France by mid-September before Jean-Phillipe comes back to Australia. So the plan is something like this (imagine me waving my hands in complicated patterns in the air and saying "this is precisely your solution space!"):

  • Mid August: US west coast, mainly Palo Alto (Stanford) and San Francisco, and I’ll probably spend a few days in the Los Angeles area since a friend will be living there by then. I hope to divert to the mountains and national parks which are kinda-sorta in the vicinity. Ish.
  • Late August, early September: US east coast, including DC, Boston and New York. I’ve more or less abandoned my plan of driving across the US — that would be more likely to happen if I was travelling with someone else who could drive, but I don’t fancy doing all the driving — and I don’t fit in buses so at this stage I’m planning to fly. I don’t fit in planes either (oh trans-Pacific leg, how I dread you) but they go faster.
  • Early/mid September: UK and France
  • After that: eh, we’ll follow our nose. I don’t have anyone to visit between western Europe and Beijing.

The earlier set of dates will hopefully be firmer within two weeks because we need to give a number of people in the US and western Europe advance warning of our plans, but after France I really am hoping to travel without a tight schedule.

I’ll be back in Australia late October at the latest unless my estimates of how much money I’ll be spending are wrong, and to be honest, they’re more likely to be under and see me back in Australia by the end of August, a whole two weeks after I left, than they are to see me still overseas in November.

I’m really hoping that I won’t completely run out of money by, say, Boston, but I’m not sanguine. I thought last week "surely you can live cheaper than AU$3000 a month in these places?" and my mother made soothing noises, but then I remembered how high rents are in Boston and realised that people there are actually paying AU$1000 – AU$1500 a month (about half my after-tax pay) just for accommodation. Yikes.

I’ll definitely trade-off cheap accommodation against having some money to spend on food and sights, but even so, accommodation is going to really hurt.

Everyone keeps saying "spend more time in fewer places" but you know, none of them recommend the same places. My grandmother, for example, says I can’t come back to Australia without seeing Florence. Mos and Jean-Phillipe say that Paris is a necessity. A number of people have recommended Athens (some of them even think it would be worth being there during the Olympics). Everyone’s in favour of Prague, although whether it’s for history and sights or cheap beer varies depending on who I ask. It’s fortunate, after all, that people aren’t quite that keen to sell the sights of the US, or I’d never get even halfway around the world.

Pay no attention to the man behind the curtain

Working in natural language processing is so disappointing sometimes — probably a lot of AI work is similar. You imagine somehow that all these clever clogs are verbing nouns and nouning verbs and otherwise initiating computers into the mysteries of human language, but really, you spend an awful lot of time writing regular expressions and using other tools at a similarly shallow level.

Case in point: gazetteers. A very common approach to the problem of locating place names or person names in a piece of text is to have an enormous list of place names and person names and simply see if your candidate is in the list (yeah, you can make it more intelligent by checking for common surnames or the like too). It’s kind of understandable when you try and articulate rules for telling person names from place names (c’mon, can’t you imagine a person named “Ayers Rock” and some kind of landmark named “Apple Martin”?) but it still feels … un-fun.
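In case the shallowness isn’t obvious, a gazetteer lookup boils down to something like this. The lists here are tiny and made up for illustration, and the function name is mine; real gazetteers just have many more entries, not more intelligence.

```python
# Tiny hypothetical gazetteers; real ones hold many thousands of entries.
PLACES = {"ayers rock", "sydney", "prague"}
PEOPLE = {"apple martin"}


def classify(candidate):
    """Label a candidate name by plain list membership, nothing smarter."""
    key = " ".join(candidate.lower().split())  # normalise case and whitespace
    labels = set()
    if key in PLACES:
        labels.add("place")
    if key in PEOPLE:
        labels.add("person")
    return labels  # empty set: not in any list, so the gazetteer cannot say
```

Anything not on a list gets no label at all, which is the whole limitation in one line.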

I don’t work with them much, but I can see why the field is so fascinated with statistical approaches at the moment. Generalising over a set of data is so exciting. Stephen gave a very interesting talk on Monday about what Regina Barzilay and Mirella Lapata are doing in sentence ordering, and it didn’t seem to involve them having to sit down and manually write out any rules at all about how sentences are ordered. Very refreshing. Rigorous logical approaches have the same kind of appeal, but they’re less commonly used.

A lot of the time I don’t think I’m made to be an experimenter. But then I remember that I don’t think I’m made to be a programmer either. What I’m made to do is sit on a bean bag and have lackeys bring me interesting things to read. I can’t see that working as a job description.

Tuesday 22 June 2004

Now that I’m hosting this in the same place as my main log, it wants to become my main log’s brother. It wants me to tell it all about whatever spiritual hiccups have smeared my professional programming today. It wants to be my tech confessional. But I’m refraining. Perhaps it might get lucky and be used as a non-programming tech log one day.

Currently, I’m struggling with Backwards design issues, specifically storage issues. This stuff, while not quite as dull as rule based information extraction or user input validation, is still pretty dull, and every time I think about it I think about downloading Zope 3 and using that and then I come to my senses and realise that I’d have to rewrite the whole thing.

It’s a backend problem. puzzling.org has always been pretty much a simple tree in structure so the filesystem makes sense as a storage mechanism. Except when it doesn’t, in precisely those cases when puzzling.org stops looking like a tree. For example, consider the logs. Their tree structure is: root, year, month, day. However, I want the leaves of the tree (the entries) to be a doubly linked list, i.e. to have previous and next links.

I can search the tree in this case (if I’m looking for the entry before the 1st of January, I crawl up to the root and down into the previous year to find the 31st of December), but if I decide not to, I need to calculate some data and store it somewhere. Where? Well, I have only made 566 diary entries over the past three years, so I could just about store the list in memory. But if I don’t, I need to figure out where to put it.
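The "store the list in memory" option is about this much work. It is a sketch with an invented function name, and it assumes one entry per date, which happens to hold for a year/month/day tree.

```python
from datetime import date


def link_entries(entry_dates):
    """Map each entry date to its (previous, next) neighbours, None at the ends."""
    ordered = sorted(set(entry_dates))
    links = {}
    for i, day in enumerate(ordered):
        prev_day = ordered[i - 1] if i > 0 else None
        next_day = ordered[i + 1] if i + 1 < len(ordered) else None
        links[day] = (prev_day, next_day)
    return links
```

With a few hundred entries the sort is nothing, so the mapping can simply be rebuilt whenever an entry is added.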

The case for a links blog (which doesn’t exist yet) is harder. If it is to look like my del.icio.us page, each url needs to be associated with a title, a description and a list of categories. But the sane web tree configuration is root, category, url, (as opposed to root, url, category) which means being able to make the "what urls are in this category?" query easily. When urls are in multiple categories, how do you represent that in the filesystem with a query time better than O(n) (n being the total number of urls)? Symlinks?

Well, it’s a trick question as far as I can see. The filesystem’s fairly strict tree structure and limited query mechanisms mean that it isn’t a good backend for this. Which means databases of course. Which means researching databases and choosing between them and learning to use my choice and dependencies (because Nevow isn’t a major dependency, no!) and ew. Hmph. I like the filesystem.
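To show what the database version of the category query looks like, here is a sketch using SQLite via Python’s sqlite3 module. SQLite is purely a stand-in, since I haven’t chosen anything, and the table and function names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, just for illustration
conn.executescript("""
    CREATE TABLE links (url TEXT PRIMARY KEY, title TEXT, description TEXT);
    CREATE TABLE link_categories (
        url TEXT REFERENCES links(url),
        category TEXT,
        PRIMARY KEY (category, url)
    );
""")


def add_link(url, title, description, categories):
    """Store one link and one row per category it belongs to."""
    conn.execute("INSERT INTO links VALUES (?, ?, ?)", (url, title, description))
    conn.executemany(
        "INSERT INTO link_categories VALUES (?, ?)",
        [(url, category) for category in categories],
    )


def urls_in(category):
    """Answer "what urls are in this category?" using the (category, url) key."""
    rows = conn.execute(
        "SELECT url FROM link_categories WHERE category = ? ORDER BY url",
        (category,),
    )
    return [row[0] for row in rows]
```

Putting category first in the primary key means the lookup goes through an index rather than scanning every url, which is the query the filesystem can’t do cheaply.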

Wednesday 16 June 2004

So Dave Winer has pulled all (most?) of the free weblogs.com content. Authors can, at some point, take advantage of a one time offer to get a copy of their content. Nice Dave, good boy. In the mean time, criticism is supposedly muted because people’s content is being ‘held hostage’ for at least the rest of the month. (I don’t actually know about that, all I’ve read is the criticism. I haven’t bothered with the nicey-nice stuff.) [Update in the interests of completeness: there is now a transition plan to 90-day free hosting on buzzword.com. Content has been restored.]

A couple of things concern me about this. One is this persistent notion that’s probably been around since the beginning of time and will probably be around until the end that having the right to do something is a justification for doing that thing. (Hint: "but I’m allowed to do that" is a non-defence against criticisms of your failures of courtesy, generosity or general personability. The whole point of that stuff is that it requires you to do more for others than the bare minimum that you’re compelled to do.)

The other is this notion that if you’re getting something for free, you deserve what you get when it all turns sour. As others have noticed, this is the same stuff that was levelled at people who were shocked about Movable Type’s new licensing schemes. Mark Pilgrim re-wrote that debate in his terms in his Freedom 0 essay. What the people who wanted something for free did wrong wasn’t trying to get something for free ("something free", if you don’t like people playing fast-and-loose with the multiple senses of "free"), it was not getting a guarantee of that freedom. Hopefully Shelley Powers can do something similar with her thoughts on The Value of Free:

There’s nothing wrong with not doing the free thing. However, there’s also nothing wrong with the people who accepted the free thing, freely given… Each person who accepted these free things also gave something back in return: whether it was bodies when webloggers were few, or grateful acknowledgement when webloggers were many. Though those who have benefited from these free services in the past should be grateful, they don’t deserve to be called "cheap" or cut loose without warning. Free does not equate to no value.

Shelley Powers

The point of money is to abstract over some notion of value in a way that allows values to be compared. It’s efficient to be able to compare price tags. But the consistent confusion of money and the value it represents in some cases is concerning for all kinds of reasons. Limiting concern to Free Software alone, it would mean that there is no quality without money; that there are no ethical obligations without money; and that nothing of value is exchanged without money.

My personal instincts about this favour social changes that move from a rights based discussion ("I’m allowed to do this, I’m not compelled to do that") to a courtesy and generosity based discussion. What were the nice things Dave Winer could have done if he couldn’t provide free hosting anymore? What’s an ethical way to write software? What’s an appropriately thankful way to use it? I know, oh, I know that people have been talking in these terms for thousands of years too. I still wish they’d do it more often.

Sunday 13 June 2004

This thing has been around for a day and already people have asked me about comments. I don’t get asked about comments for the other log very much. I turned them on on Livejournal (this is cross-posted) with some trepidation and it’s worked out surprisingly well.

There are two main reasons I’ve avoided comments or web editing in Backwards. The first is that I’m not a big user of web editing: I’ve had too many crashes that cost me work, accidental cuts I can’t undelete, and old cached copies of the page in the form (thank you Zope 2, or was that Squid?). Plus it’s always someone else’s UI and they’ve always set something to be too small. The other reason is that input validation is currently competing with rule based information extraction (just don’t even ask) for my "least exciting programming chore" award. The beauty of writing the entire site myself is that I don’t have to check for malicious markup, logins, cookies and other horrible things. I have all the power, no one else has any. Easiest authentication problem ever.

But it all comes down to the fact that I instinctively dislike the idea of comments on puzzling.org because it’s all mine, precious. Maybe I’ve spent too much time in the wrong comments threads, but I just don’t see the appeal of spending however many millions of hours I’ve spent this year in order to give people a forum to attack me and a guaranteed audience for their troll-fest. I want to put a click between me and my critics. Given the Livejournal experiment, this is a bit silly: people use my comments to say things like "let’s go crazy Spanish style" rather than "I will eat your young, ignorant evil-doer." Even so. Precious. One day someone’s spam robot would leave a comment and I’d feel personally violated.

There’s a pot and kettle problem though, because I prefer it when other people leave comments on (or in the case of my fellows who write their own CMS, write a comments system and then leave it on). There’s a certain social niche comments fill. Writing an entry to say "happy birthday" or "wasn’t it a nice day?" in response to other people’s entries is a noise problem more than a social activity. Sending an email works a bit better, but people are protective of their inboxes. Plus you miss out on interaction between commenters.

Wow, it really is possible to talk yourself into things isn’t it? Good thing I didn’t try and balance out the pros and cons of writing input validation, or I’d be spending today adding a comment facility to this thing. As it is I need to add some features for Andrew so that I can acquire my first user.