Version control idea: modified times

I tend to commit almost everything that doesn’t move to version control, but there’s one major exception: the source data for my website puzzling.org, which is mostly text files. So as to avoid various nuisances to do with calculating data I already have, storing it somewhere else and needing to keep that store up to date, I keep track of the time that a file was modified by… asking the filesystem when this file was last modified.

So far so good, and tools like Blosxom do likewise, except that file timestamps tend not to be version controlled, which means that if I store my files on more than one machine and rely purely on version control to maintain the data, the dates don’t end up the same.

So instead I use Unison, which keeps the trees and dates in sync at the expense of losing all my history (I also have incremental backups for a couple of months, but that’s a recent addition to my data management). As Martin Pool apparently did at some point, although that was some months before he started writing a version control tool.

Thus, plugin idea for version control system: optional versioning of additional filesystem metadata, especially times.
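To make the idea concrete, here is a minimal sketch of what such a plugin might do underneath: record each file’s modification time before a commit and put it back after a checkout. The function names snapshot_mtimes and restore_mtimes are made up for illustration; this is not the API of any existing version control tool.

```python
import os


def snapshot_mtimes(root):
    """Map each file's path (relative to root) to its modification time."""
    times = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            times[os.path.relpath(path, root)] = os.stat(path).st_mtime
    return times


def restore_mtimes(root, times):
    """Reapply recorded modification times to the files that still exist."""
    for relpath, mtime in times.items():
        path = os.path.join(root, relpath)
        if os.path.exists(path):
            # Leave the access time alone; only the modification time is versioned.
            atime = os.stat(path).st_atime
            os.utime(path, (atime, mtime))
```

The dictionary could be written out as a small text file and committed alongside the tree, so the dates travel with the history. Relative paths here use the local path separator, which a real plugin would need to normalise across machines.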

Starting out with unittests

It’s a big shame because your very first experience of unittesting is unittest.main and your next experience is this brick wall of suck and you think… I’m just not going to use it.

Andrew

Andrew and I have had Jonathan Lange to visit this week while he awaits home Internet, which means there’s been a lot of talk about unit testing.

Coincidentally, today I am at the very beginning of a small-ish Python project, one just large enough that I’d like to make sure it’s fairly correct from the beginning, meaning I’d like to make sure it has automated tests, a sensible module layout and that kind of thing. It’s small enough that I bet I could get it working quickly enough without automated tests… and therein lies the trap. In order to do the right thing when starting a scratch project, doing the right thing needs to be either really really easy, or really easy to correct if it was skipped at the junkcode stage of the project.

Consider various aspects of the project. I haven’t worried too much about correct module naming, because at least with Bazaar renaming directories will be easy to do later. But I am trying to do unit testing from the very beginning because adding good testing to an existing codebase of much over 200 lines converges on impossible pretty quickly. And since I create new files in Python the way some other people create functions and some other other people create new lines, I separate my test modules early which results in this brick wall of suck:

    import unittest

    tests = [
        # list of strings containing test module names
    ]

    def my_import(name):
        # No love to http://docs.python.org/lib/built-in-funcs.html
        mod = __import__(name)
        components = name.split('.')
        for comp in components[1:]:
            mod = getattr(mod, comp)
        return mod

    def runAll():
        runner = unittest.TextTestRunner()
        suite = unittest.TestSuite()
        for testmodulename in tests:
            testmodule = my_import(testmodulename)
            print(testmodule)
            loader = unittest.TestLoader()
            suite.addTests(loader.loadTestsFromModule(testmodule))
        result = runner.run(suite)

That is, unittest does not do test discovery, you have to tell it where to find all the test modules, and you run slap-bang into Python’s tedious programmatic interface to its own import mechanisms.
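For readers on later Python releases (2.7, and 3.1 onwards), the import half of this gets less tedious, because importlib.import_module returns the leaf module for a dotted name directly. A sketch of the same runner under that assumption, with TEST_MODULES, build_suite and run_all being illustrative names rather than anything from the project:

```python
import importlib
import unittest

# Hypothetical list of dotted test module names; empty here, as in the listing above.
TEST_MODULES = []


def build_suite(module_names):
    """Load every test from the named modules into a single suite."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for name in module_names:
        # import_module hands back the leaf module for a dotted name,
        # so there is no getattr walk over the components.
        module = importlib.import_module(name)
        suite.addTests(loader.loadTestsFromModule(module))
    return suite


def run_all(module_names=TEST_MODULES):
    """Run the combined suite with the plain text runner."""
    return unittest.TextTestRunner().run(build_suite(module_names))
```

The list of module names still has to be maintained by hand here. Python 2.7 and 3.2 also added unittest.TestLoader.discover, which finds test modules by filename pattern and would replace the list entirely.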

So, unit testing in Python: hard to get right later, but not easy enough to get right at the beginning. (Consider: the code above is currently about three quarters of the codebase in question.) I will watch Jonathan’s pyunit3k with interest. (For my PhD I use Twisted’s trial test runner, but I am not willing to introduce a dependency on Twisted for this project purely for a better test runner.)

In general, it would be excellent if some firm and right person was to write a guide to best practices for Python projects, with regard to starting a project so that it is easy to test, easy to collaborate on, easy to install and (and I think this is somewhat missing too) easy for the distributions to package. And that all these steps be so trivial that they can be carried out at the beginning of almost every project.

puzzling.org’s wacky behaviour

For ages now the puzzling.org website has had a regular bug whereby instead of displaying correctly it would dump a bunch of HTTP headers and HTML in plain text to people’s web browsers. I’ve known about it for at least a year and people have been emailing me periodically.

This should now be fixed. The problem was Twisted Web/Nevow’s implementation of HTTP’s 304 code (not modified). At least in the Twisted versions in Ubuntu Hardy, and possibly still in trunk, twisted.web returns a body of data as well as that header, which is an HTTP MUST NOT. (twisted.web2 looks like it gets this right, but we didn’t check for sure.)

If it continues to happen and you’re one of the few and far between people who want to help me debug the rarely used HTTP implementation that runs my website, tcpdumps (all traffic, don’t limit the packet size) are the most useful debugging tool. At least I’ll know where to look in future.
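If you do end up with a captured response, the check itself is simple: a 304 must end at the blank line after its headers. A minimal sketch, assuming you have already extracted the raw bytes of a single response from the capture (violates_304_rule is a made-up helper, not part of Twisted or any other library):

```python
def violates_304_rule(raw_response):
    """Return True if the bytes are a 304 response that also carries a body."""
    # Headers and body are separated by the first blank line.
    head, _separator, body = raw_response.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0]
    parts = status_line.split(None, 2)
    if len(parts) < 2 or not parts[0].startswith(b"HTTP/"):
        raise ValueError("not an HTTP response")
    return parts[1] == b"304" and len(body) > 0
```

On a persistent connection the browser treats whatever follows the 304 as the start of the next response, which would fit the symptom of headers and HTML being dumped as plain text.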

Microblogging

Tim Connors and Andrew Pollock are bothered by microblogging syndicated on Planet Linux Australia. This promises to be an absolute pile-on in the bikeshedding manner, that is, very few people are competent to comment on blog entries about SQL database underpinnings or encryption design, but microblogging is exactly the sort of thing everyone has an opinion on and shortly we’ll hear them all. I hope I’m early in the crush…

Microblogging itself varies in appeal for me as much as any other kind of blogging. I guess the highs aren’t quite so high: I’ve never seen a Twitter I wanted to bookmark. But they’re 140 characters, plenty short enough to skim even if they aren’t changing the world. I am not a huge fan of microblogging that is clearly written for either the writer themselves (unadorned ‘having dinner’ or ‘working late’ without any attempt to write about it in such a way as other people might want to read) or as an alternative to SMSing a significant other your plans for the evening. But most of it is about the same quality and style as the random jabs at the world people occasionally insert into IRC (in fact Andrew Bennetts should have a twitter account, but never will), so, fine.

However I too do not generally find people syndicating their microblogging to their main blog very interesting. Firstly, if I want to read your twitter feed, I’m already subscribed to it through Twitter, so having it pop up in your main blog is just two copies of the same thing. If the other microblogging sites take off enough I’ll add people to my feed reader instead. The same is usually true of del.icio.us aggregations, Pia and Jeff Waugh being something of an exception because they provide commentary aimed at readers. I certainly won’t be syndicating my del.icio.us feed any time soon, it’s entirely aimed at me and if you want to subscribe that’s your lookout.

For Planets, I suggest the solution is to add a sidebar or two for microblogging and links provided by Planet authors. This enables feed discovery and mild entertainment for people who like the microblogging, but means that people aren’t stumbling on 30 character thoughts or unadorned collections of links when they expected substantive prose. In this model, people syndicating that stuff to their main blog are required to figure out how to exclude it from what the Planet aggregates.

Unsolicited bulk email: still quite evil

Dear Google,

I am not sure how to quantify the exact amount of evil involved in unsolicited bulk email (I guess I could argue that it’s even commercial email, because you are a company promoting a product, even if it is a coding competition), but let me assure you, the amount of evil is exactly the same in 2008 as it was either time in 2005, and for that matter, in 2003.

So, knock it off already.

Dry July sponsorship

As the very model of a modern moderate drinker, you can think of me as your reasonably safe bet to back for Dry July, the Prince of Wales Hospital fundraiser in which participants gather sponsorship and do not consume alcohol for a month! (If people want wild and daring, you’re out of luck with me until Movember, sorry.)

As it happens, one of my sisters and I have both been patients of Prince of Wales Hospital during our lives (in my case, my recent compression chamber therapy for suspected decompression sickness was done there) and so this should be a matter dear to your heart as a way of ensuring the continuation of the Gardiner lines in the eons to come. And the Gardiner livers, under-abused things that they might be.

Sponsor me through my Dry July page.

iTunes U: maybe the side of the angels after all

Yesterday I posted, per Unlocking IP, that iTunes U was only accepting content on the understanding that the university itself didn’t have the right to re-licence. Nicholas did what I didn’t, though, and went to the source to find that the iTunes U licensing overview is quite a gentle, friendly document instructing universities to check that their copyright is in order before distributing it and suggesting Creative Commons and GFDL as potentially appropriate licences for academic work. Nicholas also observes that universities are retaining their copyright, e.g. SMU.

So unless iTunes U USA and iTunes U AU are significantly different beasts, it looks like all this is an object lesson (for me) in not citing without sighting (not trusting to a summary of anything without seeing the original documents). But good news overall, and my apologies for stupidly perpetrating confusion.

Small blessings

My laptop is only moderately recovered from a recent ‘spillage incident’, in that the arrow keys do not work very well and I need to replace the keyboard. But this cloud does have a silver lining. While the down key was completely broken, I was unable to adjust the brightness of my screen downwards (Fn+Down). During that time, my terrible (or terribly annoying) X slow crashes, in which I progressively lose the mouse, the keyboard and sanity in various orders, stopped happening (because the main effect is the Ctrl key behaving as if it’s being held down, this last time I managed to scatter the same document all over my Desktop, since Ctrl and drag is copy in Nautilus).

And now the Down key is mostly working, I can adjust my screen brightness downwards… at the cost of needing to restart X every couple of hours. So, there you go. I might even be able to file a bug, depending on what ended up in the logs.

Content produced by academics

Via Unlocking IP comes a suggestion that the iTunes U feature (videos of university lectures, essentially) is going to be a lock-in deal where if you put your academic content on iTunes U you forgo the right to charge anything yourself for it, and thus probably forgo the right to do truly free licencing of it. (Releasing something under a licence that restricts use to non-commercial only — common in academia and on Flickr — is not really free in the sense I’m using the word.)

This sounds like another round of a very annoying saga in academia; it has its equivalent in publishing, which goes something like this. Academic articles are:

  • written by academics on their salary, stipend, wage or whatever;
  • reviewed and judged by reviewers and editors almost always donating their time (or rather, working on paid time but forgoing some of their paid research hours to judge other people’s research);
  • published (typeset, copyedited and printed) by, increasingly, for-profit specialist academic publishers; and
  • bought by universities for large subscription fees back from those publishers.

This is generally considered quite a tidy little deal for the publishers, who are getting the universities to buy back their own product in a somewhat value-added form at a rate often thought to well exceed the additional value. (Incidentally, I personally am part of the additional value in a sense. I work one day a week for Macquarie University paid by money from the Association for Computational Linguistics in order to do most of the grunt work of coordinating the review process for Computational Linguistics. My job title is editorial assistant. In this case the reviewing work isn’t all donated: although the editor himself is a volunteer, I am not.)

Some movement in academia is trying to claw this back, particularly the advent of Open Access, whereby a journal is published (usually in electronic form only) and does not require a paid subscription to read it. Computer scientists are ahead of the curve, if informally: throwing PDFs up on their webpages, having OA conference proceedings. I was shocked when I went to a ‘graduate experience’ feedback meeting and someone in an experimental science was held up in their work due to not having a journal subscription; in CS we’d head over to the author’s website and download their preprint and work from that.

But just as we start to get it back apparently video lectures are becoming the new model where universities produce content for someone else to sell. I’m fairly firmly of the school where the value of universities is in producing public knowledge (this is quite controversial now: many universities and governments see universities as a sort of a behemoth commercial intellectual property production shop), and this is not the right way to go about doing that. Dear universities: don’t sell exclusive licences.

Blog carnivals

Something I’ve enjoyed a fair bit over the last couple of years is the idea of a blog carnival. You can learn what they are mostly by looking at them, but essentially the idea is that about every month a blog is chosen to host a given carnival, in which they link to a number of thematically related posts with short summaries or reactions. The one I’ve most often dropped in on is The Carnival of Feminists (Down Under Feminists Carnival has also just started up); The Festival of Frugality I’ve also seen once or twice (budgeting and frugality blogs are mostly run by people who both live in the US and have extracted themselves out from under tens of thousands of dollars of consumer debt though, so a lot of it is ‘spend less than you earn!’ advice that I’ve never had much need for).

There’s also Club Troppo’s Missing Link Daily which does actually pass through my feed reader, and from which I occasionally launch a spate of new tabs (the secret to carnivals in general is not being completionist about them, for goodness sake, people are already getting down when their reader says they aren’t up to date).

It’s not something I’ve come across in the Free Software blogs though, or at least not enough to have ever made it to a Planet. The blogcarnival site shows a couple of software development related carnivals, mostly defunct. As far as Free Software goes, the Planets are probably part of the reason, they already bring together a bunch of thematically related posts, or rather, people likely to make thematically related posts some of the time, but I still think there’d be room for a carnival of some type, for a couple of reasons: one is getting some cross-community linking, and the other is selecting individual posts of high quality.

If you’d be interested in doing one round of hosting for such a thing, get in touch: if I get enough interest (and if I get a few non-Australians), the experiment would be worth a go.