Ada Lovelace Day profile: Karen Spärck Jones

Let’s create new role models and make sure that whenever the question “Who are the leading women in tech?” is asked, that we all have a list of candidates on the tips of our tongues… To take part All you need to do is… pick your tech heroine and then publish your blog post any time on Tuesday 24th March 2009. It doesn’t matter how new or old your blog is, what gender you are, what language you blog in, or what you normally blog about – everyone is invited.

This is a profile of a woman in technology for Ada Lovelace Day.

Karen Spärck Jones by Markus Kuhn (modifications by Mary Gardiner) is licensed under a Creative Commons Attribution 2.5 Australia License.
Based on a work at commons.wikimedia.org.

I first heard about Karen Spärck Jones, a senior scientist in my field of computational linguistics, in 2007 through my paid job as the editorial assistant for Computational Linguistics. Just before she died, Spärck Jones wrote Computational Linguistics: What About the Linguistics?, which we published posthumously as the Last Words column for Vol. 33, No. 3. (Spärck Jones knew both that she was dying and that her column would appear under the heading ‘Last Words’.) I was never able to correspond with her directly: she died before we even had the camera-ready copy done.

Spärck Jones’s academic career began in 1957 and was funded entirely by grant money until 1994: most academics will recognise this as the hard road, in which researchers fund their own positions from grants awarded in cycles.

Spärck Jones was the originator of the Inverse Document Frequency (IDF) measure in information retrieval (1972, ‘A statistical interpretation of term specificity and its application in retrieval’, Journal of Documentation, 28:11–21), which is almost universally used as part of measuring the importance of the words contained in documents when searching for information. (The word ‘the’, for example, is very unimportant: it occurs in essentially all documents, so it has a high document frequency and a low inverse document frequency.) She had a long history of experimental investigation of human language (most computational linguists are now in this business). She was also at one time president of the Association for Computational Linguistics.
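A minimal sketch of the idea in Python, assuming the common modern form idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. The 1972 paper's exact weighting may differ in detail, and the tokenisation here (lowercase, split on whitespace) is a deliberate simplification:

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count how many documents each term appears in (at least once)."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df


def idf(docs):
    """Return {term: log(N / df)} for every term in the collection."""
    n_docs = len(docs)
    df = document_frequencies(docs)
    return {term: math.log(n_docs / count) for term, count in df.items()}


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the retrieval system ranked the documents",
]
weights = idf(docs)
print(weights["the"])                   # 0.0: 'the' occurs in every document
print(round(weights["cat"], 3))         # 0.405, i.e. log(3/2)
print(round(weights["retrieval"], 3))   # 1.099, i.e. log(3)
```

Rare terms get high weights and ubiquitous ones like ‘the’ drop to zero, which is exactly the behaviour described above.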

Awards Spärck Jones won in her lifetime include Fellowships of the American and European Artificial Intelligence societies, Fellowship of the British Academy, the ACL Lifetime Achievement Award and the Lovelace Medal of the British Computer Society.

Elsewhere: Spärck Jones’s obituary in Computational Linguistics and Wikipedia.

Ada Lovelace Day profile: Allison Randal


This is a profile of a woman in technology for Ada Lovelace Day.

Allison Randal (three photos) by Miles Sabin, Piers Cawley, Paul Fenwick and Mary Gardiner is licensed under a Creative Commons Attribution-Share Alike 2.5 Australia License.

Allison Randal is the chief architect of the Parrot virtual machine, which, I have just now discovered, had its 1.0.0 release a week ago today. I’ve known of Parrot for a long time because of its posited relationship with the Python programming language (see the original April Fool’s joke), but I didn’t know much about the project beyond its being a VM until Randal’s linux.conf.au 2008 talk (see slides, Ogg Theora video, Ogg Speex audio).

I am not a Perl programmer, and Randal is mostly known within the Perl (and OSCON, see below) communities, but her talk at linux.conf.au 2008 was the most memorable for me: she talked about bringing modern compilation ideas to the Free Software programming language community, and then about the architecture of Parrot and the various intermediate languages it is possible to target.

The most striking thing about Randal’s work for me is that she combines high-profile technical coding with deep community involvement (and technical writing). She is a past president and current board member of the Perl Foundation and chairs the talk selection for OSCON. In an ideal world I’d like to be able to straddle technical work and technical community work in my own life, and Randal is one of the highest-profile examples of this I know of.

Elsewhere: Randal’s homepage, Randal’s O’Reilly Radar blog, Randal’s use.perl blog and Wikipedia.

Why microblog when you can IRC?

I’ve been meaning to answer Glyph for a while:

Maybe the reason I don’t “get” this Twitter thing is I’ve been using IRC for much the same purpose for a decade, and the UI is better.

Finally I have an excuse: I wrote it up in someone else’s blog comments and can just lift it and edit it here. I have a feeling that I am cheating by using more than 140 characters, though.

Advantages of microblogging over IRC:

  • it’s easier to find out where the cool kids are playing (do you think Stephen Fry or even Sarah Haskins would tell me which IRC channels they hang out in? Me neither);
  • I do not have to sign up for a whole ‘channel’ in its entirety, with an entire social group, ongoing conversations and complicated social conventions, in order to microblog; and
  • it’s not real-time. I am basically the person email was designed for. I do not real-time. Even people I talk to when forced can vouch for this.

Bonus advantage over Facebook status updates: following my microblog does not require that we are friends, or even ‘friends’. You can just read them. I do not thereby have to share my high school graduation date and pictures of my hypothetical cat with you. (That said, I am now going to create a Facebook album entitled ‘my hypothetical cat’.)

Disadvantages of microblogging over IRC:

  • yeah, the clients suck;
  • when I do get in a conversation with someone(s) I am limited to 140 character messages (plus boring any onlookers), or figuring out how to switch media; and
  • the overhead of a social group and ongoing conversations does have some benefits: I’m more likely to admire someone’s wit and insight from IRC (or their blog) than from Twitter or identi.ca.

Cousins of spam

Spam is, of course, unsolicited commercial email, or at least any unwanted email obvious enough that we expect a spam filter to catch it. (Spam filters seem to regard filtering unwanted mail from senders who also send us wanted mail as outside their scope, which is probably fair enough, considering that it’s false positives that really drive their best-paying users up the wall.)

There are terms for the niggles that get through, especially by virtue of coming from a trusted sender. Yesterday I introduced several people on Identica and Twitter to the term tofu: activist spam, those (usually manually sent but massively Bcc-ed) emails asking you to attend a meeting or a rally or sign a petition. (More alert activists put sorry for the tofu, but… at the top.) In turn, I was introduced to the term bacn, for email that is solicited in the ethical email marketing sense (you’ve subscribed to it) but then becomes something to delete or skip.

Which brings me to a couple of types of unwanted email that need their own snack-food-related terms. I suppose the bacn precedent means they’re supposed to have four letters, too:

reminders to participate in something I agree is very important but that everyone knows about

One of the things I love best about compulsory voting in Australia (FAQ: how can you bear to live in a country that would gaol you for not voting? I have no idea; in my country it attracts a fine of about $20. Think of it as like skipping jury duty, except a lot cheaper.) is not having everyone nag me by email, by phone and finally door to door to get out the vote for weeks, at best, before an election. Fund-raising can become equivalently aggressive. (By extension from compulsory voting, I suppose I should argue for taxes so high that no one has to, or can, donate money to charity.)

calls for papers

Many academics, especially in computer science, receive endless emails at their individual addresses reminding them that the International Conference for Synergistic E-Knowledge in the Wireless Classroom, etc., has a deadline coming up soon. They’re somewhat targeted: I get a lot more at my .edu.au addresses than at my personal ones, and computer organisations like SLUG get them too. These ought to count as spam, since they’re unsolicited and you have to pay for the conference, but they don’t fall under the newer, practical definition of spam: stuff that spam filters actually recognise. (Mind you, at university I think the only thing Barracuda reliably classifies as spam is Mailman held-mail summaries, but it’s especially bad at these conference mails, and also at anything to do with watches.) I get several a week, and by the do I want these people up against the wall when the revolution comes? metric of spaminess, it is time to have a word for this blight.

Blood donation

There are calls for blood donations after the bushfires have killed and injured so many people in Victoria. I am a long-time donor and armchair expert, so I thought I’d share some tips.

First, info for those who just need a prod:

  • you need an appointment to donate blood in most places in Australia. Information is available from the Australian Red Cross Blood Service or by calling 13 14 95 inside Australia. If they can’t fit you in this week, donate later anyway: severe burns victims need long-term treatment, and there will, of course, be other ill people later too.
  • the Blood Service has some information about the process itself
  • do check the health guidelines as to whether you can give blood now, or at all. If you are sick right now, wait until you have been well for a week or more.

Pressuring people indiscriminately to donate is one of my (many) pet peeves, since so many people are excluded for health reasons or due to risk factors and all it accomplishes is a pointless guilt-trip about who is or is not a better person by virtue of being able to donate. So I want to clarify: I know many cannot donate for any one of hundreds of reasons (including, by the way, phobias and so on: the Blood Service would actually prefer that you not donate if you are going to faint or need a lot of care!). I’ve been one of these people at times. If you cannot or would strongly prefer not to for any reason, go about your business with equanimity. This post is for the curious and the undecided.

If you have donated blood before and it was fairly easy, please consider a plasma donation. Plasma is a crucial product in treating burns patients, and there’s generally a strong need for it at all times. It’s taken in whole blood (standard) donations too, but dedicated plasma donations supply more of it. If you have blood type AB (Rhesus negative or positive, it doesn’t matter), you are a universal plasma donor, so you are a particularly good candidate for plasma donation. It needs good veins, so if you’ve had difficulty donating whole blood before, stick to that. This type of donation takes a bit longer and is done via apheresis. You can continue being a whole blood donor periodically and be a regular plasmapheresis donor too. There are a couple of extra health questions for a plasma donation (because you will receive some anti-coagulant), so give the Blood Service a call if you are considering donating plasma for the first time.

Now for those of us who have trouble with whole blood. I am blood group O, Rhesus negative: a universal donor for red blood cells (excluding a few rare blood types). We O negative people, while generous to others, can only be transfused with O negative red blood cells ourselves, so we are desirable as donors both because our blood is useful in emergencies when there isn’t time for cross-matching, and because O negative patients specifically need it. I am also, last time I asked anyway, one of the fairly small percentage of Australian adults who have no immune response to CMV, which means I’ve never had it and can’t give it to anyone else. (Like chicken pox, it’s a kind of herpes virus, and the infection is permanent.) So my blood is like liquid gold: as best I can tell, it is taken straight from me and rushed to O negative immunocompromised patients, with a quick stop to exclude infections.
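The rules behind ‘universal plasma donor’ (AB) and ‘universal red cell donor’ (O negative) can be sketched as a small Python model. This is a simplified ABO/RhD toy under stated assumptions, not medical advice: it ignores rare blood types, other antigen systems and the cross-matching the Blood Service actually does, and all names in it are hypothetical. Red cells are compatible when the donor’s cells carry no A/B antigen the recipient lacks (and RhD-positive cells don’t go to RhD-negative recipients); plasma works the other way round, because it carries antibodies against whichever A/B antigens the donor lacks.

```python
# Simplified ABO/RhD model (assumption): ignores rare types, other antigen
# systems and cross-matching.
ABO_ANTIGENS = {"O": set(), "A": {"A"}, "B": {"B"}, "AB": {"A", "B"}}
GROUPS = [abo + rh for abo in ABO_ANTIGENS for rh in "+-"]


def parse(group):
    """Split e.g. 'O-' into ('O', False) and 'AB+' into ('AB', True)."""
    return group[:-1], group[-1] == "+"


def red_cells_compatible(donor, recipient):
    d_abo, d_rh = parse(donor)
    r_abo, r_rh = parse(recipient)
    # Donor cells must not carry an A/B antigen the recipient lacks...
    abo_ok = ABO_ANTIGENS[d_abo] <= ABO_ANTIGENS[r_abo]
    # ...and RhD-positive cells should not go to an RhD-negative recipient.
    rh_ok = r_rh or not d_rh
    return abo_ok and rh_ok


def plasma_compatible(donor, recipient):
    d_abo, _ = parse(donor)
    r_abo, _ = parse(recipient)
    # Donor plasma has antibodies against antigens the donor lacks, so the
    # recipient must not carry any of those. RhD is ignored for plasma here.
    return ABO_ANTIGENS[r_abo] <= ABO_ANTIGENS[d_abo]


def can_give_red_cells_to(donor):
    return [g for g in GROUPS if red_cells_compatible(donor, g)]


def can_give_plasma_to(donor):
    return [g for g in GROUPS if plasma_compatible(donor, g)]


print(can_give_red_cells_to("O-"))  # all eight groups
print(can_give_plasma_to("AB+"))    # all eight groups
```

In this model O negative red cells reach every group, AB plasma of either RhD type reaches every group, and an O negative recipient can only take O negative red cells.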

And this is the only reason I am still a blood donor. The difficulty I have giving blood would otherwise have got me excluded by now: I have veins that are both deeply buried and very slim. So I have some tips for others with hard-to-find veins:

  • if you don’t yet drink a lot of water before donating, start doing so. A litre or so seems about right, and then another half litre or so every half hour or so if you need to maintain. You want to be over-hydrated, not just the normal good hydration that you want going about your day. (No need to go nuts, I was once advised to drink two litres down and maintain, but my kidneys started hurting in the middle of the donation! And drinking crazy amounts of water can make you seriously ill.)
  • if you have ever had them fail to get a vein at all, or fail to get a full donation (this is recorded as ‘Did Not Bleed’ in your chart; I have it four times), go to a donor centre (not a mobile blood unit), look your interviewer in the eye and calmly tell them you want the senior phlebotomist to do your donation, please. I have this written into my donor record now, and I often get someone who has worked with palliative care or chemotherapy patients: there is nothing a blood donor’s veins can do to thwart them.

Just putting this out there

Why does the terminal beep still exist? There are approximately six quintillion laptops owned by Free Software developers, and none of their owners have ever carefully muted the sound, proceeded to do something hugely important like a non-matching search in Firefox and got the evil eye from unimportant people like the lecturer grading the course for the series of loud humiliating beeps coming from their direction? Or you all have pcspkr blacklisted or what?

 $ whois deathtobeeps.com
 Whois Server Version 2.0

 Domain names in the .com and .net domains can now be registered
 with many different competing registrars. Go to http://www.internic.net
 for detailed information.

 No match for "DEATHTOBEEPS.COM".

Implausible.

On girl stuff

In both of my recent talks involving women and Free Software the audience has latched onto something I didn’t expect. At OSDC it was the GNOME finding that they only got women applying for their summer of code projects once they created special ones for women. I think I expected people to have heard about that already, but they hadn’t. (GNOME had zero applications from women for Google Summer of Code, and some hundreds for the Women’s Summer Outreach variant.) There were probably a couple of things going on there aside from women responding to a specific invitation — in particular, computer science academics at some universities getting excited about being able to give their women students a specific invitation — but clearly invitations are part of what’s going on.

Giving these talks had already incurred a karmic debt to do some actual work, but since the work I do isn’t Free Software and wouldn’t be generally useful if I released it as such (I know a lot of people say this about their work, but I try to predict word usage based on the opinion of the document; this really is quite niche software), and since I had a reasonable idea for a variant on this kind of talk, I gave a second one anyway, at the LinuxChix miniconf. It was titled ‘Starting Your Free Software Adventure’ and happened to use women as examples. The idea was to show people what the first steps look like. I conducted (extremely short) interviews with several women involved in Free Software or Free Culture or their communities, including Kristen Carlson Accardi, Brenda Wallace and Terri Oda, among others. (I intend to make the slides available, but since I quoted the subjects extensively and directly, that will require gathering permission and then a bit of editing.)

As I noted previously, this talk was a failure all up, because the wrong audience turned up for it. But one thing stood out and kept coming up all week: Terri mentioning that she had at times resisted working on things perceived as ‘girl stuff’. In Free Software this includes but is not limited to documentation, usability research, community management and (somewhat unusually) sometimes management in general. The audience immediately latched on to it, and it swirled around me all week.

This is a perennial problem for professional women: software development is by no means unique in having developed a hierarchy that runs from high-status roles disproportionately occupied by and associated with men to somewhat lower-status roles disproportionately occupied by and associated with women. (In the case of software, ‘disproportionately occupied by women’ still means male-dominated, by the way, at least in the English-speaking world.) It’s difficult to disentangle the extent to which women and/or their mentors and teachers self-select for the lower-status roles (and I would hardly argue that the self-selection occurs in a vacuum either), the extent to which women are more or less barred from high-status roles, and the extent to which the association is actually flipped, so that professions, and jobs within them, have become low status because women started doing them. Other well-known examples are the concentration of women in the biological sciences as opposed to, say, physics, the different specialisation choices of male and female medical doctors and surgeons, and so on. Sometimes, as in the war between the sciences, the status of a field is somewhere between a joke and real, to the extent that those can be differentiated, but often it isn’t: there’s a correlation between the male-to-female ratio of a medical specialty and its pay.

In all of these cases, a woman who is conscious of this problem tends to face a choice: do the ‘girl stuff’, or not? (Of course, ideally one rejects the dichotomy, but no individual woman is responsible for constructing it, and if you sincerely believe that one is not programmed to a frightening and unavoidable extent by one’s social context, we’re working from very different premises and don’t have a lot to say to each other.) And some women, although I don’t know what proportion, feel guilty about their choice, especially if they do choose to do girl stuff. Just go ahead and imagine your own scare quotes from now on, by the way.

It also gets messy in various other ways. There’s the extent to which a woman who doesn’t do girl stuff is invested in maintaining the status she has chosen and also the aforementioned loop where if women are doing something, it will come to be seen as not particularly hard or noteworthy.

Most concretely, I usually see this tension bubbling away underneath outreach programmes promoting computing careers (you know what, I have my own status issues and I still resist calling it IT) to women. There are the people who want to go for yeah we all know coding is populated by weirdos, and male weirdos at that, luckily you don’t have to be a geek and you don’t have to code, phew! Admittedly, I tend to hear about that one only once my outreach friends have gotten involved and staged a coup. There’s the there’s so many opportunities in computing, and yes, coding is one of them and it’s fulfilling and it’s something you can do, but dammit, coders get all the cred and attention and dammit can we talk about something else? Women who admin/write/test/manage rock! And there’s you know, women coders don’t exactly rule the world yet, and furthermore isn’t all this oh-yes-you-could-code-I-guess-and-that’s-a-fine-thing but look! something for folks with people skills! talk basically a soft version of ew coding that’s for boys, also, last I checked, math is hard?

I observe again that there’s no right answer here in the real world right now. Women doing girl stuff have good reasons to feel dissatisfied that their hard-won skills are underpaid and under-respected; women doing boy stuff (scare quotes! please insert!) want other women to know that there’s fun to be had over here, thank you.

One crucial point in my thoughts about this I stumbled on only after the conversation Brianna Laugher recounts, over Indian on the Friday night (the location of all major conference breakthroughs worldwide). She said — paraphrased — that she didn’t feel that she should have a problem or be criticised for doing what she is good at, or what’s so desperately needed in her communities, and have to be just another coder in order to be fully respected. And I said that while this was certainly true, women also need to have the opportunity, to give themselves the opportunity, to be selfish: if they want to code, or do something else they are currently either bad at or not notably good at, or for that matter which they are good at but in which they’d have competitors, they should consider doing that, rather than automatically looking for and filling the space that is most obviously empty.

I had a brief but related conversation with Jeff Waugh at the Professional Delegates Networking Session (an attempt to formally recreate the Indian diner breakthrough environment), at which he commented that he continued to find the invitation culture of women in Free Software (the same one I discussed in my OSDC talk) mystifying and frustrating. (Not his exact words; if you have better adjectives, Jeff, let me know.) I took that one somewhere else, specifically to invitation cultures outside Anglo culture and then to honorific use in the Korean language, but when considering the question of women I think this is intertwined with the be selfish thing: women are reluctant to enter places where they aren’t obviously welcomed, and what better way to be welcomed than to do work that needs doing, rather than become just another person in the coding free-for-all and delay external validation for potentially quite a long time?

I have no answers. Just the perennial question of distinguishing what other people want, what other people claim they want, the genuine satisfaction of being of service to someone, and the genuine satisfaction of knowing you’ve done a good job of something hard. Take a look at where you’re standing on that one occasionally.

Mario Kart

We have finally come up with a system for playing Mario Kart semi-regularly online. So if you own a copy of Mario Kart (for the Wii) and want to play, now would be a good time to send me your codes and ask me to tell you when the game is being played.

linux.conf.au 2009: miniconferences

linux.conf.au 2009 was held in Hobart from January 19 to 24.

After two years of (co-)running the LinuxChix miniconf, I was glad not to be tied to the room the whole day on Monday. My talk was first up, though, so into the room I went. The talk was a failure as far as my primary aim went: the idea was to inspire newcomers with existing contributors’ (all women, given the context) stories of getting involved. It failed because only the hardcore faithful attended, and it wasn’t a talk intended to preach to the choir in that way. I came up with the idea after hearing about the FOSSCoach event at OSCON 2008: I even thought about proposing a whole FOSSCoach miniconference before I remembered that I wanted fewer major timesinks.

Unfortunately, there is no video recording of my talk either. I will make audio available fairly shortly, assuming that the audio from Andrew’s mobile phone is at all passable.

I went to the panel on geek parenting after morning tea: it was very popular and perhaps deserves a better forum in future. I’m hoping to get some audience write-ups of it. I then went to half of Matthew Garrett’s talk How I Learned To Stop Worrying And Love ACPI, partly because I’d recommended him to Sara as a good speaker and then ran into Matthew very shortly before his talk, when he casually mentioned something about how he was about to write the slides. So I had to check that I had not led Sara astray: luckily not, if only because the structure of the talk was along the lines of ask Matthew a question about something that makes him angry, then wait and learn.

The afternoon of the LinuxChix miniconf was sunny and informal: there was a wrap-up session about evangelising IT to girls and then Robyn had a short piece advertising the existence of ChixBits and hoping to get some contributors.

Tuesday’s programme was generally more exciting for me. I went to much of Brianna Laugher’s Free as in Freedom miniconf. Matthew Landauer repeated his OSDC talk on Open Australia (our version of They Work For You). It’s a cool project and approachable from my point of view: screenscraping and such. If I were taking on new projects, I’d probably send patches.

Over at the sysadmin miniconf, for once in my life, I went to Gus Lees’ talk on Google and ipv6. Essentially, from Google’s point of view ipv6 will arrive sooner or later, and they want to make sure their (quite strict) internal SLAs are met when they start serving AAAA records for www.google.com. So they have analysed how many people would use the AAAA records (about 0.7% of web users, if I recall correctly) and how many of those then have broken routing somehow (about one-third of that 0.7%). Then there are the folks with crazily long routes for no good reason, and so on. The upshot is similar to Google’s blog: ipv6 is moving inside Google. If you (as a network admin) are interested in testing, see here. Gus is at the other end of that email address, and his home was the first DNS server to get access to AAAA records for www.google.com.
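Multiplying out the recalled figures shows how small, but not negligible, the affected group is. This is only back-of-the-envelope arithmetic on the approximate numbers above (as remembered from the talk, not official Google data), sketched in Python:

```python
# Approximate figures as recalled from the talk (not official data).
aaaa_users = 0.007     # ~0.7% of web users would use the AAAA records
broken_share = 1 / 3   # ~one-third of those have broken ipv6 routing

broken_fraction = aaaa_users * broken_share
per_million = broken_fraction * 1_000_000

print(f"{broken_fraction:.2%} of all users")   # 0.23% of all users
print(round(per_million))                      # 2333 users per million
```

Roughly one user in every 430 or so would see a broken site, which is the kind of number that matters against strict SLAs.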

Jeff Waugh drew a historical analogy between the printing press as revolution and Free as in revolution. Rusty Russell gave a talk which he hated on principle (it wasn’t about code) but which was beneficial to his audience, if not to any actual code. Its main point was that arguing for stronger intellectual property is not arguing for strong property rights of the type that are important to capitalism; it’s arguing against them. People who own a copy of a book, movie or computer programme under strong intellectual property own less of that copy. This is dear to Rusty’s heart: property rights are important. If he hadn’t disclaimed all intent to ever give a ‘soft’ talk (i.e. no code) again, I’d recommend hearing it from the man with the passion.

Rusty’s talk ended in his intellectual property interpretive dance, of which, like many linux.conf.au shenanigans, there is surprisingly little evidence on the Web.

RAID is not a backup solution, times one million

Via slashdot.org (yes, really; I still pull in the headlines, although the miracle of feed readers has allowed me to confirm that yes, Ars Technica is a better read), I learned that a site called Journal Space, which hosted weblogs, lost all its data. Its only backup was a RAID setup, that is, a system that mirrors content between two disks and is designed to protect against disk failure. If you’ve heard of RAID, you hopefully already know that it is not the same as a backup: if a software error, an accident or a malicious act deletes data from one disk, the RAID setup faithfully mirrors the deletion to the other disk. If not, imagine that you have two magical whiteboards. One is copied exactly to the other. If one magical whiteboard totally breaks down, excellent, you have a full copy of your meeting notes and doodles on the other. (A note for accuracy: not all RAID configurations produce a full mirror, and sometimes the mirror is spread over more than one spare disk. But you get the idea.) However, if someone rubs something off the first whiteboard, or falls over while holding a can of solvent and splashes it on the first whiteboard, whatever is erased from it is immediately erased from the other too.
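The whiteboard analogy can be made concrete with a toy Python model. The class and method names are hypothetical, and a real RAID array works at the block level rather than on named files; the point is only that a mirror survives a dead disk but faithfully copies a deletion, while a separate point-in-time copy does not:

```python
class MirroredVolume:
    """Toy two-disk mirror: every change is applied to both 'disks'."""

    def __init__(self):
        self.disks = [{}, {}]

    def write(self, name, data):
        for disk in self.disks:
            disk[name] = data

    def delete(self, name):
        # The mirror faithfully propagates a deletion to both disks.
        for disk in self.disks:
            disk.pop(name, None)

    def fail_disk(self, index):
        # Simulate a dead disk replaced by a blank one.
        self.disks[index] = {}

    def read(self, name):
        # Return the data from whichever disk still has it, if any.
        for disk in self.disks:
            if name in disk:
                return disk[name]
        return None


vol = MirroredVolume()
vol.write("notes.txt", "meeting notes and doodles")
backup = dict(vol.disks[0])  # a separate, point-in-time copy

vol.fail_disk(0)
print(vol.read("notes.txt"))  # meeting notes and doodles: mirror covers disk failure

vol.delete("notes.txt")
print(vol.read("notes.txt"))  # None: the deletion was mirrored too
print(backup["notes.txt"])    # meeting notes and doodles: the backup still has it
```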

Instead, for home machines you most likely want an incremental backup, that is, a separate disk or machine holding several copies of your data going back in time. Your data as it was an hour ago. Your data as it was a day ago. Your data as it was a month ago. And so on. I have snapshots of my data for every three hours over the last two months. (Sensible backup programs notice when data is the same across two or more snapshots and store it only once, so your backup disk does not need to be very much larger than your normal disk.)
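One way a backup program can store unchanged data only once across snapshots is content-addressed storage: hash each file’s contents and keep one copy per unique hash. A minimal Python sketch of that idea, with hypothetical names; real backup tools use a variety of schemes (hard links, block-level deltas and so on), and this is not any particular program’s design:

```python
import hashlib


class SnapshotStore:
    """Toy incremental backup store with content-based deduplication.

    Each snapshot maps file names to content hashes; the content itself
    is stored once per unique hash, however many snapshots refer to it.
    """

    def __init__(self):
        self.blobs = {}      # sha256 hex digest -> file content (bytes)
        self.snapshots = []  # list of (label, {file name: digest})

    def take_snapshot(self, label, files):
        manifest = {}
        for name, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            self.blobs.setdefault(digest, content)  # store new content only
            manifest[name] = digest
        self.snapshots.append((label, manifest))

    def restore(self, label):
        for snap_label, manifest in self.snapshots:
            if snap_label == label:
                return {name: self.blobs[d] for name, d in manifest.items()}
        raise KeyError(label)


store = SnapshotStore()
store.take_snapshot("09:00", {"thesis.tex": b"draft 1", "photo.jpg": b"JPEG bytes"})
store.take_snapshot("12:00", {"thesis.tex": b"draft 2", "photo.jpg": b"JPEG bytes"})
store.take_snapshot("15:00", {"photo.jpg": b"JPEG bytes"})  # thesis.tex deleted

print(len(store.blobs))                      # 3: the photo is stored once
print(store.restore("09:00")["thesis.tex"])  # b'draft 1'
```

Even though thesis.tex was deleted before the 15:00 snapshot, both earlier drafts remain restorable, which is exactly what a mirror cannot give you.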

For business systems you want both: the quick recovery from disk failure that mirroring systems such as RAID offer, and incremental backups. (I don’t maintain business grade systems, ask someone else for best practices if you need them. Internally consistent database backups are something you want to pay particular attention to.)

I note this because in November I gave a talk on home backups for Linux at SLUG, and there is one other point of interest: do not trust third-party providers to have good backups. It is increasingly common to have a lot of your most interesting data on someone else’s servers: your email on Google’s, your blog over at wordpress.com, contact details for all your friends on Facebook, and so on. But your provider can make its own catastrophically bad decisions, like Journal Space, or have its creditors suddenly sell its hard disks off in a fire sale, as happened to Digital Railroad.

Which is a big problem, because a lot of third-party providers do not provide an easy way to get your data out (‘easy’ would be both a documented API accessible from common programming languages and an installable application), and lots don’t provide any way at all. (There’s also a whole batch of interesting issues to do with your comments or Wall postings or whatever: you don’t necessarily have the right to reproduce them, and there would be privacy implications in allowing you to back them up and reproduce them on some other site. LiveJournal, for one, solves this problem by not allowing easy backups of comments left on your journal.)

If your email host, blog host, calendar host, documents host or social networking host failed or deleted your account, how would you fare?