Remembering Malcolm Tredinnick

I flew home from the US yesterday and when I arrived in Sydney I got a message from my husband saying that Malcolm Tredinnick had died. According to this piece by Simon Dulhunty, he was found on Monday to have died at home in Sydney, possibly after a seizure, while I was at PyCon 2013.

Malcolm Tredinnick speaking at DjangoCon 2008 (by Sebastian Hilling CC BY-NC)

I’ve known Malcolm slightly since my first linux.conf.au in Sydney 2001. In late 2004 I interviewed for a job at CommSecure (since closed) where he was then working, having been a lead developer of and continuing to maintain and develop a real-time data delivery system for the Hong Kong stock exchange. (The eventual end of that contract was the reason CommSecure later closed.) He was also my boss for about half of 2005 until I left to begin my PhD in early 2006.

I still caught up with him at technical events. The last long conversation I remember with him was at PyCon AU 2011, where my husband Andrew and I had a very Malcolm conversation with Malcolm, which roved over the paperwork hassles of having no fixed address (Malcolm travelled a lot and went through periods where he housesat or lived in serviced apartments for a while), the Australasian chess community, and some gentle mutual trolling between him and Andrew over narrative testing.

What I will remember most about Malcolm is that he was a teacher at heart. I never personally had this relationship with him, but I knew several people at CommSecure and elsewhere who Malcolm had tutored or mentored in programming, often over a very long period of time. Elsewhere I know he had taught mathematics (long before I knew him, he very nearly completed a PhD in mathematics when his area suddenly became fashionable and about 50 years of work was done in 6 months by incoming mathematicians) and chess. I will also remember his dry and sardonic approach to nearly everything (for a very recent example, Malcolm gives useful parenting advice), combined with “really, how hard could it be?” used both straightforwardly and distinctly otherwise. Goodbye Malcolm.

Update, funeral plans: Ray Loyzaga, Malcolm’s close friend and the long-time founder-CEO of CommSecure, has announced that Malcolm’s funeral will be at 2:30pm Thursday April 4, at Camellia Chapel, Macquarie Park Cemetery, North Ryde, Sydney.

Other memorials:

Malcolm online:

Remembering Malcolm Tredinnick by Mary Gardiner is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Citation delusions: "The most influential paper Gerard Salton never wrote"

In trying to finalise my PhD revisions, I am giving some background on text categorisation.

Extremely briefly, the problem of text categorisation is this: you have a document and some (usually pre-defined, unless you’re clustering) categories. For example, the categories might be news and editorial. Or academic article, newspaper article and blog entry. The choice of categories is application dependent.

Then you have a document you wish to assign to a category. Is it news, or editorial? The typical way of doing this is to assemble a set of training examples: pre-assigned news and editorial pieces. Then you measure the similarity of your new document to the pre-assigned collections, and whichever category it is most like is your document’s category. You might notice that I have not here defined “measure the similarity” and “most like”: that’s often the research question. How can you represent the collections efficiently so that they can be compared against new documents? What are good measures of similarity?
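As a minimal sketch of the procedure just described: the code below assumes word-set overlap (Jaccard similarity) as a purely illustrative stand-in for “measure the similarity”, and the training snippets and the names `jaccard` and `categorise` are made up for the example. It is not a recommendation of either the representation or the measure.

```python
def jaccard(a, b):
    """Overlap between two sets of words: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def categorise(document, training, similarity=jaccard):
    """Return the category whose pooled training text is most like `document`.

    `training` maps a category name to a list of pre-assigned example texts.
    Representing text as a set of lowercased words, and comparing with
    `jaccard`, are placeholder choices made only for this illustration.
    """
    doc_words = set(document.lower().split())
    scores = {}
    for category, examples in training.items():
        pooled = set(" ".join(examples).lower().split())
        scores[category] = similarity(doc_words, pooled)
    # "Whichever category it is most like": pick the highest score.
    return max(scores, key=scores.get)


# Hypothetical, tiny training collections.
training = {
    "news": [
        "the minister announced the budget today",
        "police said the road was closed",
    ],
    "editorial": [
        "we believe the budget is a mistake",
        "in our view the minister should resign",
    ],
}

print(categorise("the minister said the road was closed today", training))  # news
```

Swapping in a different `similarity` function, or a different way of representing the collections, changes the answer to the research questions above without changing the overall shape of the procedure.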

A fairly common way to picture this (for historical reasons, as we’ll see) is as a vector. For each word in the vocabulary (the vocabulary typically being the set of terms used across the documents in the training examples, although sometimes you might try to smooth the morphology out or similar), you construct a numerical representation. Say the vocabulary is no-good, bad, rotten, and a document reads “no-good no-good bad”: you might describe it as a vector (2, 1, 0), showing two uses of the first vocabulary item, one of the second and none of the third. (Again, whether you count vocabulary items, or weight them in various ways, is a research question. You may also notice that this counting-of-occurrences model is a “bag of words” approach, that is, it does not distinguish between “bad rotten” and “rotten bad” even though in language word order and syntactic structure are meaningful. It’s possible to transform the vectors so that this orthogonality of individual words does not hold.)
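The counting example can be sketched in a few lines. This assumes the simplest possible choices (split on whitespace, raw counts, no weighting or morphological smoothing), and the function name `count_vector` is just for illustration.

```python
from collections import Counter


def count_vector(text, vocabulary):
    """Count how often each vocabulary item occurs in `text`.

    Words outside the vocabulary are ignored. Splitting on whitespace and
    using raw counts are simplifying assumptions; weighting schemes would
    replace the counts with other numbers.
    """
    counts = Counter(text.split())
    return [counts[term] for term in vocabulary]


vocabulary = ["no-good", "bad", "rotten"]

print(count_vector("no-good no-good bad", vocabulary))  # [2, 1, 0]

# Bag of words: word order does not change the vector.
print(count_vector("bad rotten", vocabulary) == count_vector("rotten bad", vocabulary))  # True
```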

For reasons that I won’t go into here, I am trying to discuss this model briefly in my PhD thesis — actually, more briefly than I did above — and therefore looking to cite the originator of the idea. I started coming across citations in other papers that looked something like: “Gerard Salton [and others] (1975). A vector space model for information retrieval.” Sounds good. It’s got the key words in it, and quite a few citations!

I like to sight before citing though, which means I found this interesting paper:

David Dubin (2004). The Most Influential Paper Gerard Salton Never Wrote, Library Trends 52(4):748–764.

Gerard Salton is often credited with developing the vector space model (VSM) for information retrieval (IR). Citations to Salton give the impression that the VSM must have been articulated as an IR model sometime between 1970 and 1975. However, the VSM as it is understood today evolved over a longer time period than is usually acknowledged, and an articulation of the model and its assumptions did not appear in print until several years after those assumptions had been criticized and alternative models proposed. An often cited overview paper titled “A Vector Space Model for Information Retrieval” (alleged to have been published in 1975) does not exist, and citations to it represent a confusion of two 1975 articles, neither of which were overviews of the VSM as a model of information retrieval. Until the late 1970s, Salton did not present vector spaces as models of IR generally but rather as models of specific computations. Citations to the phantom paper reflect an apparently widely held misconception that the operational features and explanatory devices now associated with the VSM must have been introduced at the same time it was first proposed as an IR model.

Naturally such a subtle treatment of the history of the model is not great for my immediate purposes: I need That One Citation! (As best I can tell from Dubin, if I have to pick one it should be G. Salton (1979). Mathematics and information retrieval. Journal of Documentation, 35(1), 1–29.) But it’s fun to come across the analysis of an idea in this form.

Update: if you want a reasonable overview of text classification/topic classification/topic assignment, the survey of choice seems to be Fabrizio Sebastiani (2002). Machine learning in automated text categorization, ACM Computing Surveys, 34(1):1–47. You know, modulo 11 years now.

Kin: the director's commentary

Some additional or rephrased thoughts on kinship, now that I’m not working to a deadline, nor as tired.

Family of origin. There’s luck and privilege involved here, both in having a family of origin you want to have a relationship with, and that wants to have a relationship with you. When there’s choice about it, it’s most often only available to legal adults, at least when it comes to a relationship with one’s parents. I choose to have strong, central relationships with my family of origin; this is active work, but work I can only do due to luck and privilege.

Family of choice. There’s some more freedom here, but not everyone goes into life or even into adulthood with the skills, time, energy, mindset etc to form strong bonds with unrelated people. I’ve reached 30 without really doing enough work here.

Children. In talking about the legal relationship, I’m assuming a fair bit more luck and privilege: that is, reproductive control. In that case, one is not legally required to have children in the same way that children must have parents/guardians. In either case, one’s relationship with one’s children is in an interesting place as something like family of origin (traditionally endorsed, legally recognised[1], tightly societally scripted) and like something else: a very intimate relationship with someone you don’t know at all and who will change quite a lot, very quickly. And in some circumstances, one can choose this relationship without any input from the other party (unlike being assigned your parents, in which one usually has no choice, or being assigned your siblings, in which neither sibling typically has any choice). It’s a very odd thing.

So for me my relationship with my child is somewhat outside the “family of origin”/“family of choice” narratives that I’d at least had a long time to consider. It’s not the only relationship one could have that has these features, but it’s the major one I’ve had, and I don’t know that I can write well about it until the story has played out more.

[1] Some luck and privilege here!

Proving Australian citizenship when parents are born after 1986

Update January 2018: the Department of Home Affairs provides an Evidence of Australian Citizenship document to those whose claim to Australian citizenship is not one of the more obvious routes.

I know people who’ve had trouble with this, and won’t tell their specific stories, but a note now that I am applying for a passport for my son.

Australian citizenship is no longer by right of birth: if you were born on or after 20 August 1986, there are various more complicated ways it is acquired. One is being born in Australia and having at least one parent who is an Australian citizen or permanent resident at the time of your birth. Therefore, my son’s passport application has this section:

Please mark which of these documents you will provide at interview (you must present the original):
to prove that the child is an Australian citizen or to prove that one of the child’s parents was either an Australian citizen or a permanent resident of Australia at the time of the child’s birth.

  • The child’s Australian passport issued on/after 01/01/2000 and valid at least two years
  • One parent’s full Australian birth certificate (parent born prior to 20/08/86)
  • One parent’s Australian passport (issued on/after 20/08/86, valid two years)
  • One parent’s Australian citizenship certificate
  • One parent’s Australian permanent resident status

Both my husband and I were born in Australia before 1986, and in addition we happened to hold Australian passports before our son was born in any case, so we have a surfeit of evidence that will satisfy them that he’s a citizen.

But people born in, say, September 1986 are turning 27 this year, and therefore there’s lots of parents and soon-to-be parents whose Australian birth certificates are not sufficient proof of their own Australian citizenship, let alone their child’s status. So it’s very easy to imagine a situation existing right now where someone will need to show up with a grandparent’s birth certificate in addition to a parent’s and their own, and so on. By 2040 or earlier a great-grandparent may be required if no one in the family has held passports in intervening generations (great-grandparent born before August 1986, has a child at age 18+ in 2004, a grandchild in 2022 and a great-grandchild in 2040).

The easiest way around this for citizen-parents seems to be making sure one holds an Australian passport — because it stands on its own, unlike post-1986 birth certificates — before a child’s birth, which is not really foremost in one’s mind at the time. Oh, and be sure to keep it in a safe place until you or any of your children need evidence of their citizenship (usually but not always when they first need a passport themselves), because without it they’ll be back in the same documentation pickle. An increasing number of Australian-born people are going to have to go through the prior process of assembling potentially burdensome proof of citizenship involving either a string of ancestral birth certificates*, or a bunch of evidence of Australian residence on their 10th birthday (see Table B in documenting citizenship). I’d have trouble now proving my own Australian residence at age 10, frankly.

Australia is far from alone in not awarding citizenship by right of birth alone, so I assume there’s either a lot of people around the world who struggle to get passports, or other countries have processes that are more mature and less reliant on finding an ancestor who was unconditionally a citizen.

* Or worse, trying to get hold of 40-year-old evidence of someone’s permanent residency, which I suspect is not as available for purchase later on as birth certificates are.

Product review updates: Shoeboxed & GoGet

I had some success in 2012 at subscribing to things that made my life a bit easier to organise, so, a couple of updated reviews.

Shoeboxed (original review)

Update February 2017: this service is now known as Squirrel Street, and their smallest monthly pricing is significantly higher than it was in 2013. However much of the review still applies.

Original 2013 update:

What: a service where you package up a bundle of papers to be scanned, and they scan them, do some basic data entry (vendor, date, total amount, total GST) and store them on their website for you.

Current impressions: it’s still a pretty good fit for our needs: whenever a piece of paper enters our house that we have any belief we may need to access for paperwork purposes, we ship it off to them for scanning, data entry and shredding. The big test was doing our 2011/2012 taxes, and it was great to just enter a search term and have the document we needed show up among the top hits. We’ll keep using it for the foreseeable future. We don’t even really need the numerical amounts entered, since we don’t do personal bookkeeping at anything like that level.

I’ve also started forwarding them PDF receipts I get in the mail, and those work well: the PDF is pulled out and added to the data entry queue the vast bulk of the time. They’re much less good with HTML/text email receipts; it’s a harder problem though.

The major downside that has emerged is the length of time the processing takes, at least on the entry-level plan that we are on. It takes about two weeks from popping the envelope into the mail to the scans being available, and the delay is the scanning itself, not the data entry, so we can’t even access the raw images during this period. (There’s two ways to tell: one is that data entry for documents we upload in electronic form is usually complete within hours, the other is that the scans eventually show up in our “uploaded documents” queue waiting for their own data entry, and that happens about 24 hours before we get the “envelope processing now complete!” email.)

This is slower than the pricing plan states. It is mostly annoying for my business receipts: I do do double-entry bookkeeping for those, and in order to stay on top of things I like to do bank reconciliations sooner than 2 to 3 weeks after spending the money. I expect though that most businesses would subscribe to one of the higher volume plans (ours is 50 scans a month) which also have faster turnaround times.

GoGet car sharing (original review)

This has been a great replacement for car ownership, for us. Neither of us commutes by car (it would be a thoroughly silly way to pay for a regular commute), and we don’t even use cars every single weekend. But we do travel a lot to places where it is either essential or nice to have a car for the weekend, and make shorter trips to places that are a pain to wrangle a young child, associated supplies, and ourselves to on public transport (eg, Sydney’s beaches).

It’s also nice to have access to the vans. I’ve only done amateur furniture removal once this way, but they’re nice and roomy (we got two couches and a double mattress into one trip) without being as difficult to drive as the trucks one gets from rental companies. Also potentially much cheaper for small things, to be hiring by the hour!

For whatever reason, the contention for them has not been as bad since around about April. We can almost always get our first or second choice of car with as little as an hour’s notice. This is excepting the local iMax (8-seater) which you have to book up to 6 weeks in advance, but we very rarely need an 8-seater, luckily. We also regularly are later than we planned to be, and only once have I had to hurry back because someone else had booked the car for the next hour: every single other time we’ve been able to extend the booking into the free next hour. Several more cars have been added to the neighbourhood since around then.

We’re getting used to the child car-seat issue. It helps a lot that one of the nearby cars now has a car seat in it. We still often have to fit or re-fit the seat; I now believe the commonly cited statistic that around about 70% of self-fittings are incorrect. Ours definitely aren’t as tight as a professional fit sadly, but at least unlike everyone else we don’t have the back of the child’s belts wrapped around the adult belt that holds the seat itself. However, fitting a seat is a lot less onerous than carrying a seat to the car (while persuading a toddler to walk with us) and then fitting it! It will be good to have him in a booster though.

It’s not especially cheap: our monthly spend is somewhere between $200 and $500 (the high end in months like December and January, with multiple visits to different family in different cities). And we’re definitely using cars more often than we would if we had to sort out an entire car hire from scratch from a daily company every single time.

If there was one feature I really wish they’d add, it would be the ability to conditionally cancel a booking. The present situation is this: if you cancel at least 48 hours before the start of the booking, it’s cancelled and you do not pay anything and the car is available for someone else to re-book. After that, you simply cannot cancel (not even any portion of your booking that is more than 48 hours in the future). What I’d like is the ability to do something like cancel at any time, thereby having the car available for booking by someone else, and, if there was less than 48 hours’ notice, incur the difference between my original hourly fee and any hourly fees they were able to get from any new bookings for that car. Then they have the same situation as now with regard to not losing my booking fee, but the neighbourhood is not locked out of the unused car for the duration of my abandoned booking. We felt this keenly when we had to walk away from our entire Easter weekend trip at the last minute due to acute illness.

We don’t intend to purchase a car again any time soon.

How to do more writing, by someone who has never made any such resolution

Jonathan Lange asked on Google+ for ideas about keeping a “write more” resolution. I took over his comment section, and in the spirit of taking some of my own advice, here’s a synthesis of what I said there. Since not writing as much as I feel I ought is never a problem I’ve had, this advice is in the delightful genre of someone who has never needed the advice simply making some up and giving it to you anyway! Enjoy my half-baked ideas.

Re-use your writing. A lot of people I know spend an enormous amount of time on crafting lengthy, tightly argued emails. These count, and you can make them feel like they count by editing them for a sufficiently general audience and publishing them on your blog. This is one I actually do do: several of my Geek Feminism pieces originated in annoyed private emails I sent to close friends, or in IRC rants.

Accountability and incentives. This is like all of the “how to exercise more” advice: make it public, make it social. Make a public commitment, make a shared commitment with a fellow writer. Have a competition, one-sided or not (“I will write more blog entries than N will this year”?). Deadlines and someone who will be personally disappointed in you can be an excellent motivator (as long as it doesn’t tip you over into an avoidance cycle), and for writing there’s a whole profession which involves, in part, holding people to deadlines and being disappointed if they fail to meet them: so, find an editor.

Unfortunately, in order to get an editor one generally needs to pitch (leaving aside the whole question of finding an agent, especially when it comes to fiction), which means writing, so you will have to be motivated to do some writing before you can partially outsource your motivation to editors and deadlines.

Becoming a freelancer seems like a big effort in order to fulfil a personal goal to “write more”, but part of the attraction is that you can pitch to places that have a ready-made audience, which means that you have outsourced any implicit “write more in places people will read it and find it useful” goal; you don’t need to put an equal or greater amount of work into building an audience for your writing.

Specific goals. This assists with accountability. What does writing more mean? A certain wordcount? A certain number of blog entries? A certain number of pitches sent out? A certain number of pitches converted to published articles? All of these are more artificial but easier to keep accounts of than “write more”.

Spend money. Enrol in a course or similar. This adds deadlines too, typically.

How to do more writing, by someone who has never made any such resolution by Mary Gardiner is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Why is someone’s entire adult life relevant to their job application?

This article originally appeared on Geek Feminism.

Over at Captain Awkward’s advice column, there’s a question about how to deal with a recent name change when potential employers may call references that know you by a former name. The advice moves a little into how to deal with “resume gaps” in general:

Prospective employers will ask difficult questions about gaps in employment, changes of field, etc., but often they are doing it because they want to see how you react to the question before they decide if it is an actual issue. They want to make sure that you didn’t lie on your resume. They want to see if you have a coherent reason for whatever it is. And they want to see if you react with grace under pressure, or if you turn into a defensive weirdo… [P]lenty of people take time out of the workforce to care for kids, go to school, look after aging relatives, etc. and then are in the position of trying to get back into the workforce. If an employer is going to hold your years as a caregiver or student against you in making a hiring decision, that is their bad. Do not apologize! Do not talk about how your skills are “rusty”! If they say “I notice it’s been a few years since you’ve been working in this field, what’s up with that?” say “Yes, I was lucky enough to be able to take some time off to care for my mom at the end of her life,” or “Given the cost of day care, it made sense for one of us to stay home with the kids for a while” or “Yes, it was strange to be a grad student-by-day, bartender-by-night, but my customers were great and I learned a lot from having such a public-oriented position” and then ask a question about the position at hand.

It’s possible to disagree for pragmatic reasons with the advice to disclose here (see for example annalee’s comment on that post), but I wanted to move away from the question of what individual jobseekers should do — to be clear: I don’t fault Captain Awkward discussing that, it’s an advice column! — to the general question of why this comes up. Why do resume gaps matter, exactly? Why is a job candidate who has several unexplained years on their resume a worse candidate for a job?

Here’s my hunch about why it matters: because it’s a proxy for discriminating against (former or currently) ill or disabled people and carers, pretty much. And people with a history of institutionalisation, and others. So at an individual level you can disclose on the principle that while it sucks that there are powerful bigoted people out there, it’s better to find out that they’re bigoted against you before you’re working for them. Or you can not disclose on the principle that while it sucks that there are powerful bigoted people out there, you might be able to stay mostly under their radar when you are working for them. Not the most excellent choice in the world!

This seems in some ways hackable to me. This isn’t a new insight, but part of the problem with hiring is the need to choose one person (or N people), and, typically, having more than N applicants. You need some tools to eliminate people, so people come up with petty absolutes about resumes that are in the wrong font, or are one page long, or aren’t one page long, or cover letters that use “I am writing to apply for” rather than “I am applying for” or whatever you like. And of course it’s easy to fall into bigotry too. The ideal worker bee is young and male and “flexible” and so on. If society has squashed someone down by keeping them out of the workforce, you don’t want your organization to have to pay the price for the squashing, so let’s require an age-21-to-present-time employment history too. Some people have that, after all.

There’s a real problem with resume gaps, which is that they might be actually relevant time that the person doesn’t want to talk about with you (for example, the employer they defrauded), but I think it’s at least worth questioning the idea of pushing down on everyone who has ever been out of the workforce in order to find them, and there’s definitely also a desire to ferret out “flakes” (people who you want to discriminate against) among some employers.

One possibility then is consciously letting go of the idea that your hiring skills guarantee getting the single best hire, or the belief that your resume filtering skills and interviewing skills are helping you past a certain point, and instead choosing randomly from the best M applicants as selected by your hopefully-consciously-avoiding-bigotry hiring process. And by letting go of your belief that you need total control in order to select The One, perhaps you can let go of at least some received wisdom about seeing “red flags” in any sign that someone may have done something with their weekdays other than work, and that they may not want to talk to you about that.

What received truths of hiring do you think are bogus or discriminatory?

Life at 7: discussion thread

This article originally appeared on Hoyden About Town.

In February last year, the ABC screened Life at 5, the third edition in Life, an Australian documentary series following children born in 2004/2005 through their childhood. It’s associated with Growing Up in Australia: The Longitudinal Study of Australian Children.

Almost all of the children we first met at age 1, and then at ages 3 and 5, are returning from tonight in Life at 7, with only Loulou not appearing. This time, the two documentaries are Tackling Temperament (now on iView) and Finding Your Tribe (now on iView), screening a week apart.

People with Australian IP addresses can also catch up on the earlier documentaries for a limited time:

Are you watching Life at 7? Please play along in comments, I enjoyed (and was frustrated by) the previous documentaries, and I’m looking forward to seeing the new series.

Ada Lovelace Day: Marita Cheng, Robogals founder

Today, October 16, is Ada Lovelace Day: write or record a story about a woman in science, technology, mathematics or engineering (STEM) whose achievements you admire.

This is a slightly updated version of a profile that has appeared on Geek Feminism and Hoyden About Town.

Marita Cheng was named as the Young Australian of the Year winner at the beginning of the year. She’s been involved in volunteering since she was a high school student, and in 2008, early in her undergraduate studies (mechatronic engineering and computer science at the University of Melbourne) she founded Robogals, which is an engineering and computing outreach group, in which women university students run robotics workshops for high school age girls.

Marita, while still in the final year of her undergraduate degree, is also an entrepreneur and has been previously awarded for her work as founder of Robogals, including winning the Anita Borg Change Agent award in 2011. In 2012 she travelled to several countries with the aid of the Nancy Fairfax Churchill Fellowship to study “strategies used to most effectively engage female schoolgirls in science, engineering and technology.”

While I have heard of Robogals, I hadn’t heard of Marita specifically before she became Young Australian of the Year. One of the fascinating things about starting the Ada Initiative is slowly discovering all the other amazing women who work in technology career outreach and related endeavours. But it’s a little embarrassing, judging from her bio, to have not heard Marita Cheng’s name before the beginning of the year!

Further reading:

  • Marita Cheng’s website
  • Life is turbocharged for Robogals founder (a profile this past weekend)

Ada Lovelace Day: Marita Cheng, Robogals founder by Mary Gardiner is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Ada Lovelace Day: Else Shepherd, leading Australian electrical engineer

Today, October 16, is Ada Lovelace Day: write or record a story about a woman in science, technology, mathematics or engineering (STEM) whose achievements you admire.

Else Shepherd is an Australian electrical engineer specialising in communications equipment. She has co-founded multiple Australian engineering companies, including Mosaic Information Technology, a custom modems company, and Microwave & Materials Designs, developing microwave filters for mobile phones. She was appointed as the chairman of Powerlink, the state government-owned corporation maintaining Queensland’s high voltage electricity grid, in 1994, and has been a board member of the National Electricity Market Management Company (now known as the Australian Energy Market Operator).

Shepherd won Engineers Australia’s Peter Nicol Russell Memorial Medal in 2007, their most prestigious award, recognising an engineer with over 20 years of substantial contributions to professional engineering in Australia. As best I can tell, she is the only woman Peter Nicol Russell medallist. She has also been a Member of the Order of Australia since 2003, and was the University of Queensland Alumnus of the Year in 2009. She is also a pianist and choral director.

Shepherd has talked about her experience as a woman in electrical engineering with University of Queensland publications. She and one other woman graduated in 1965, the university’s first women graduates in electrical engineering. She was unable to attend Institution of Engineers meetings in the 1960s, because they were held at the local Men’s Club. She continues to promote workplace flexibility, having used part-time work during parts of her career to care for her two children.

Further reading:

Ada Lovelace Day: Else Shepherd, leading Australian electrical engineer by Mary Gardiner is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.