Science – puzzling.org

May 6, 2014 (updated January 16, 2017)

Not the Sydney Project: Questacon

This entry is part 3 of 11 in the series The Sydney Project

This year is my son’s last year before he begins full time schooling in 2015. Welcome to our year of child-focussed activities in Sydney… only without the Sydney bit this once.

We rudely interrupt the Sydney Project to bring you a Canberra attraction: Questacon. In short, Questacon works nicely for V in a way that the Powerhouse did not, probably because it’s pretty shameless about catering entirely to children, complete with buttons, lights and hard hats.

We were there on a very busy day: the Saturday of the Easter weekend, the middle weekend of NSW school holidays. It was merely obnoxiously busy; I guess being used to Sydney crowds was helpful. That said, we did get there at 9:15, just after it opened. And as it was, the admission tickets to Mini-Q, the under 6 area, which runs limited-number sessions on busy days, were only available from 11:30 onwards. I think they’d completely gone by about 10:30 in the morning. Go early, go often.

We’ve been once before, about a year ago, and Questacon was a hit to the point where for some time afterwards he asked to “see the science again!”. It took him longer to warm up to it this time. Much like last time, he shot through Measure Island without engaging. It took him a while to settle into Wonderworks, eventually getting interested in the Energy Machine and Frozen Shadow. Much to my disappointment, he’s never given a toss for my precious Harmonograph. (Much of Wonderworks has been there since I was a kid. Questacon’s exhibits are surprisingly timeless in their appeal.)

Andrew and I and his father were very taken with the Cloud Chamber, which is in its own little-visited room from the steps between Wonderworks and Awesome Earth (closed for renovations), in which subatomic particles leave continuous trails through a cloud of vaporised alcohol. Andrew is keen to bring a banana next time. V was not willing to stand still for a story about how all the time, everything is being hit with tiny tiny particles moving at high speeds. Perhaps not one for the littlies.

V’s favourite exhibit is pretty unique to him. He can roll ping pong balls down a ball rollercoaster for about an hour at a time. Other children come, roll five or ten balls and go. He stays. We only extracted him with a promise to return after.

Next up was one for the watching adults, Excite@Q. V was most naturally drawn to the blue tunnel, and he was one of several smaller children jostling under Whoosh to grab a scarf and stuff it back in the wind tunnels. But we were there for one thing: to see our four year old agree to do Free Fall. I wrote about this elsewhere:

It’s a horizontal bar suspended over a very steep slide. You hold the bar. You let go. You drop freely for three metres or so before hitting the slide and sliding to the floor of the room.

The ride is, as you’d hope, very into consent. You go to the top. You get a briefing about how it works. You are told, repeatedly, that it’s OK to say no. And the day we were there, about three quarters of children did say no. (It’s a bit of a study in gender performance actually. Adult men by and large grab the bar, drop themselves down to dangle, let go and are done. Everyone else takes far far longer.)

V loves slides and heights, and so we asked him if he wanted a go. He said yes. He was dressed in the safe suit for it (I guess no risk of catches or tears), he waited in the queue and watched child after child look at the drop and shake their head and walk back down the stairs with an adult for a hug. Andrew took him to the top. He got the chat about whether he wanted to say no. He gave them a puzzled look. He got his instructions. He took them very seriously.

He held the bar:

He dropped his weight from it:

He looked down:

And he let go:

He seemed to have fun, if a bit mystified about why this was such a very big deal.

After Free Fall, his ticketed time for Mini-Q came up. I didn’t go in, but apparently it was all construction all the time in there.

Finally, for bonus points, I put my camera down somewhere in Wonderworks, and someone found it and handed it into staff. “People who come to Questacon are generally very honest,” the information desk staffer told me, although somewhat spoiling the effect by saying she’d been tempted to keep the camera herself.

Cost: $23 adults, $17.50 children 4 and over, younger children free.

Recommended: yes, has something for the jaded adult radioactivity fans and the child who wants to drop from extraordinary heights, wear a hard hat in a playground, and roll ping pong balls down a slide for an hour alike. Try not to go on holiday weekends, and try not to leave your camera lying around.

More information: Questacon website.

February 15, 2013

Citation delusions: "The most influential paper Gerard Salton never wrote"

In trying to finalise my PhD revisions, I am giving some background on text categorisation.

Extremely briefly, the problem of text categorisation is this: you have a document and some (usually pre-defined, unless you’re clustering) categories. For example, the categories might be news and editorial. Or academic article, newspaper article and blog entry. The choice of categories is application dependent.

Then you have a document you wish to assign to a category. Is it news, or editorial? The typical way of doing this is to assemble a set of training examples: pre-assigned news and editorial pieces. Then you measure the similarity of your new document to the pre-assigned collections, and whichever category it is most like is your document’s category. You might notice that I have not here defined “measure the similarity” and “most like”: that’s often the research question. How can you represent the collections efficiently so that they can be compared against new documents? What are good measures of similarity?
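To make that concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration: the tiny training set is invented, pooling each category's examples into one collection is just one way to represent them, and cosine similarity over word counts is only one of many possible similarity measures.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(document, training):
    """Return the category whose pooled training text is most similar to the document."""
    doc = Counter(document.lower().split())
    # Represent each category by the word counts of all its examples pooled together.
    pooled = {cat: Counter(" ".join(texts).lower().split()) for cat, texts in training.items()}
    return max(pooled, key=lambda cat: cosine(doc, pooled[cat]))

# Invented, pre-assigned training examples (hypothetical data).
training = {
    "news": ["the minister announced the budget today", "police confirmed the road was closed"],
    "editorial": ["we believe the budget is a mistake", "in our view the minister should resign"],
}

print(classify("we believe the minister should resign", training))  # editorial
```

Swapping in a different representation or similarity function changes only `cosine` and the `pooled` line, which is roughly where the research questions above live.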

A fairly common way to picture this is (for historical reasons, as we’ll see), a vector. For each word in the vocabulary (the vocabulary being the set of terms used in every document in the training examples, typically, sometimes you might try and smooth the morphology out or similar), you construct a numerical representation. Say the vocabulary is no-good, bad, rotten, and a document reads “no-good no-good bad”, you might describe it as a vector (2, 1, 0), showing two uses of the first vocabulary item, one of the second and none of the third. (Again, whether you count vocabulary items, or weight them in various ways, is a research question. You may also notice that this counting-of-occurrences model is a “bag of words” approach, that is, it does not distinguish between “bad rotten” and “rotten bad” even though in language word order and syntactic structure are meaningful. It’s possible to transform the vectors so that this orthogonality of individual words does not hold.)
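The counting version of this representation can be sketched in a few lines of Python. The function name `count_vector` is made up for this example, and splitting on whitespace is a simplifying assumption (real systems tokenise more carefully).

```python
from collections import Counter

def count_vector(document, vocabulary):
    """Return one count per vocabulary item, in vocabulary order."""
    counts = Counter(document.split())  # naive whitespace tokenisation
    return tuple(counts[term] for term in vocabulary)

vocabulary = ["no-good", "bad", "rotten"]

print(count_vector("no-good no-good bad", vocabulary))  # (2, 1, 0)

# Bag of words: word order is lost, so both orderings give the same vector.
print(count_vector("bad rotten", vocabulary) == count_vector("rotten bad", vocabulary))  # True
```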

For reasons that I won’t go into here, I am trying to discuss this model briefly in my PhD thesis — actually, more briefly than I did above — and therefore looking to cite the originator of the idea. I started coming across citations in other papers that looked something like: “Gerard Salton [and others] (1975). A vector space model for information retrieval.” Sounds good. It’s got the key words in it, and quite a few citations!

I like to sight before citing though, which means I found this interesting paper:

David Dubin (2004). The Most Influential Paper Gerard Salton Never Wrote, Library Trends 52(4):748–764.

Gerard Salton is often credited with developing the vector space model (VSM) for information retrieval (IR). Citations to Salton give the impression that the VSM must have been articulated as an IR model sometime between 1970 and 1975. However, the VSM as it is understood today evolved over a longer time period than is usually acknowledged, and an articulation of the model and its assumptions did not appear in print until several years after those assumptions had been criticized and alternative models proposed. An often cited overview paper titled “A Vector Space Model for Information Retrieval” (alleged to have been published in 1975) does not exist, and citations to it represent a confusion of two 1975 articles, neither of which were overviews of the VSM as a model of information retrieval. Until the late 1970s, Salton did not present vector spaces as models of IR generally but rather as models of specific computations. Citations to the phantom paper reflect an apparently widely held misconception that the operational features and explanatory devices now associated with the VSM must have been introduced at the same time it was first proposed as an IR model.

Naturally such a subtle treatment of the history of the model is not great for my immediate purposes: I need That One Citation! (As best I can tell from Dubin, if I have to pick one it should be G. Salton (1979). Mathematics and information retrieval. Journal of Documentation, 35(1), 1–29.) But it’s fun to come across the analysis of an idea in this form.

Update: if you want a reasonable overview of text classification/topic classification/topic assignment, the survey of choice seems to be Fabrizio Sebastiani (2002). Machine learning in automated text categorization, ACM Computing Surveys, 34(1):1–47. You know, modulo 11 years now.

January 8, 2012 (updated April 13, 2026)

Mary’s helpful guide to soliciting research participation on the ‘net

This article originally appeared on Hoyden About Town.

In my years on the ‘net, I’ve seen any number of people want to interview others or get them to take surveys for everything from a short high school or undergraduate paper through to graduate research projects and books. And they so seldom manage to meet basic ethical guidelines for making sure they aren’t wasting their participants’ time at best or endangering them at worst. Hence this article.

In addition, this article may help research participants better assess requests: are researchers telling you what you need to know? Have they considered your interests as well as their desire to Find Something Out At All Costs?

Full disclosure: I am not a research ethics expert, I am simply a researcher helping you get the basics right. Please seek expert advice if you have any doubt about the safety or integrity of your research.

Why do I need to do this stuff?

Because you’re so often asking people sensitive stuff, that’s why!

Look, I have some sympathy for the “it’s just questions about something-seemingly-small!” view myself. I ask people questions about their linguistic intuitions. “Which sentence reads better to you, A or B?” There’s nothing less fun than completing a 31 page ethics application to get approval to ask people about which sentences read better.

But look, all research, at best, takes up people’s time. You owe people something for that. In addition, quite a lot of the research people are recruiting for on the ‘net wants to get into harassment of women, political affiliations, sexual experiences, why people write slash. That kind of stuff? That kind of stuff in the wrong hands loses people jobs and relationships. You owe people serious, well thought out harm mitigation for that.

So, ethical research recruitment lets people know what they’re getting into, whether it is a boring half hour sharing linguistic intuitions, or sharing potentially damaging information with a researcher.

The bare minimum

All researchers asking for participation should share this information:

Who are you?
Who do you work for or who commissioned this work, if not yourself?
How can I get in contact with you, and how can I get in contact with who you are working for?
What is the purpose of the research?
What is the status of the research? Is this sheer curiosity that made you whip up a survey in five minutes, or a pilot study, or the main game?
What kind of effort do you want from me? (Interviews versus surveys. Five minutes versus many hours. You get the idea. Tell me upfront what my time investment is.)
When you’re done, where can I see the results?
Will the results be made public and in what form? (A peer-reviewed article? A PhD thesis? A pop science book? On your blog?)

Some of this might be the sort of thing you want to put on a webpage you can link to, so you can leave short advertisements like “Hi, I’m looking for help with X, and thought readers here might want to help because of Y, if you need to know more, please see LINK.”

You’d be amazed how many people miss the “When you’re done, where can I see the results?” step. Even if they’re asking people for 20 hours of interviews or something like that. For anything but the most trivial investment of time, letting people read your results is the minimum reward required.

Also, results being made public can often be good: the subject’s work is contributing to the sum of human knowledge! So don’t consider this necessarily a bad thing in and of itself.

Institutional research

If you are doing research at the postgraduate, postdoctoral or faculty level, research using human subjects (and other animal subjects for that matter, but you aren’t likely to be recruiting them on blogs) requires ethics approval by an institution-level ethics committee in most institutions.

So, when soliciting participants for research that has ethics approval, provide the following info:

All the bare minimums plus
A statement citing your ethics approval in whatever manner is usual. Your committee probably has boilerplate. Typically this will name the institution, give a reference number for your experiment and provide contact details for the ethics committee.
If your ethics committee approved a recruitment advertisement, use it! If it’s long put it at the other end of a link if that’s OK with them.
If your ethics approval requires that you disclose a bunch of things, also state them or place them at your info link if allowed.

If your institutional research didn’t require ethics approval (some institutions might, for example, have a blanket policy covering low-risk things like linguistic intuition questionnaires), find whatever boilerplate they let you use instead, if there is any, or say something sensible along the lines of “This questionnaire comes under the XYZ University Low Risk Experimentation Policy [link].”

Basically, if you are doing research on behalf of an employer, state either that you have ethics approval or, if not, why not (eg, your institution has no committee).

No committee but doing something sensitive?

If you’re doing sensitive work outside the oversight of ethics committees, here’s the start of your checklist!

All the bare minimums plus
Are respondents going to be anonymised in your personal/researcher copy of the data? Are you stripping any associated names, IP addresses, email addresses and similar? If not, what are you keeping and why?
How are you storing the researcher copy of the data?
Who has access to the researcher copy of the data? (Yourself? Your boss? All of your boss’s present and future employees? The Internet?)
When do you plan to delete the researcher copy of the data, if ever?
Are respondents going to be anonymised in the published results? If not, what identifying information will you publish and why?
Can a respondent withdraw their participation and be deleted from your data or transcripts? How do they do it? How long do they have to do so?

There are all kinds of other factors that ethics committees would get you to look at. Basically: what capacity for harm does your research have? How are you mitigating that harm? What risk to your participants is left?

Risks include: physical health risks; mental health risks (more common with online data gathering, eg, triggering questions); exposing people to relationship disruption or breakdown, or abuse (by, eg, asking them to discuss infidelity); exposing people to criminal prosecution (eg by asking them to discuss illegal drug use); exposing people to civil liability (eg by getting them to discuss breach of contract); exposing them to job loss; denying them the best treatment or resources (by, eg, giving preferential treatment to patients or students or employees who agree to take part in the research, thus harming others); and coercing participation in general. And there’s one question that frankly stands out to me as a member of the apparently rare species Lady on the ‘Net, which is “are you studying an over-studied population and if so, what benefit does this extra research have for them, as opposed to for you?”

One of the most obvious mitigation strategies is anonymity of your subjects in reports, and eventual data destruction of any private identifying data. But as you can see from the examples related to coerced participation, it isn’t the only strategy you might need. List your possible harms, list your mitigations, let the potential subjects decide if the research is worth it to them.

I wrote a similar post focussed on software development a few years back, in that case mainly focussed on “prove to your subjects that their participation is not a waste of their time.”

Mary’s helpful guide to soliciting research participation on the ‘net by Mary Gardiner is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

November 21, 2011 (updated June 11, 2016)

Computational linguists

xkcd suddenly exploded in my circles in 2006, thanks to the comic Randall Munroe calls Computational Linguists and most people refer to as “Fuck Computational Linguistics” getting around at the annual conference of the Association for Computational Linguistics.

There have been requests for the xkcd store to sell it before, but it’s never been done.

I just ordered a batch through Sticker Mule, both of the full comic and of a smaller badge version I did. (They will do proofs of them, I’ll be interested to see if the “Fuck” bugs them.) In order to do so I did a vector version of the comic (via Inkscape’s “trace bitmap”), and because the original comic, and these variants, are under Creative Commons Attribution NonCommercial, I can share them with you here. If you want them, order copies from the sticker vendor of your choice!

Full comic:
Indicative PNG | Compressed Inkscape SVG | PDF (fonts as paths)

Smaller badge-like variant:

Compressed Inkscape SVG | PDF (fonts as paths)

The vector versions aren’t very clean, but neither is the original comic, so I’m hoping these look like the spirit of the original, rather than a nasty hack.

Reminder: these are licensed for free noncommercial use (the precise condition is noncommercial use with attribution to the original author, modifications OK). So don’t sell them!

October 7, 2011 (updated June 11, 2016)

Ada Lovelace Day: Mahananda Dasgupta, nuclear fusion researcher

7th October is Ada Lovelace Day, a day to blog about your heroines in science, technology, engineering and math.

Mahananda Dasgupta is a professor in the Department of Nuclear Physics at the Australian National University. Dasgupta’s research takes place at the heavy-ion accelerator facility and investigates quantum tunnelling when heavy nuclei collide. Her Pawsey Medal award in 2006 cites “cutting-edge contributions includ[ing] precision measurements of unprecedented accuracy”.

Dasgupta moved to Australia from India for a postdoctoral position in the 1990s, and eventually was appointed to a tenured position in 2003. She became the first woman to hold a tenured position in the Research School of Physical Sciences and Engineering at the ANU in its entire 50+ years of existence! (I was very surprised to find this, the School must be enormous in terms of academic staff, it comprises nine research departments.)

How do we retain that female workforce [in science]?

By strong and meaningful mentoring, which doesn’t just mean a quick meeting once a month or web-based mentoring, but real mentors who encourage women or younger people to devise strategies about how best to use their time, and what roles to apply for to advance their career.

Every person at that early stage needs support. We need to champion women scientifically – not “she’s a good person”, but “she’s an excellent physicist who’s done this great work”… Equally, the employers’ responsibility to provide childcare is very important… If we are expanding and building infrastructure – why are we not building childcare facilities?

I was educated in India where, if a student is sharp, they’re encouraged to show it through participating in discussions or taking on extra-educational activities… It does strike me that in Australia we give a lot of kudos to those who excel in sports, but if you excel in studies you are a dork, particularly among other students… Sometimes, following talks I give in schools, students come to the carpark to ask me science questions, rather than asking them in front of the class… How do we get away from that? I believe that to make real long-term progress we must respect and encourage intellectual achievements.

Mahananda Dasgupta, The Conversation: So seriously, why aren’t there more women in science?

Dasgupta is active both in advocating careers in science in general, volunteering herself as a science careers lecturer at schools, and in speaking on behalf of women in science. In 2004 she was the Woman in Physics Lecturer for the year, and in 2011 she represented the Group of Eight universities (the eight universities that consider themselves Australia’s best research universities) at a Women in Science and Engineering summit at Parliament House. Her 2011 Georgina Sweet Australian Laureate Fellowship from the Australian Research Council calls upon her to increase the profile of Women in Science through outreach activities, and work towards advancing early career researchers as well as facilitate leadership pathways for senior women researchers.

Recognition Dasgupta has received for her work includes:

the Australian Academy of Science’s Pawsey Medal in 2006, for outstanding work in physics by a scientist under 40
her election as a Fellow of the Australian Academy of Science in 2011
an Australian Laureate Fellowship in 2011

I can’t embed them in the post for licencing reasons, but David Hine has a couple of photos of Dasgupta with her experimental equipment: Dr Mahananda Dasgupta and Dr Mahananda Dasgupta and Dr David Hinde.

References

ANU: Mahananda Dasgupta’s home page
WiseNet: ACT Women Honoured by the Academy
labonline.com.au: Physicist blazes new trail for women
Australian Research Council: Professor Mahananda Dasgupta: Recipient of 2011 Georgina Sweet Australian Laureate Fellowship

October 7, 2011 (updated June 11, 2016)

Ada Lovelace Day: Fan Chung, leading mathematician

7th October is Ada Lovelace Day, a day to blog about your heroines in science, technology, engineering and math.
This is an expanded version of a post at Geek Feminism last year.

“Don’t be intimidated!… I have seen many people get discouraged because they see mathematics as full of deep incomprehensible theories. There is no reason to feel that way. In mathematics whatever you learn is yours and you build it up, one step at a time. It’s not like a real time game of winning and losing. You win if you are benefited from the power, rigor and beauty of mathematics. It is a big win if you discover a new principle or solve a tough problem.”

Fan Chung

Fan Chung is a leading mathematician, specialising in combinatorics and later graph theory. She is Distinguished Professor of Mathematics and Computer Science at UC San Diego.

I first heard of Chung in Paul Hoffman’s The Man Who Loved Only Numbers: The Story of Paul Erdős and the Search for Mathematical Truth; Chung and her husband Ron Graham were two of Erdős’s closest collaborators. Hoffman tells a great story about how when Chung had finished, and come first in, her PhD qualifying exams at the University of Pennsylvania, her eventual PhD advisor Herbert Wilf gave her a textbook on Ramsey theory to browse and she came back and explained that she’d improved one of the proofs. That was a core part of her PhD dissertation, completed in a week. Those kinds of stories are told about the best mathematicians.

Chung has worked both in academia and in industry, having spent twenty years at Bell Labs and Bellcore in both information technology and mathematics before returning to the University of Pennsylvania, where she did her doctorate. After her time in industry she is deeply concerned with mathematical breadth, and is known for her “nose” for problems that cross several subfields.

Many mathematicians would hate to marry someone in the profession. They fear their relationship would be too competitive. In our case, not only are we both mathematicians, we both do work in the same areas. So we can understand and appreciate what the other is working on, and we can work on things together – and sometimes make good progress.

Fan Chung, describing her relationship with husband Ron Graham

If my count is right, Chung’s publication list shows 79 papers co-authored with Ron Graham. I’ve always admired stories of professionally companionate marriages: even Joan Didion and John Gregory Dunne can’t compete on those numbers.

Chung’s website has a copy of a chapter about her in Claudia Henrion’s Women in mathematics: the addition of difference. Among other things it talks about her move to the United States from Taiwan for her graduate work, and her thoughts on having a child while at graduate school.

[Graduate school] is a wonderful time to have a child. You don’t have to attend classes; you only have to write your thesis.

Fan Chung

Hrm, yes, well. Perhaps I will give that advice in 20 years’ time. Perhaps not…

References

UC San Diego: Fan Chung’s home page
Wikipedia: Fan Chung
Don Albers, Making Connections: A Profile of Fan Chung
J J O’Connor and E F Robertson, Fan Rong K Chung Graham

July 21, 2011 (updated August 22, 2016)

Wednesday Geek Woman special edition: Sandra Magnus, STS-135, and the end of the shuttle program

This article originally appeared on Geek Feminism.

Back-to-back American astronauts, yes. Special occasion! This is by request, from deborah on July 7:

Sandra Magnus is flying on the last NASA space shuttle launch tomorrow– how about a quick hit about her? And about being sad about the space shuttle. 🙁

Space Shuttle Atlantis en route to launchpad. Image by NASA, public domain.

We’re a little late to the party, so I’m scheduling this entry for about twelve hours prior to the end of the mission: landing is scheduled at 21 July 2011 9:56 UTC.

Sandra Magnus has a PhD in materials science and engineering and has worked on stealth aircraft design. This is Magnus’s 4th Shuttle mission, but third trip into space: she spent 134 days in orbit between November 2008 and March 2009, travelling to the International Space Station on STS-126 and returning on STS-119.

Sandra Magnus exercises in the Destiny Module aboard the ISS, in zero gravity, March 2009. Image by NASA, public domain.

STS-135 is the 33rd mission for Space Shuttle Atlantis, and the final mission of the Shuttle program. See NASA’s video of the launch. NASA TV will be showing coverage of STS-135 throughout the planned landing.

January 6, 2010 (updated April 20, 2026)

Ethics of Free Software community research

Most of this entry is exactly a year old today and it’s just sat around in draft form all that time. Since I posted something similar on Geek Feminism about research into women in tech and similar topics, I thought I’d get it out there.

In January 2009 a researcher named Anne Chin of Monash University Law emailed the chat list for the linux.conf.au 2009 conference asking for research subjects to be interviewed about licencing and Open Source software. There were several responses criticising her use of HTML email and Microsoft Word attachments. I’ll leave the specifics of this alone except that people should be (and probably are) aware that this is almost always an unknowing violation of community norms.

I did, though, think about making some notes on research ethics and Free Software research. A bit about my background: I am not a specialist in ethics. I’m somewhat familiar with ethics applications to work with human subjects, but not from the perspective of evaluating them. I’ve made them, and I’ve been a subject in a study that had made them.

For people who haven’t seen this process: the ethical questions arising from using human subjects in your research generally come down to whether the good likely to arise from the outcomes of the study outweighs the harm done to the subjects, together with issues of consent to that harm. (There are many philosophical assumptions underlying this ethical framework; I don’t intend to treat them here.) Researchers in universities, hospitals, schools and research institutes usually have to present their experimental designs to an ethics committee, which will determine this question for them and approve their experiment. Researchers who work across several of these (eg, a PhD student who wants to interview schoolchildren) will need to do several ethics applications, a notable chore when the forms and guidelines aren’t standardised and occasionally directly conflict. Researchers working for private commercial entities may or may not have a similar requirement. Researchers who use animals also have to have ethical reviews; these are done by animal ethics committees, which are usually separate.

At my university, essentially any part of your research that involves measuring or recording another person’s response to a research question and using it to help answer that question needs a human ethics application.

The good/harm balance may include very serious dilemmas. Is there a health risk to subjects? How will the researcher manage the conflict between maintaining subject confidentiality and research integrity and the good of her subjects or the requirements of the law if she uncovers, say, episodes of abuse or violence? But it also involves less immediately obvious and serious ethical questions. “Is this study a giant waste of subjects’ time?” is considered a question of ethics by ethics committees, and is in fact the most serious problem for linguistics research, since there’s very seldom an outcome of particular interest to the subjects themselves.

The study in which I took part a few years back was towards the serious end actually: it was a study into the psychological profiles of people who have an immediate family member who had cancer as a child and involved both questionnaires and a phone interview with a psychologist. Both because the study explored memories of the illness and because the profiling included evaluating depressive episodes, suicidal ideation and so on, it came with a detailed consent form and with information about a counselling service that had been informed of the study and was prepared to work with its subjects.

In the case of the Free Software community the ethical questions are often more towards the “waste of time?” end of the spectrum than the more immediately serious end. It’s important to understand that this isn’t necessarily the case though. Here are some more cutting ethical problems:

getting findings that expose your subjects and/or their employers to intellectual property claims; or
revealing that your subjects are breaching employment contracts in some way (generally also related to IP) and thus exposing them to job loss and possible civil action.

Getting ethics approval to carry out workplace studies can be fairly hard precisely due to problems like these. But in the rest of this post I will treat the “waste of time” problem.

Firstly the basics: are your subjects going to be identifiable in your final reports or to the general public? If not, who will know who they are? Can a subject opt to have their responses removed from the study? When and how? All this should be explained at the start. (Usually if an ethics committee has been involved, there’s a consent form.) If doing a survey look into survey design, in order to construct non-leading questions and such.

Now, for specifics. Most of them arise from this principle: there are a lot of researchers working, in various ways, on the Free Software community, possibly making it a slightly over-studied group if anything. This places the onus on the individual researcher to demonstrate to the community that their project is worthwhile and that they’re going to do what they say. Thus:

demonstrate some familiarity with the background. Depending on your research level this could mean anything from demonstrating a knowledge of existing anthropological work on Free Software (say, if the research project is for your anthropology PhD) down to at least understanding the essential concepts and core history (say, a project at high school level). This can be demonstrated by research design, e.g. asking sensible well-informed questions, but actually mostly requires a bigger time investment: making appearances in the community, either virtually or physically, ideally for a little time before asking the community to help you get your PhD/A-grade/pass.
don’t get the community to design your experiment for you. Have a specific goal, more specific than “get people to write me lengthy essays about Free Software, and get ideas from that and write about them”. In the general case, the “ask people incredibly vague stuff and hope they say something interesting” technique fails the waste-of-time test.
give your results back to the community. The most common problem with the various surveys, interviews and questionnaires sent to the Free Software community is that responding to them is like shouting into a black hole. It is not unheard of, of course, to see the thesis or essay or roundup that comes out of these, but it is unusual, relative to the number of requests. Most of the time the researcher promptly disappears. Researchers should come to the Free Software community with an explanation of when and where they will make the results of the study available. They should explain the aims in advance unless this would compromise the results. (On that note: Anne Chin is giving a linux.conf.au talk this year.)

Ethics of Free Software community research by Mary Gardiner is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

March 27, 2009 / April 11, 2026

Ada Lovelace Day wrap 2: Karen Spärck Jones elsewhere

Yes, this does mean that a third of these things is coming, but I wanted to point to some other profiles of Karen Spärck Jones, aside from my brief one. At least at the present time, she’s on the first page of most profiled Ada Lovelace Day subjects. I was really pleased to learn more about this inspiring scientist.

Martin Belam has a long profile quoting extensively from Spärck Jones’s interviews and speeches and focussing on her own career progression: she worked with Margaret Masterman at the Cambridge Language Research Unit. “You have no conception of how narrow the career options were [for women],” is one of Belam’s quotes. Another one of her stories reminds me of more recent stories Pia Waugh has told me about the resistance of parents playing a role in girls not choosing computing careers (these days it’s apparently the perceived low earnings and limited career prospects of programmers from the point of view of ambitious parents, so at least something has changed):

We were trying to get at girls in schools [to take up computing] and we knew we had to get to the teachers first. We found that the spread of computing in the administrative and secretarial world has completely devalued it. When one of the teachers suggested to the parents of one girl that perhaps she should go into computing the parents said: ‘Oh we don’t want Samantha just to be a secretary’. That’s nothing to do with nerdiness, but the fact that it’s such a routine thing.

Bill Thompson was a student of Spärck Jones’s, and writes about her influence on him as a fellow philosopher turned computer scientist. He also wrote her obituary for The Times (and, in 2003, that of her husband, fellow computer scientist Roger Needham).

IT journalist Brian Runciman remembers Spärck Jones as the most interesting woman he’s ever interviewed in “Computing’s too important to be left to men”. (“I think it’s very important to get more women into computing. My slogan is: Computing is too important to be left to men.” seems to be Spärck Jones’s best known quote.) In the interview with him, she talked about how her ideas permeate modern search engine implementations.

She scored smaller mentions from:

Tom Simonite in New Scientist: Celebrating Ada Lovelace: the ‘world’s first programmer’
Rose Tinted Web: Ada Lovelace Day
Peter Turney (himself a well known computational linguist) lists her among others at Ada Lovelace Day
Mariya Genzel on Twitter

March 24, 2009 / April 11, 2026

Ada Lovelace Day profile: Karen Spärck Jones

Let’s create new role models and make sure that whenever the question “Who are the leading women in tech?” is asked, that we all have a list of candidates on the tips of our tongues… To take part All you need to do is… pick your tech heroine and then publish your blog post any time on Tuesday 24th March 2009. It doesn’t matter how new or old your blog is, what gender you are, what language you blog in, or what you normally blog about – everyone is invited.

This is a profile of a woman in technology for Ada Lovelace Day.

Karen Spärck Jones by Markus Kuhn (modifications by Mary Gardiner) is licensed under a Creative Commons Attribution 2.5 Australia License.
Based on a work at commons.wikimedia.org.

I first heard about Karen Spärck Jones, who was a senior scientist in my field of computational linguistics, in 2007 as part of my paying job, which is as the editorial assistant for Computational Linguistics. Just before she died, Spärck Jones wrote “Computational Linguistics: What About the Linguistics?”, which we published posthumously as the Last Words column for Vol. 33, No. 3. (Spärck Jones was aware both that she was dying and that her column was going to appear under the heading ‘Last Words’.) I was never able to correspond with her directly: she died before we even had the camera-ready copies done.

Spärck Jones’s academic career began in 1957, and was funded entirely by grant money until 1994: most academics will recognise this as a hard way to sustain a career, since it requires researchers to fund their own positions with grant money awarded in cycles.

Spärck Jones was the originator of the Inverse Document Frequency measure in information retrieval (1972, “A statistical interpretation of term specificity and its application in retrieval”, Journal of Documentation, 28:11–21), which is nearly ubiquitously used as part of the measure of the importance of various words contained in documents when searching for information. (The word ‘the’, for example, is very unimportant, as it occurs in essentially all documents, thus having high document frequency and low inverse document frequency.) She had a long history in experimental investigations of human language (most computational linguists are now in this business). She was also at one time president of the Association for Computational Linguistics.
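
To make the ‘the’ example concrete, here is a minimal Python sketch of inverse document frequency. It uses one common textbook form, log(N / df), where N is the number of documents and df is the number of documents containing the word; this is not necessarily the exact formulation in the 1972 paper. The function name, the whitespace tokenisation and the three toy documents are all invented for illustration.

```python
import math


def inverse_document_frequency(term, documents):
    """Return log(N / df) for `term`.

    N is the number of documents and df is the number of documents that
    contain the term at least once. Assumption: a common textbook form of
    IDF, with naive lowercase whitespace tokenisation.
    """
    n_docs = len(documents)
    doc_freq = sum(1 for doc in documents if term in doc.lower().split())
    if doc_freq == 0:
        raise ValueError(f"{term!r} does not occur in any document")
    return math.log(n_docs / doc_freq)


# Toy corpus, made up for this example.
documents = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the linguist wrote a paper",
]

idf_the = inverse_document_frequency("the", documents)            # in all 3 documents
idf_cat = inverse_document_frequency("cat", documents)            # in 2 of 3 documents
idf_linguist = inverse_document_frequency("linguist", documents)  # in 1 of 3 documents

print(idf_the, idf_cat, idf_linguist)
```

A word that appears in every document gets an IDF of zero, while rarer words score higher, so they count for more when ranking documents against a query.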

Awards Spärck Jones won in her lifetime include Fellowships of the American and European Artificial Intelligence societies, Fellowship of the British Academy, the ACL Lifetime Achievement Award and the Lovelace Medal of the British Computer Society.

Elsewhere: Spärck Jones’s obituary in Computational Linguistics and Wikipedia.

Category: Science