Code release: Spam All the Links

The Geek Feminism blog’s Linkspam tradition started back in August 2009, in the very early days of the blog, and by September it had occurred to us to take submissions through bookmarking services. From shortly after that point there was a sequence of scripts that pulled links out of RSS feeds. Last year, I began cleaning up my script and turning it into the one link-hoovering script to rule them all. It sucks links out of bookmarking sites, Twitter and WordPress sites and bundles them all up into an email that is sent to the linkspamming team there for curation, pre-formatted in HTML and with a title and suggested description for each link. It even attempts to filter out links already posted in previous linkspams.
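The post doesn’t describe how the script recognises links that have already appeared in a linkspam. As a minimal sketch, assuming the previously posted URLs are available as a plain list, one approach is to normalise each URL (lowercase the scheme and host, drop the fragment, the trailing slash and any `utm_*` tracking parameters) so that trivially different spellings of the same link compare equal. The function names `normalise_url` and `filter_new_links` are hypothetical, not taken from Spam All the Links.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalise_url(url):
    """Hypothetical helper: reduce trivially different spellings of a URL to one form."""
    parts = urlsplit(url.strip())
    # Drop utm_* tracking parameters, which vary between shares of one link.
    query = urlencode(
        [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
         if not k.startswith("utm_")]
    )
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host; the fragment ("#comments" etc.) is discarded.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))


def filter_new_links(candidates, previously_posted):
    """Hypothetical helper: keep candidates not already posted, in first-seen order."""
    seen = {normalise_url(u) for u in previously_posted}
    fresh = []
    for url in candidates:
        key = normalise_url(url)
        if key not in seen:
            seen.add(key)  # also collapses duplicates within this batch
            fresh.append(url)
    return fresh


old = ["https://example.com/post/"]
new = [
    "https://Example.com/post#comments",
    "https://example.com/other?utm_source=x",
    "https://example.com/other",
]
print(filter_new_links(new, old))  # ['https://example.com/other?utm_source=x']
```

Normalising rather than comparing raw strings matters because the same article tends to arrive from several sources (a bookmark, a tweet, a comment) with slightly different URLs.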

The Geek Feminism linkspammers aren’t the only link compilers in town, and it’s possible we’re not the only group who would find my script useful. I’ve therefore finished generalising it and released it as Spam All the Links on GitLab. It’s a Python 3 script that should run in most standard Python environments.

Spam All the Links

Spam All the Links is a command line script that fetches URL suggestions from
several sources and assembles them into one email. That email can in turn be
pasted into a blog entry or otherwise used to share the list of links.

Use case

Spam All the Links was written to assist in producing the Geek Feminism linkspam posts. It was developed to check WordPress comments, bookmarking websites such as Pinboard, and Twitter, for links tagged “geekfeminism”, assemble them into one email, and email them to an editor who could use the email as the basis for a blog post.

The script has been generalised to allow searches of RSS/Atom feeds, Twitter, and WordPress blog comments as specified by a configuration file.

Email output

The email output of the script has three components:

  1. a plain text email with the list of links
  2. an HTML email with the list of links
  3. an attachment with the HTML-formatted links but no surrounding text, so that they can easily be copied and pasted

All three parts of the email can be templated with Jinja2.
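Assembled with Python’s standard `email` package, the three-part structure above might look something like the sketch below. It is an illustration, not the script’s own code: Spam All the Links renders each part from Jinja2 templates, whereas this sketch uses f-strings to stay self-contained, and `build_linkspam_email`, the subject line and the `links.html` filename are all hypothetical.

```python
from email.message import EmailMessage
from html import escape


def build_linkspam_email(links, sender, recipient):
    """Hypothetical helper: plain text + HTML alternatives, plus an HTML attachment.

    `links` is a list of (url, title, description) tuples.
    """
    # Part 1: the plain-text list of links.
    text = "\n\n".join(f"{title}\n{url}\n{desc}" for url, title, desc in links)
    # The bare HTML list, reused in both the HTML body and the attachment.
    fragment = "<ul>\n" + "\n".join(
        f'<li><a href="{escape(url)}">{escape(title)}</a>: {escape(desc)}</li>'
        for url, title, desc in links
    ) + "\n</ul>\n"

    msg = EmailMessage()
    msg["Subject"] = "Link suggestions"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Suggested links:\n\n" + text)
    # Part 2: an HTML alternative of the same list, with surrounding text.
    msg.add_alternative(f"<p>Suggested links:</p>\n{fragment}", subtype="html")
    # Part 3: the fragment alone; this turns the message into multipart/mixed.
    msg.add_attachment(fragment, subtype="html", filename="links.html")
    return msg


msg = build_linkspam_email(
    [("https://example.com/a?x=1&y=2", "A <title>", "Desc")],
    "bot@example.org",
    "editor@example.org",
)
print(msg.get_content_type())  # multipart/mixed
```

Attaching the bare fragment as its own part means an editor can paste it straight into a blog editor’s HTML view without first trimming away the email’s surrounding text.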

Sources of links

Spam All the Links currently can be configured to check multiple sources of links, in these forms:

  1. RSS/Atom feeds, such as those produced by the bookmarking sites Pinboard or Diigo, where the link, title and description of the link can be derived from the equivalent fields in the RSS/Atom. (bookmarkfeed in the configuration file)
  2. RSS/Atom feeds where links can be found in the ‘body’ of a post (postfeed in the configuration file)
  3. Twitter searches (twitter in the configuration file)
  4. comments on WordPress blog entries (wpcommentsfeed in the configuration file)
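The difference between the two feed-based sources can be sketched with the standard library alone. This assumes plain RSS 2.0 (Atom feeds use a namespace and different element names, which the sketch doesn’t handle), and the function and class names are hypothetical. In a `bookmarkfeed`, each `<item>` is itself a suggested link; in a `postfeed`, the links have to be dug out of the HTML in each item’s body. WordPress comment feeds are also RSS, so a `wpcommentsfeed` source could plausibly be handled much like a `postfeed`.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def bookmark_items(rss_xml):
    """bookmarkfeed-style source: every RSS <item> is itself a suggested link.

    Yields (url, title, description) taken from the item's own fields.
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        yield (
            item.findtext("link", default="").strip(),
            item.findtext("title", default="").strip(),
            item.findtext("description", default="").strip(),
        )


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def links_in_posts(rss_xml):
    """postfeed-style source: links are buried in each item's HTML body."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        collector = LinkCollector()
        # ElementTree has already unescaped &lt;a&gt; back into real HTML.
        collector.feed(item.findtext("description", default=""))
        collector.close()
        yield from collector.hrefs


# Hypothetical sample feeds for illustration.
BOOKMARKS = """<rss version="2.0"><channel>
<item><title>An article</title><link>https://example.com/a</link>
<description>Worth a read</description></item>
</channel></rss>"""

POSTS = """<rss version="2.0"><channel>
<item><title>A comment</title>
<description>See &lt;a href="https://example.com/b"&gt;this&lt;/a&gt;</description></item>
</channel></rss>"""

print(list(bookmark_items(BOOKMARKS)))  # [('https://example.com/a', 'An article', 'Worth a read')]
print(list(links_in_posts(POSTS)))  # ['https://example.com/b']
```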

More information and the code are available in the Spam All the Links repository on GitLab. It is released under the MIT free software licence.

Quick hit: Battle of the Opens

A quick link for you: people interested in Free Software and similar intellectual property sharing and re-mixing movements would probably enjoy reading Dorothea Salo’s Battle of the Opens, talking about open source, open access (to research literature) and many other related movements. I think, in general, that Open Access and Free Culture and so on are far more aware of the software movement than the software movement is aware of any of the movements it partly inspired.

Here’s an excerpt about open access:

What is being made open? The academic literature: specifically, the peer-reviewed journal literature which is not written for royalties or any other direct monetary reward to its authors. (While open-access advocates happily cheer for open access to books and other research media, the different money-flows in these areas mean they are not a focus of the movement.) Open-access literature is in opposition to literature which is not available to be read unless a subscription, per-article, or other fee is paid by the reader or the reader’s proxy (e.g. a library).

What legal regimes are implicated? Copyright, again. Typical practice for the academic article is that its author(s) transfer their copyright in its entirety to the journal publisher, allowing the publisher to control reuse.

Ethics of Free Software community research

Most of this entry is exactly a year old today and it’s just sat around in draft form all that time. Since I posted something similar on Geek Feminism about research into women in tech and similar topics, I thought I’d get it out there.

In January 2009 a researcher named Anne Chin of Monash University Law emailed the chat list for the linux.conf.au 2009 conference asking for research subjects to be interviewed about licensing and Open Source software. There were several responses criticising her use of HTML email and Microsoft Word attachments. I’ll leave the specifics of this alone, except to say that people should be (and probably are) aware that this is almost always an unknowing violation of community norms.

I did, though, think about making some notes on research ethics and Free Software research. A bit about my background: I am not a specialist in ethics. I’m somewhat familiar with ethics applications to work with human subjects, but not from the perspective of evaluating them. I’ve made them, and I’ve been a subject in a study that had made them.

For people who haven’t seen this process: the ethical questions arising from using human subjects in your research generally cover whether the good likely to arise from the outcomes of the study outweighs the harm done to the subjects, together with issues of consent to that harm. (There are many philosophical assumptions underlying this ethical framework; I don’t intend to treat them here.) Researchers in universities, hospitals, schools and research institutes usually have to present their experimental designs to an ethics committee, which will decide this question for them and approve their experiment. Researchers who work across several of these (e.g. a PhD student who wants to interview schoolchildren) will need to make several ethics applications, a notable chore when the forms and guidelines aren’t standardised and occasionally directly conflict. Researchers working for private commercial entities may or may not have a similar requirement. Researchers who use animals also need ethical review; this is done by animal ethics committees, which are usually separate.

At my university, essentially any part of your research that involves measuring or recording another person’s response to a research question and using it to help answer that question needs a human ethics application.

The good/harm balance may include very serious dilemmas: is there a health risk to subjects? How will the researcher manage the conflict between maintaining subject confidentiality and research integrity, and the good of her subjects or the requirements of the law, if she uncovers, say, episodes of abuse or violence? But it also involves less immediately obvious and serious ethical questions. “Is this study a giant waste of subjects’ time?” is considered a question of ethics by ethics committees, and is in fact the most serious problem for linguistics research, since there’s very seldom an outcome of particular interest to the subjects themselves.

The study in which I took part a few years back was actually towards the serious end: it was a study into the psychological profiles of people who have an immediate family member who had cancer as a child, and it involved both questionnaires and a phone interview with a psychologist. Both because the study explored memories of the illness and because the profiling included evaluating depressive episodes, suicidal ideation and so on, it came with a detailed consent form and with information about a counselling service that had been informed of the study and was prepared to work with its subjects.

In the case of the Free Software community, the ethical questions are often more towards the “waste of time?” end of the spectrum than the more immediately serious end. It’s important to understand that this isn’t necessarily the case, though. Here are some more cutting ethical problems:

  • getting findings that expose your subjects and/or their employers to intellectual property claims; or
  • revealing that your subjects are breaching employment contracts in some way (generally also related to IP) and thus exposing them to job loss and possible civil action.

Getting ethics approval to carry out workplace studies can be fairly hard precisely because of problems like these. But in the rest of this post I will treat the “waste of time” problem.

Firstly, the basics: are your subjects going to be identifiable in your final reports or to the general public? If not, who will know who they are? Can a subject opt to have their responses removed from the study? When and how? All this should be explained at the start. (Usually if an ethics committee has been involved, there’s a consent form.) If you are doing a survey, look into survey design in order to construct non-leading questions and such.

Now, for specifics. Most of them arise from this principle: there are a lot of researchers working, in various ways, on the Free Software community, possibly making it a slightly over-studied group if anything. This places the onus on the individual researcher to demonstrate to the community that their project is worthwhile and that they’re going to do what they say. Thus:

  1. demonstrate some familiarity with the background. Depending on your research level, this could mean anything from demonstrating a knowledge of existing anthropological work on Free Software (say, if the research project is for your anthropology PhD) down to at least understanding the essential concepts and core history (say, a project at high school level). This can be demonstrated through research design, e.g. by asking sensible, well-informed questions, but mostly it requires a bigger time investment: making appearances in the community, either virtually or physically, ideally for a little while before asking the community to help you get your PhD/A-grade/pass.
  2. don’t get the community to design your experiment for you. Have a specific goal, more specific than “get people to write me lengthy essays about Free Software, and get ideas from that and write about them”. In the general case, the “ask people incredibly vague stuff and hope they say something interesting” technique fails the waste-of-time test.
  3. give your results back to the community. The most common problem with the various surveys, interviews and questionnaires sent to the Free Software community is that responding to them is like shouting into a black hole. It is not unheard of, of course, to see the thesis or essay or roundup that comes out of these, but it is unusual, relative to the number of requests. Most of the time the researcher promptly disappears. Researchers should come to the Free Software community with an explanation of when and where they will make the results of the study available. They should explain the aims in advance unless this would compromise the results. (On that note: Anne Chin is giving a linux.conf.au talk this year.)

linux.conf.au 2010 miniconf idea

I’m not taking on additional tasks for linux.conf.au 2010 (beyond co-chairing the presentation selection committee with Michael Davies) and so won’t run with this, but here is an idea for someone with more time on their hands: a FOSSCoach miniconf or co-located event. I’m not sure whether you can call it FOSSCoach precisely (or would want to; I dislike the FOSS acronym), but the idea is an event teaching people how to work on Free Software projects. It could possibly be extended into, say, editing Wikipedia. If anyone wants to do this, go for it.

The linux.conf.au 2010 call for miniconfs is out now and is open until July 17. linux.conf.au 2010 will be held from 18–23 January 2010 in Wellington, New Zealand.

Planet Free Software

Article originally posted at the IT Kitchen, a now defunct project founded by Shelley Powers.

Free Software developers, who had strong mailing-list and IRC-based online communities before the advent of weblogging, have nevertheless taken to it. This post summarises how the Free Software world is using blogs for collaboration: it has largely preferred aggregating community members’ blogs over setting up single-access group blogs, and it uses them as a community-building tool rather than a software development collaboration tool.

One of the big developments was Advogato, which started in late 1999. The creator of Advogato, Raph Levien, appears to have been trying to start up a kind of semi-formal guild system for Free Software developers, allowing them to rank each other as Master, Journeyer or Apprentice. As a small feature, he added the ability for users to make “diary entries”, the most recent of which were listed at the side of the front page.

While the other features of Advogato proved only an intermittent success (the quality of the articles on the front page is widely lamented, and the certification system has been subject to a lot of debate and has not resulted in the development of formal mentoring), the diary feature was a smash hit. Waves of Free Software developers hit Advogato in 2000 and 2001 as they started reading their co-developers’ diaries. The buzz even generated a Salon article in mid-2000.

The initial buzz surrounding Advogato occasionally caused users to publicly renounce their former bad opinions of “online journals”: rather than being ‘useless’ things full of stories about children and cats, they were a new space to talk about your code and find out more about your fellow developers. Advogato was known as a friendly place, in contrast sometimes with the development mailing lists themselves.

Eventually the worlds of Advogato and of blogs began to meet. In mid-2002 Levien was discovering the wider blogosphere and started exploring using his Advogato diary as a primary means of communication with other interesting people. By that time RSS feeds of individual entries and of the entire recent diary entries page were probably the single most requested feature: people no longer wanted to drop in on the site to skim through the new entries, they wanted to poll them like they were beginning to do with other websites. (RSS feeds of individuals’ diaries were added in April 2003.)

Around this time, some people also started to express serious dissatisfaction with the Advogato community as political debates became more common and the community attracted a few diary trolls. Levien added a diary rating feature as requests to be able to keep some users off the recent entries page grew. Others used the Advogato article feature to deplore the decline in the community.

As various blogging tools became more popular around this time, it became increasingly common to see diary entries from an Advogato regular announcing that their diary was moving elsewhere.

As RSS feeds became fairly ubiquitous, the Free Software community started to revert to a more typical blogging community model: you read blogs of people whose names you knew, and you found other people you knew or knew of through sidebars and comments.

However, in mid-2003 Jeff Waugh of the GNOME desktop project decided to create his own version of the Advogato front page: an HTML page with recent blog entries from GNOME developers all over the web (including several on Advogato). He used the Spycroll aggregator software to pull in their RSS feeds and made the entries available on a single webpage, with the cute addition of disembodied “hackergotchi heads” personalising each name.

He was stunned by the popularity of the page, which he linked from his own sidebar as Planet GNOME, and started to field all kinds of questions about it. The three most popular were “why isn’t this at planet.gnome.org?”, “why aren’t I on it?” and (to his surprise) “why isn’t there an RSS feed?”

The Planet idea took off rapidly over the next six months. Scott James Remnant was the next off the mark, creating Planet Debian. Remnant and Waugh forked Spycroll soon after that to create the Planet aggregator script. In fairly short order, a lot of large Free Software projects needed to have their own planet: the Planet homepage now lists nearly 40 separate planets.
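The core of the Planet idea is simple enough to sketch: gather the recent entries from every member’s feed and interleave them, newest first, on one page. The sketch below assumes the feeds have already been fetched and parsed into `Entry` records; `Entry`, `river_of_news`, `render` and the sample data are all hypothetical, and this is not how Spycroll or Planet were actually written.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Entry:
    """One blog post pulled from a member's feed (hypothetical record type)."""
    author: str
    title: str
    published: datetime


def river_of_news(feeds, limit=20):
    """Merge every member's entries into one newest-first list of at most `limit`."""
    all_entries = (entry for entries in feeds.values() for entry in entries)
    # nlargest returns its results ordered from largest (newest) to smallest.
    return heapq.nlargest(limit, all_entries, key=lambda e: e.published)


def render(entries):
    """Plain-text stand-in for the HTML page: one line per entry."""
    return "\n".join(f"{e.published:%Y-%m-%d} {e.author}: {e.title}" for e in entries)


# Hypothetical sample data: two members' feeds, already fetched and parsed.
feeds = {
    "alice": [Entry("alice", "Hacking on the panel", datetime(2003, 7, 1, tzinfo=timezone.utc))],
    "bob": [
        Entry("bob", "Release notes", datetime(2003, 6, 30, tzinfo=timezone.utc)),
        Entry("bob", "My cat", datetime(2003, 7, 2, tzinfo=timezone.utc)),
    ],
}
print(render(river_of_news(feeds, limit=2)))
```

Note that nothing filters entries by topic: the cat post sits alongside the release notes, which is the editorial stance the planets settled on.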

The planets have evolved a loose set of customs based on those in place at Planets GNOME and Debian. They do not require that syndicated blogs talk about Free Software or software development all the time: they encourage getting to know your fellow developers as people as well as techs. (John Fleck, a GNOME documenter who is not only a frequent poster but also a frequent non-tech blogger, has been a kind of acid test for this editorial policy: see the John Malkovich post and a later complaint.) The larger planets are starting to have to deal with line-ball calls about who should and should not be on the planet pages: Waugh apparently finds requiring that contributors use a real photo of themself somewhat helpful on Planet GNOME.

The planets have proved to be amazingly good at spreading blogging among Free Software communities. The two planets I host, LinuxChix Live and Planet Twisted, are close to being my most popular hosted sites. They also fill an important gap in the usual Free Software communication tools: they don’t need to be as on-topic as mailing list posts, and they are more expressive than IRC. They’ve also had some influence on corporate group blogging: Richard Giles reported that the creation of Planet Sun was part of the explorations that led Sun employees to promote blogging internally, eventually leading to the creation of blogs.sun.com.

Planet Free Software by Mary Gardiner is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.