Migrated away from Pinboard

I’ve migrated my online bookmarks away from Pinboard to LinkAce, which for my sins I am self-hosting at liber.puzzling.org. If you were following my bookmarks, here’s the updated links:

Various scripts and useful tidbits

  • Zapier supports LinkAce integrations, including for self-hosted instances, if you want to automatically add LinkAce bookmarks from any source
  • pinboard2linkace: LinkAce choked at importing all my Pinboard bookmarks (there were many thousands of them), so I wrote a Python script to migrate the remainder; for bonus points this script preserves the privacy settings of the links
  • LinkAce Privacy Lock: marks new LinkAce bookmarks private a number of days after creation
  • Pinboard Deleter: deletes all bookmarks stored in a Pinboard account. Ensure you backup / export your bookmarks before running this, Pinboard has a backup page, and the script itself will provide a JSON dump before it runs. An alternative to using this script is closing your account entirely.

Elsewhere

Delete your free Slack backlogs!

Why delete Slack backlogs?

Slack and other chat software tend to retain conversation history so that you can see and search what was said in the past. This can be very helpful for historical context and avoiding repeat conversations, but there’s all kinds of reasons why you don’t want to retain backlogs indefinitely:

  • people who join some time after the Slack is formed may find themselves being discussed in backlogs in terms that are uncomfortable now they can see it
  • the relationships of people in the Slack may change over time and previously friendly conversations may be weaponised by members
  • any malicious person who gains access to your Slack (whether by hacking or by being invited) gets the entire history of everything said there to bully or blackmail people with
  • the contents of the Slack might be subject to legal discovery at some point in the future and be used to win a lawsuit against members or owners of the Slack, or else simply dumped into the public record

Learn more in the Electronic Frontier Foundation’s Slack privacy campaign: What if All Your Slack Chats Were Leaked?, Slack should give free accounts control over retention.

How to delete Slack backlogs.

If you pay for Slack, you should use their message and file retention feature.

Update September 2024: as of 2022, Slack allows free workspaces to configure deletion after 90 days (otherwise it will hide the messages/files), and as of 2024 all data from free workspaces is deleted after a year and cannot be retained without paying. The below script is now mainly useful if you wish to retain data for less than 90 days in a free Slack workspace.

If you have a free Slack, you can do it yourself. If you are using the free plan, you can delete messages through the API. Here’s a really simple sample Python script any admin of your Slack can use, which will delete most messages posted more than 6 weeks ago. (Instructions.)

Alternatively, slack_cleaner2 is nicely flexible if you want to develop your own script. Or members could delete at least their own messages with eg the Message deleter for Slack 2.0 Chrome extension.

Script caveats

You will need owner or administrator access to your Slack instance (or else you cannot delete messages other users wrote).

The script operates with the credentials of the admin who runs it, and will not be able to delete other people’s messages in 1:1 DMs, or any messages in any private channel that admin is not in.

The script will not delete messages older than the 10,000 recent messages that free Slacks have access to (even deleting the newer messages doesn’t restore access to these). Yet these older messages are retained by the company and could be accessed if, eg, someone pays for the Slack in future or if a discovery motion is granted. Unfortunately, you will need to pay for Slack, at least briefly, to access these older messages for deletion.

Creative Commons License
Delete your free Slack backlogs! by Mary Gardiner is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Code release: Spam All the Links

The Geek Feminism blog’s Linkspam tradition started back in August 2009, in the very early days of the blog and by September it had occurred to us to take submissions through bookmarking services. From shortly after that point there were a sequence of scripts that pulled links out of RSS feeds. Last year, I began cleaning up my script and turning it into the one link-hoovering script to rule them all. It sucks links out of bookmarking sites, Twitter and WordPress sites and bundles them all up into an email that is sent to the linkspamming team there for curation, pre-formatted in HTML and with title and suggestion descriptions for each link. It even attempts to filter out links already posted in previous linkspams.

The Geek Feminism linkspammers aren’t the only link compilers in town, and it’s possible we’re not the only group who would find my script useful. I’ve therefore finished generalising it, and I’ve released it as Spam All the Links on Gitlab. It’s a Python 3 script that should run on most standard Python environments.

Spam All the Links

Spam All the Links is a command line script that fetches URL suggestions from
several sources and assembles them into one email. That email can in turn be
pasted into a blog entry or otherwise used to share the list of links.

Use case

Spam All the Links was written to assist in producing the Geek Feminism linkspam posts. It was developed to check WordPress comments, bookmarking websites such as Pinboard, and Twitter, for links tagged “geekfeminism”, assemble them into one email, and email them to an editor who could use the email as the basis for a blog post.

The script has been generalised to allow searches of RSS/Atom feeds, Twitter, and WordPress blog comments as specified by a configuration file.

Email output

The email output of the script has three components:

  1. a plain text email with the list of links
  2. a HTML email with the list of links
  3. an attachment with the HTML formatted links but no surrounding text so as to be easily copy and pasted

All three parts of the email can be templated with Jinja2.

Sources of links

Spam All the Links currently can be configured to check multiple sources of links, in these forms:

  1. RSS/Atom feeds, such as those produced by the bookmarking sites Pinboard or Diigo, where the link, title and description of the link can be derived from the equivalent fields in the RSS/Atom. (bookmarkfeed in the configuration file)
  2. RSS/Atom feeds where links can be found in the ‘body’ of a post (postfeed in the configuration file)
  3. Twitter searches (twitter in the configuration file)
  4. comments on WordPress blog entries (wpcommentsfeed in the configuration file)

More info, and the code, is available at the Spam All the Links repository at Gitlab. It is available under the MIT free software licence.

Ethics of Free Software community research

Most of this entry is exactly a year old today and it’s just sat around in draft form all that time. Since I posted something similar on Geek Feminism about research into women in tech and similar topics, I thought I’d get it out there.

In January 2009 a researcher named Anne Chin of Monash University Law emailed the chat list for the linux.conf.au 2009 conference asking for research subjects to be interviewed about licencing and Open Source software. There were several responses criticising her use of HTML email and Microsoft Word attachments. I’ll leave the specifics of this alone except that people should be (and probably are) aware that this is almost always an unknowing violation of community norms.

I did, though, think about making some notes on research ethics and Free Software research. A bit about my background: I am not a specialist in ethics. I’m somewhat familiar with ethics applications to work with human subjects, but not from the perspective of evaluating them. I’ve made them, and I’ve been a subject in a study that had made them.

For people who haven’t seen this process, the ethical questions arising from using human subjects in your research in general covers the question of whether the good likely to arise from the outcomes of the study outweighs the harm done to the subjects, together with issues of consent to that harm. (There are many philosophical assumptions underlying this ethical framework, I don’t intend to treat them here.) Researchers in universities, hospitals, schools and research institutes usually have to present their experimental designs to an ethics committee who will determine this question for them and approve their experiment. Researchers who work across several of these (eg, a PhD student who wants to interview schoolchildren) will need to do several ethics applications, a notable chore when the forms and guidelines aren’t standardised and occasionally directly conflict. Researchers working for private commercial entities may or may not have a similar requirement. Researchers who use animals also have to have ethical reviews, these are done by animal ethics committees, which are usually separate.

At my university, essentially any part of your research that involves measuring or recording another person’s response to a research question and using it to help answer that question needs a human ethics application.

The good/harm balance may include very serious dilemmas: is there a health risk to subjects? how will the researcher manage the conflict between maintaining subject confidentiality and research integrity and the good of her subjects or the requirements of the law if she uncovers, say, episodes of abuse or violence? But it also involves less immediately obvious and serious ethical questions. Is this study a giant waste of subjects’ time? is considered a question of ethics by ethics committees, and is in fact the most serious problem for linguistics research, since there’s very seldom an outcome of particular interest to the subjects themselves.

The study in which I took part a few years back was towards the serious end actually: it was a study into the psychological profiles of people who have an immediate family member who had cancer as a child and involved both questionnaires and a phone interview with a psychologist. Both because the study explored memories of the illness and because the profiling included evaluating depressive episodes, suicidal ideation and so on, it came with a detailed consent form and with information about a counselling service that had been informed of the study and was prepared to work with its subjects.

In the case of the Free Software community the ethical questions are often more towards the waste of time? end of the spectrum than the more immediately serious end. It’s important to understand that this isn’t necessarily the case though. Here are some more cutting ethical problems:

  • getting findings that expose your subjects and/or their employers to intellectual property claims; or
  • revealing that your subjects are breaching employment contracts in some way (generally also related to IP) and thus exposing them to job loss and possible civil action.

Getting ethics approval to carry out workplace studies can be fairly hard precisely due to problems like these. But in the rest of this post I will treat the waste of time problem.

Firstly the basics: are your subjects going to be identifiable in your final reports or to the general public? If not, who will know who they are? Can a subject opt to have their responses removed from the study? When and how? All this should be explained at the start. (Usually if an ethics committee has been involved, there’s a consent form.) If doing a survey look into survey design, in order to construct non-leading questions and such.

Now, for specifics. Most of them arise from this principle: there are a lot of researchers working, in various ways, on the Free Software community, possibly making it a slightly over-studied group if anything. This places the onus on the individual researcher to demonstrate to the community that their project is worthwhile and that they’re going to do what they say. Thus:

  1. demonstrate some familiarity with the background. Depending on your research level this could mean anything from demonstrating a knowledge of existing anthropological work on Free Software (say, if the research project is for your anthropology PhD) down to at least understanding the essential concepts and core history (say, a project at high school level). This can be demonstrated by research design, eg asking sensible well-informed questions, but actually mostly requires a bigger time investment: making appearances in the community, either virtually or physically, ideally for a little time before asking the community to help you get your PhD/A-grade/pass.
  2. don’t get the community to design your experiment for you. Have a specific goal, more specific than get people to write me lengthy essays about Free Software, and get ideas from that and write about them. In the general case, the ask people incredibly vague stuff and hope they say something interesting technique fails the waste-of-time test.
  3. give your results back to the community. The most common problem with the various surveys, interviews and questionnaires sent to the Free Software community is that responding to them is like shouting into a black hole. It is not unheard of, of course, to see the thesis or essay or roundup that comes out of these, but it is unusual, relative to the number of requests. Most of the time the researcher promptly disappears. Researchers should come to the Free Software community with an explanation of when and where they will make the results of the study available. They should explain the aims in advance unless this would compromise the results. (On that note: Anne Chin is giving a linux.conf.au talk this year.)

Creative Commons License
Ethics of Free Software community research by Mary Gardiner is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Clean up IMAP folders

Per Matt Palmer’s blog entry OfflineIMAP and Deleting Folders users of any mail sorting recipe that creates new mail folders a lot tend to find that over time they accumulate a lot of mail folders for, eg, email lists they are no longer subscribed to. And most IMAP clients will waste time checking those folders for new mail all the time.

Matt wrote:

Now, of course, someone’s going to point me to a small script that finds all of your local empty folders and deletes them locally then issues an IMAP “delete folder” command on the server. But I had fun working all this out, so it’s not a complete waste.

I haven’t quite done this, I’ve just written a script that detects and deletes empty remote folders. (For me, offlineimap does not have the behaviour of creating new remote folders, so I haven’t bothered cleaning up local folders.)

It’s good: it’s speeding up my mail syncs a whole lot, deleting these old folders I haven’t received mail in for about five years. I’ve got full details and the script available for download (as you’d expect, it’s short): Python script to delete empty IMAP folders.

Creative Commons License
Clean up IMAP folders by Mary Gardiner is licensed under a Creative Commons Attribution 4.0 International License.

Headless / commandline Bittorrent

I made noises yesterday that I might learn about BitTorrent. So I tried. (It’s an interesting protocol from the point of view of needing clients to enforce penalties against refusing to upload.) Here’s the paradigm I wanted it to fit into: at, the command scheduler. 95% of my readership will know (intimately) how residential broadband works in Australia, but for those who don’t it is typical to not have unlimited downloads. On a good, just above entry level, plan you might have a limit of about 4GB to 10GB a month. (The entry level plans tend to have a limit of about 512MB or 1GB for only about $10 to $15 less. This is like selling small amounts of an addictive substance. Someone will eventually tell you about Google Earth.) However, it is also very typical to have an off-peak period with a higher, perhaps even unstated, bandwidth cap. (Mine is 48GB in a month, in the midnight to noon period only.)

Since I run a headless server anyway (for mail and music), I vastly prefer to schedule my downloads for when I’m sleeping and bandwidth is cheaper. When I’m going to download something available on the web, I run at midnight and then give the command wget --limit-rate=something -q -c [url] (-c because I’ve usually tested about the first 100K of the download already). In the morning, assuming I’m not downloading an album from a very well known band, my file is there. I therefore wanted to do the same with Bittorrent.

I’ll cut to the chase here: the solution is Transmission‘s command-line tools, particularly transmission-daemon and transmission-remote. The daemon controls all the torrents, and, usefully for my limited download window, they can be stopped and started from the remote command and therefore from cron. The only catch is that these tools seem to have only very recently matured, as in, they don’t exist in Ubuntu 7.10/Gutsy. (It has a transmission-cli package, but the daemon isn’t in it.) The transmission-cli package from Hardy will not install on Gutsy either, but you can backport it without any hassles, assuming you know how to get hold of and build Ubuntu (Debian-style) source packages, which, granted, probably doesn’t apply to 95% of my readership (although it might apply to a majority once we count Planet Linux Australia).

I thought this was worth sharing though, after I spent hours mucking around with BitTorrent (the Python client, not the rebranded µTorrent) and BitTornado, neither of which, even in the headless versions, has much support for such selfish notions as not wanting to run it until such time as it’s good and done (you can send SIGTERM and they do resume cleanly on the next invocation, not exactly champion of the world on design though), and neither of which puts up with this old-fashioned nonsense of wanting to run a process without a controlling tty. BitTorrent doesn’t even have an inkling that you might want to limit download speeds to anything less than maximum (perhaps Bram Cohen has never lived in a house with anyone else who wanted to use his net connection). I ended up looking at Transmission because the GUI version (also apparently very nice) is now Ubuntu’s default BitTorrent client. (Clutch is allegedly a nice web interface for Transmission too, but I haven’t looked at it at all.)

Edit: this post is not a bleg. The last paragraph describes a problem, yes, but this post is about how I’ve solved that problem by discovering Transmission. Please, there’s no need to email me on getting BitTorrent, BitTornado or rTorrent to work in screen or similar. I’ve discovered Transmission, which doesn’t need screen and which is very cron-friendly. This post is intended as a positive review of Transmission, not a request for help. End edit.

Creative Commons License
Headless / commandline Bittorrent by Mary Gardiner is licensed under a Creative Commons Attribution 4.0 International License.

Planet Free Software

Article originally posted at the IT Kitchen, a now defunct project founded by Shelley Powers.

Free Software developers, who had strong mailing list and IRC based online communities before the advent of weblogging, have nevertheless found their way into it. This post is a summary of how the Free Software world is using blogs for collaboration; largely preferring aggregation of community members’ blogs over setting up single access group blogs, and using them as a community building tool rather than a software development collaboration tool.

One of the big developments was Advogato, which started in late 1999. The creator of Advogato, Raph Levien, appears to have been trying to start up a kind of a semi-formal guild system for Free Software developers, allowing them to rank each other as Master, Journeyer or Apprentice. As a small feature, he added the ability for users to make “diary entries”, the most recent of which were listed at the side of the front page.

While the other features of advogato proved only an intermittant success — the quality of the articles on the front page is widely lamented, and the certification system has been subject to a lot of debate and has not resulted in the development of formal mentoring — the diary feature was a smash hit. Waves of Free Software developers hit advogato in 2000 and 2001 as they started reading their co-developers’ diaries. The buzz even generated a Salon article in mid 2000.

The initial buzz surrounding Advogato occasionally caused users to publicly renounce their former bad opinions of “online journals”: rather than being ‘useless’ things full of stories about children and cats, they were a new space to talk about your code and find out more about your fellow developers. Advogato was known as a friendly place, in contrast sometimes with the development mailing lists themselves.

Eventually the worlds of Advogato and of blogs began to meet. In mid-2002 Levien was discovering the wider blogosphere and started exploring using his Advogato diary as a primary means of communication with other interesting people. By that time RSS feeds of individual entries and of the entire recent diary entries page were probably the single most requested feature: people no longer wanted to drop in on the site to skim through the new entries, they wanted to poll them like they were beginning to do with other websites. (RSS feeds of individuals’ diaries were added in April 2003.)

At around about this time also, some people started to express serious dissatisfaction with the Advogato community as political debates became more common and the community attracted a few diary trolls. Levien added a diary rating feature as requests to be able to keep some users off the recent entries page grew. Others used the Advogato article feature to deplore the decline in the community.

As various blogging tools became more popular around this time, it became increasingly common to see diary entries from an Advogato regular announcing that their diary was moving elsewhere.

As RSS feeds became fairly ubiquitous, the Free Software community started to revert to a more typical blogging community model: you read blogs of people whose names you knew, and you found other people you knew or knew of through sidebars and comments.

However, in mid-2003 Jeff Waugh of the GNOME desktop project decided to create his own version of the Advogato front page, a HTML page with recent blog entries from GNOME developers all over the web (including several on Advogato). He used the Spycroll aggregator software to pull in RSS feeds, and he made them all available on a single webpage, with the cute addition of disembodied "hackergotchi heads" personalising each name.

He was stunned with the popularity of the page he linked from his own sidebar as Planet GNOME and started to field all kinds of questions about it: the three most popular were “why isn’t this at planet.gnome.org?”, “why aren’t I on it?” and (to his surprise) “why isn’t there an RSS feed?”

The Planet idea took off rapidly over the next six months. Scott James Remnant was the next off the mark, creating Planet Debian. Remnant and Waugh forked spycroll soon after that to create the Planet aggregator script. In fairly short order, a lot of large Free Software projects needed to have their own planet: the Planet homepage now lists nearly 40 separate planets.

The planets have evolved a loose set of customs based on the ones in place at Planets GNOME and Debian. They do not require that syndicated blogs talk about Free Software or software development all the time: they encourage getting to know your fellow developers as people as well as techs. (John Fleck, a GNOME documentor who is not only a frequent poster, but is a frequent non-tech blogger, has been a kind of an acid test for this editorial policy: see the John Malkovich post and a later complaint.) The larger planets are starting to have to deal with line-ball calls about who should and should not be on the planet pages: Waugh apparently finds requiring that contributors use a real photo of themself somewhat helpful on Planet GNOME.

The planets have proved to be amazingly good at spreading blogging among Free Software communities. The two planets I host, LinuxChix Live and Planet Twisted are close to being my most popular hosted sites. They also fill an important gap in the usual Free Software communication tools: they don’t need to be as on-topic as mailing list posts, and they are more expressive than IRC. They’ve also had some influence on corporate group blogging: Richard Giles reported that the creation of Planet Sun was part of the explorations that led Sun employees to promote blogging internally, eventually leading to the creation of blogs.sun.com.

See also

Creative Commons License
Planet Free Software by Mary Gardiner is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.