Code release: Spam All the Links

The Geek Feminism blog’s Linkspam tradition started back in August 2009, in the very early days of the blog, and by September it had occurred to us to take submissions through bookmarking services. From shortly after that point there was a sequence of scripts that pulled links out of RSS feeds. Last year, I began cleaning up my script and turning it into the one link-hoovering script to rule them all. It sucks links out of bookmarking sites, Twitter and WordPress sites and bundles them all up into an email that is sent to the linkspamming team there for curation, pre-formatted in HTML and with a title and suggested description for each link. It even attempts to filter out links already posted in previous linkspams.

The Geek Feminism linkspammers aren’t the only link compilers in town, and it’s possible we’re not the only group who would find my script useful. I’ve therefore finished generalising it, and I’ve released it as Spam All the Links on Gitlab. It’s a Python 3 script that should run on most standard Python environments.

Spam All the Links

Spam All the Links is a command line script that fetches URL suggestions from
several sources and assembles them into one email. That email can in turn be
pasted into a blog entry or otherwise used to share the list of links.

Use case

Spam All the Links was written to assist in producing the Geek Feminism linkspam posts. It was developed to check WordPress comments, bookmarking websites such as Pinboard, and Twitter, for links tagged “geekfeminism”, assemble them into one email, and email them to an editor who could use the email as the basis for a blog post.

The script has been generalised to allow searches of RSS/Atom feeds, Twitter, and WordPress blog comments as specified by a configuration file.

Email output

The email output of the script has three components:

  1. a plain text email with the list of links
  2. an HTML email with the list of links
  3. an attachment with the HTML formatted links but no surrounding text, so that it can easily be copied and pasted

All three parts of the email can be templated with Jinja2.
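
The three-part structure can be sketched with Python's standard email library. This is only an illustration, not the script's own code: the function name build_linkspam_email, the subject line and the links.html filename are made up for the example, and it uses plain string formatting where the real script uses Jinja2 templates.

```python
from email.message import EmailMessage
from html import escape


def build_linkspam_email(links, sender, recipient):
    """Build a three-part email from a list of link dicts.

    Each dict is assumed (for this sketch) to have 'url', 'title'
    and 'description' keys.
    """
    # Part 1: plain text list of links.
    plain = "\n\n".join(
        f"{link['title']}\n{link['url']}\n{link['description']}" for link in links
    )

    # HTML list items, escaped so titles and URLs cannot break the markup.
    items = []
    for link in links:
        url = escape(link["url"], quote=True)
        title = escape(link["title"])
        description = escape(link["description"])
        items.append(f'<li><a href="{url}">{title}</a>: {description}</li>')
    fragment = "<ul>\n" + "\n".join(items) + "\n</ul>\n"

    msg = EmailMessage()
    msg["Subject"] = "Suggested links"  # hypothetical subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Links for the next post:\n\n" + plain + "\n")
    # Part 2: HTML alternative with surrounding text.
    msg.add_alternative(
        "<html><body><p>Links for the next post:</p>\n" + fragment + "</body></html>\n",
        subtype="html",
    )
    # Part 3: bare HTML fragment as an attachment, for copying and pasting.
    msg.add_attachment(fragment, subtype="html", filename="links.html")
    return msg
```

The resulting message could be handed to smtplib.SMTP.send_message; how the real script sends its mail is not shown here.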

Sources of links

Spam All the Links currently can be configured to check multiple sources of links, in these forms:

  1. RSS/Atom feeds, such as those produced by the bookmarking sites Pinboard or Diigo, where the link, title and description of the link can be derived from the equivalent fields in the RSS/Atom. (bookmarkfeed in the configuration file)
  2. RSS/Atom feeds where links can be found in the ‘body’ of a post (postfeed in the configuration file)
  3. Twitter searches (twitter in the configuration file)
  4. comments on WordPress blog entries (wpcommentsfeed in the configuration file)
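
To make the difference between the first two source types concrete, here is a minimal sketch using only the standard library. It is not taken from the script: the function names and the sample feed are invented for illustration, and it only handles RSS 2.0 item elements (Atom feeds use different element names).

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Invented sample feed, for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Example bookmarks</title>
<item>
  <title>An example article</title>
  <link>https://example.com/article</link>
  <description>Worth reading. See also &lt;a href="https://example.org/more"&gt;this&lt;/a&gt;.</description>
</item>
</channel></rss>"""


def bookmark_links(feed_xml):
    """Bookmark-style feed: each item is itself a suggested link."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "url": item.findtext("link", default="").strip(),
            "title": item.findtext("title", default="").strip(),
            "description": item.findtext("description", default="").strip(),
        }
        for item in root.iter("item")
    ]


class _HrefCollector(HTMLParser):
    """Collects the href of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def post_body_links(feed_xml):
    """Post-style feed: the suggestions are links inside each item's body."""
    root = ET.fromstring(feed_xml)
    collector = _HrefCollector()
    for item in root.iter("item"):
        collector.feed(item.findtext("description", default=""))
    return collector.hrefs
```

With the sample feed, bookmark_links returns one entry for the item's own link, while post_body_links returns the link found inside the description.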

More info, and the code, is available at the Spam All the Links repository at Gitlab. It is available under the MIT free software licence.

Quick links: nothing to hide

This article originally appeared on Hoyden About Town.

Data retention is coming to Australia very soon.

[Data retained] includes your name, address and other identifying information, your contract details, billing and payment information. In relation to each communication, it includes the date, start and finish times, and the identities of the other parties to the communication. And it includes the location data, such as the mobile cell towers or Wi-Fi hotspots you were accessing at the time…

But surely they’ve included special protections for communications between doctors and patients, and lawyers and clients? No. Never even discussed…

The Joint Committee recommended that the Act be amended to ensure that the metadata can’t be obtained by parties in civil litigation cases (I’ve mentioned before how excited litigation lawyers will be about all this lovely new data), and George Brandis said that would be fixed in the final amendments. But it isn’t there. The final Bill being bulldozed through Parliament right now contains no such protection. The fact remains that, under the Telecommunications Act, one of the situations in which a service provider cannot resist handing over stored data is when a court has required it by issuing a subpoena. In practice, that means that your ex-spouse, former business partners, suspicious insurance company or employer can get hold of a complete digital history of your movements and communications for the past two years, and use it against you in court.

Michael Bradley, Our privacy is about to be serially infringed, The Drum, March 19 2015

Surveillance cameras attached to a building exterior
Surveillance, by Jonathan McIntosh@Flickr CC BY-SA

Noted elsewhere: all this data will be stored by various companies with varying degrees of security awareness, so in practice it will sometimes be available to some criminals too.

Elsewhere:


Image credit: Surveillance by Jonathan McIntosh, Creative Commons Attribution-Sharealike

Importing a large blog to WordPress.com: WXR splitting tools

I am about to import a very large WordPress blog (not this one) to WordPress.com.

There’s two issues:

1. The WXR (WordPress eXtended RSS) export from the site is 105MB uncompressed and 22MB compressed (with gzip -9). This is too large to upload to WordPress.com, which only accepts uploads of 15MB at most.

2. This site has 4000 media file uploads (and 6000 posts). The original host is going away: those 4000 media files (mostly images) must also be imported into WordPress.com.

The obvious solution to #1 is to split the upload into multiple files. However, I have just tested on WordPress.com, and in order to get it to change the post contents to refer to the imported copy of the media files, rather than the original externally hosted copy which is about to go away, the media file and the post must be uploaded in the same XML file. The scripts that I’ve found that will split WXR files into multiple XML files do not attempt to put media files and the posts that refer to them in the same XML file (eg mainSplit.py doesn’t do this); they just split the contents of the export file up in the order they appear.
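
For what it's worth, the kind of splitting I have in mind could be sketched as below. This is untested against a real export and rests on assumptions: that media items have a wp:post_type of "attachment", that their wp:post_parent holds the wp:post_id of the post they belong to, and that the export uses the WXR 1.2 namespace. Posts that embed media attached to some other post would not be caught. The function names are made up, and writing each chunk back out as a complete WXR file (with the channel header) is left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: WXR 1.2; check the xmlns:wp attribute in your own export.
WP = "{http://wordpress.org/export/1.2/}"


def group_items(channel):
    """Return groups of <item> elements: each post followed by its attachments."""
    attachments = defaultdict(list)
    others = []
    for item in channel.findall("item"):
        if item.findtext(WP + "post_type") == "attachment":
            attachments[item.findtext(WP + "post_parent")].append(item)
        else:
            others.append(item)
    groups = []
    for item in others:
        post_id = item.findtext(WP + "post_id")
        groups.append([item] + attachments.pop(post_id, []))
    # Attachments whose parent is not in the export each get their own group.
    groups.extend([a] for orphans in attachments.values() for a in orphans)
    return groups


def pack(groups, max_bytes):
    """Greedily pack groups into chunks, never splitting a group.

    A single group larger than max_bytes still ends up in one chunk.
    """
    chunks, current, size = [], [], 0
    for group in groups:
        group_size = sum(len(ET.tostring(item)) for item in group)
        if current and size + group_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.extend(group)
        size += group_size
    if current:
        chunks.append(current)
    return chunks
```

Usage would be something like pack(group_items(ET.parse("export.xml").getroot().find("channel")), 14 * 1024 * 1024), where export.xml is a placeholder filename and the limit leaves some headroom under 15MB for the channel header.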

Anyone got leads on this one?

The right to forget, or, that one terrible road stop

I predict that soon the conversation will turn from the right to be forgotten to the right to forget.

Why so? Well, Google Maps now tries to remember places I’ve been and include them in the maps it shows me. The trouble with this (ignoring any petty privacy, commercialisation, misc concerns you may be about to mention to me) is that there are some places that should be forgotten. In particular, all of Western Sydney’s commerce is now represented to me by one service station that we stopped at on a family trip because someone needed to use the loo, but couldn’t, because its loo was splattered with largely unspecified bodily fluids.

Get it together Google! This is even worse than the way my YouTube suggestions are now and forever filled with Thomas the Tank Engine videos because of an unfortunate and lengthy phase my son went through. I insist on not navigating Sydney in future primarily in terms of which horrible public toilet I am nearest.

 

It’s password management turtles all the way down

Since I mentioned password management in passing yesterday, I recall a question I haven’t seen answered yet: how do you manage your password management passwords?

My setup is this: as advocated by, eg Bruce Schneier and Troy Hunt (but not, apparently, by Florêncio et al 2014, although I’ve only read the abstract and some of the press) I use a password manager, which stores huge long random passwords for all the sites I use and is in turn password protected.

While I’ve been doing this for several years, a few flaws have emerged:

  1. Google passwords. You have no idea how often you need to enter a Google password on an Android phone until… you do. And you’ll be reminded for every new device and then every password change, even if you’re a Heartbleed-level-or-greater password changer. It’s very very difficult to survive setting your Google password to F]U8NScS+RP7eL5)v=gj7f*/bX~$&` or even F]U8NScS+R frankly as an Android user. (Especially since if you have two factor turned on, the way you authenticate to an Android phone involves entering your password twice.)
  2. shared passwords, often required in business in particular but also in (cough) personal households, and not handled by most password managers in a model other than “a password database for you” and “a password database for you and your boss” and so on for potentially combinatorial values of “you and [colleague]”

There are some services that attempt to solve that second point within an organisation, eg LastPass Enterprise, but even allowing for that, let us enumerate the password manager passwords that a hypothetical individual called Mary currently has:

  1. personal password manager password
  2. work password manager password
  3. household password manager password
  4. volunteer organisation password manager password

And at the point where this hypothetical individual is remembering four separate extremely complex and secure passwords it’s beginning to look like the promised land of “the last password you’ll ever need” is, well, turtles all the way down.

It’s 2014 and the Internet is still atomising my household

Here’s some electronic things my household owns collectively:

  • our main camera
  • our television
  • our games consoles
  • our Kindle and Nexus tablet

Here’s the services I use almost daily that do not have any notion of collectively owned content or multiple publishers wanting to manage a single account:

  • Flickr
  • Google Play, or any other Google service
  • Xbox Live (to the extent I’ve explored it)

And this is epically frustrating, because here’s some use cases that these websites don’t handle well.

  • we share parenting of our children. We would like to be able to play one or both of them Frozen or Cars or whatever without both owning a copy from a streamable service or someone needing to leave a logged in Android device with a known password in the house at all times.
  • we both take photographs on our main camera. We sometimes can’t remember who took which one and in any case, it’s always me who post-processes them. We would like to be able to publish them on a photo sharing website and maybe sometimes attribute authorship (if one of us is especially proud of a shot and actually remembers taking it) and sometimes not!
  • we read the same books because I read them first and Andrew reads some subset of them on my recommendation, and we’d like to do that without both buying a copy.
  • we listen to the same music because Andrew listens to it first and I listen to some subset of it on his recommendation, and we’d like to do that without both buying a copy.

I mean, it’s disgusting really. One day we could even do the ultimate in simple gross violation of normal and healthy relationship boundaries and want to play each other’s saved games.

Right now we do pretty much what everyone does to some degree, as far as I can tell, which is to have a shared Amazon account and a shared Flickr account and still buy movies on optical discs for now even though five minutes of unskippable sections at the start are annoying and put our music on a fileserver and awkwardly manage our photos on a USB hard drive that can get plugged into different laptops and really not stream much stuff at all. Maybe one day we’ll have some kind of dedicated device that is logged into someone’s Google account and streams movies that are always bought through that account, or something like that.

Now traditionally when I make this point, someone will show up and say “yes, my dear, but something extremely complicated is going on here, much too complex and subtle for your delicate sensibilities, called making money through an advertising revenue model requiring demographic information and the entire world will go bankrupt if we allowed multiple people to share accounts even for content they produced in any recognised way, so don’t worry your pretty little head about it and let your husband buy the clicky button things from now on.”

To which I answer: this blog is (to the best of my knowledge) not owned by any of Yahoo!, Google or Microsoft and does not especially care about their revenue models. Moreover, if your comment boils down to “please try and see this from the side of the websites” I will replace your comment with the one from the previous paragraph, sexist content and all. (Also don’t explain to me that one can share passwords in various ways. I know. I do those things.)

I will concede one point: households don’t have continuity in the way that individuals do. My household will split into at least three and perhaps four someday. This is pretty much impossible to model in the present intellectual property+licensing rights model as far as I can tell.

And all the same, I’m annoyed that the software world is really hostile to the (very normal) way I live my life and is (surprise!) set up for a world in which each of the four people in my house sits in their own room with their own TV + gaming system + speakers + phone/tablet + ereader interacting with content they purchased entirely separately, and in many cases, in duplicate, (possibly) maximising your revenue, as it has been since whichever unfortunate day someone came up with the idea of an “account” on a computer system.

First ecosystem to fix this gets to sell me Frozen or something.

Opt-in Creative Commons licensing plugin for WordPress?

Does anyone have a recommendation for an opt-in Creative Commons licensing plugin for WordPress? That is, one where the default state is not to CC license something, but when some action is taken, an individual post or page can be so licensed.

As background: I have no desire to write, maintain, or even debug a WordPress plugin. I want to know if there is something for this use case that Just Works.

I want opt-in, because it is too hard to remember, or to train others, to find an opt-out box when posting, and thus end up CC licensing things that weren’t intended to be, or can’t be, released under such a licence.

Some options I’ve already looked into:

WP License reloaded: was pretty much exactly what I wanted but doesn’t seem to be actively maintained and is now failing (possibly because the site in question is now hosted on SSL, I’m not sure, see above about not being interested in debugging).

Creative Commons Configurator: seems to be the most actively maintained CC plugin, but seems to be opt-out, and even that was only introduced recently.

Creative Commons Generator: opt-out.

Easy CC License: perhaps what I want, although I’d rather do this with an options dialogue of some kind than a shortcode.

Your crontab file should start with “crontab -l”!

I’ve never personally had this problem, but a number of people have told me that they’ve, often repeatedly, accidentally deleted their crontab by typing crontab -r (which silently removes a crontab) rather than crontab -l (which shows you what is in it) or crontab -e (which lets you edit it). It doesn’t help that “e” and “r” are next to each other on QWERTY keyboards.

Create a single backup of your crontab contents

Since I realised this was an issue, I’ve made the first line in my crontabs the following:

@daily crontab -l > ~/crontab.backup

If you ever accidentally use crontab -r, you can use crontab ~/crontab.backup to reinstall your crontab!

Adjust @daily to a time at which your computer is likely to be on, if it’s not always on, eg 0 10 * * * for 10am daily.

For bonus points, writing this entry reminded me that I hadn’t reinstalled my laptop’s crontab on my new machine, and meant it was easy for me to find and install!

Create timestamped backups of your crontab contents

The above is simple and suffices for me, but if you don’t have a backup routine that will grab ~/crontab.backup regularly enough for your needs, you could do something like this instead:

@daily mkdir -p ~/crontab-backups; crontab -l > ~/crontab-backups/crontab-`date +\%Y\%m\%d-\%H\%M\%S`; find ~/crontab-backups -type f -ctime +7 -delete

Explanation:

  1. mkdir -p ~/crontab-backups makes a directory crontab-backups in your home directory if it doesn’t already exist (and doesn’t complain if it does exist).
  2. crontab -l > ~/crontab-backups/crontab-`date +\%Y\%m\%d-\%H\%M\%S` puts your current crontab into a file named with a datestamp (eg crontab-20140711-124450) so that you can easily have more than one. The backslashes are needed because cron treats an unescaped % in a command as a newline.
  3. find ~/crontab-backups -type f -ctime +7 -delete finds all files (-type f) in ~/crontab-backups whose status last changed more than 7 days ago (-ctime +7; for backup files that are written once and then left alone, that is effectively when they were created) and deletes them (-delete)

Warning: you don’t want to put anything else in ~/crontab-backups, because it too will be deleted after seven days.
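
If you would rather call a script from cron than maintain a long one-liner, the same idea can be sketched in Python. This is an illustration rather than something I run: the function names are made up, and it prunes by modification time rather than the change time that find -ctime uses.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def read_crontab():
    """Return the current user's crontab as text by running `crontab -l`."""
    return subprocess.run(
        ["crontab", "-l"], check=True, capture_output=True, text=True
    ).stdout


def backup_crontab(backup_dir, keep_days=7, read=read_crontab, now=None):
    """Write a timestamped crontab backup and delete backups older than keep_days."""
    backup_dir = Path(backup_dir).expanduser()
    backup_dir.mkdir(parents=True, exist_ok=True)  # like mkdir -p
    now = now or datetime.now()
    target = backup_dir / f"crontab-{now:%Y%m%d-%H%M%S}"
    target.write_text(read())
    # Prune old files by modification time (an assumption of this sketch).
    cutoff = now.timestamp() - keep_days * 86400
    for old in backup_dir.iterdir():
        if old.is_file() and old.stat().st_mtime < cutoff:
            old.unlink()
    return target

# Example call: backup_crontab("~/crontab-backups")
```

The same warning applies: anything else in the backup directory will be deleted once it is old enough.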

Use python-flickrapi 1.2 even after the Flickr SSL transition

On June 27 2014, Flickr changed their API to be SSL-only. The Python flickrapi library was one of many pieces of software that used HTTP to connect to Flickr’s API, and that therefore broke for some users on June 27.

flickrapi supports HTTPS connections as of version 1.4.4, released on June 18 2014. If you are able to upgrade to a new version of flickrapi, you can get the latest flickrapi version from PyPI and ignore the rest of this post.

However, as of mid-2014, many Linux distros, including Ubuntu 14.04 (supported until 2019), still package flickrapi version 1.2, which cannot connect to Flickr’s API over HTTPS and is therefore now non-functional. Since developers may for various reasons choose to use their distro’s version of python-flickrapi, I’ve written a very very small Python class that overrides flickrapi’s FlickrAPI class to connect to Flickr over HTTPS rather than HTTP, and allows continued use of the Flickr API.

You can download my Python module that allows this: flickrapissl. See the README for usage.

Robot cars: why I’m both excited and worried

Maybe this is selection bias, but most people I know seem very underwhelmed by self-driving cars. I am whelmed! I hope it works out.

I should clarify, because I discovered when talking to my parents about this that the term “self-driving car” isn’t self-explanatory to everyone. To them, it meant something like “slightly better cruise control”, and seemed very unexciting. It may be clearer to say “robot car” or “robot taxi”. A self-driving car is a car that does every driving task by itself. It decides on the route. It looks where it is going. It turns corners. It brakes. You could, at the end point of the development, lie on a bed asleep inside a car while it drives you where you’re going. It’s still not entirely clear (to me) that this is all feasible with likely technology, or that it will be legally acceptable, but it’s seeming more likely. After years of driving around modified cars with added self-driving capability, Google is making prototype cars for further development.

Incidentally, “robot car” is a lot easier to type than “self-driving car”, so I’m going to adopt it for the rest of this entry.

Upsides of robot cars (why I’m excited)

Here’s what I envisage:

Less driving by humans (specifically, me). Most people who comment on cars love driving, which is why I suspect self-driving car discussions often spiral into “but driving is fun! no one will cede control over their favourite activity!” Well, I’m queuing up. It’s not that driving is never fun, but for me it’s frequently unfun, especially since it’s so often city driving. Most of the fun bits of driving could be replicated for me in dedicated arenas.

Time reclamation. I drove to Canberra on the Easter long weekend, got stuck in the worst traffic I’ve ever been in, and commented to my husband that my brain is somewhat over-powered to be spent deciding when to slightly and briefly depress an accelerator pedal. Less human time spent driving is more time spent reading, talking, composing, sleeping. Plus all kinds of hedonism.

Increased independence for ‘dependents’. Google’s publicity already talks about elderly people who have stopped driving. But I know other people who can’t drive. I live with two of them. They’re 4 months old, and 4 years old, respectively. The degree to which it is safe or healthy for children of those ages to be left to supervise themselves in robot cars is debatable, but by the time they’re say, 7? Sure, the car could drive them somewhere for fifteen minutes or half an hour. At 14 or 15? They can tell it where to go. (15 year olds can’t drive in NSW, but when I was 15, I was independently mobile on foot and on bike, and occasionally in planes, trains and taxis.)

Two other people I’ve personally known who can’t drive had a seizure disorder and limited vision respectively, and there are lots of medical and psychological reasons that limit or prohibit driving, many of which would be compatible with being sole passenger in a car.

Speaking of psychological reasons, while I believe I am a driver of roughly average ability now and I now don’t seem to find it more stressful than others do, I hated learning how to drive, was petrified, and was a frustrating and frightening student. If I had equal mobility without ever having gone through that, I would not have. (And I think neither would my husband, who didn’t learn to drive until I taught him.) I know quite a few people in this category, including some unable to use cars to this day.

More comfortable trips. The possibility of being driven around while lying down or in a comfortable chair or, for that matter, while standing or exercising or drinking with friends. (Although see ‘Humans as cargo’ below.)

Fewer cars. Cars that can come and pick you up should increase utilisation of cars, as in, there will be fewer cars in total, and fewer empty cars at any given time. This diminishes the use of mined resources, and has a carbon impact (the manufacture of cars has a substantial carbon footprint).

Lighter cars. With a vastly improved safety profile (which I am taking as read, otherwise I think the whole project is null and void) the weight required for modern safety features in cars can be ditched.

Safer cars. This is an assumption, but really, they’re not going to launch at all if they aren’t significantly safer. So if it happens at all, deaths and injuries in road travel should fall to near zero if robot cars become ubiquitous.

Land reclamation. Fewer cars means being able to reclaim some of the very significant amount of private and public land use currently devoted to parking cars.

Potential significant fuel savings. Inefficient human driving presumably has some direct fuel cost. In addition, robot cars can spread their usage more evenly over different routes, further saving time and fuel. Diminished vehicle weight saves fuel.

Lifting of speed restrictions. Robot cars can’t overcome some physical difficulties here (non-linear increases in power consumption with speed, increased braking distances with speed) but they can overcome the human error that makes high speed driving dangerous. Trips should become somewhat faster with a high density of robot cars on the road.

Downsides of robot cars (or why I second-guess myself)

Here’s the tweet that inspired this entry:

Even ignoring the possibility of political decisions inhibiting my utopia above, I think there are significant potential downsides, some of which are likely, some of which are hard to model.

Job losses. At the driving end: taxi drivers, truck drivers, chauffeurs, public transport drivers. If fewer cars were manufactured, a huge number of jobs involved in manufacture, in supplying manufacturers, and in supplying and selling cars would go too. It’s likely the cars will tend to have reduced wear and tear, so mechanics and their supply chains would be affected. If road wear-and-tear and fuel usage are reduced, people employed in road maintenance and in the massive fuel industry are affected. All raw materials and their supply chain are affected.

Which is all very well for me, I’m not in the transportation industry. But we as a society suck at dealing with the people who are made redundant by technological progress, and we’re not showing signs of getting better at it even as more and more work is threatened with redundancy.

Resource use and carbon output may rise. With the increased ability of people to use cars (by adding children and medically unfit drivers, and some unwilling drivers, to self-directed car users) comes the risk of additional aggregate kilometres travelled by car with negative consequences for carbon etc.

Shortened lifespan of existing cars. If there was a fairly fast transition at any point, human-operated cars then in existence become obsolete fairly rapidly, which is a resources and waste nightmare exceeding the flat-screen TV transition environmental disaster.

Longer commutes. While commuting might be less miserable per minute, it’s entirely possible that commensurate increases in hours spent commuting might occur (or be demanded by the workplace), which for most people is a negative because of less time spent with family and friends, and a rather insidious negative at that (people apparently underestimate how bad it is, because it’s regular, something I originally learned about in Big house, big commute?, together with the idea of the “triangle of happiness”).

Unpredictable and likely bad disruption of public (cheap) transport. I don’t personally much buy into the intrinsic virtues of contact with strangers while travelling. (The “public” in “public transport”.) I don’t mind if opportunities to talk to strangers — or ignore them studiously in order to compensate for being in each other’s personal space — are diminished. I do mind access issues though. I would assume that the movement of currently unfit or unwilling drivers into robot cars would diminish use of public transport and therefore its availability. At the same time, I assume that individual robot car use will not be universally affordable (if primarily offered by for-profit entities as is likely). There’s no guarantee that, with presumed middle-class flight to robot cars, any longer distance transport will be available matching current public transport prices and coverage.

Unpredictable and likely bad availability of robot cars for less privileged users. If the safety profile is good enough, we might be able to eliminate child restraints in their current form, but there’s no guarantee cars will be available to fit wheelchair users, large people (tall, fat or both, although tall is more likely because it’s a male-associated trait), and large family groups. There’s no guarantee that non-literate people, or visually impaired people, or people who don’t speak the local majority language, will be able to operate the user interfaces. (I’d bet against, in fact.)

Privacy and autonomy. These are under substantial threat. Consider existing cases. Car sharing schemes track your location on GPS. Speed cameras record all passing vehicles, not just speeding ones. (Such data has been used as evidence in criminal trials in NSW.) Many transport smart card agencies hand over individual usage data (where you got on and off public transport) upon request to police, without a warrant. Robot cars have potential for all these abuses, plus the ability for other people to take over control of the car you’re in.

Known unknowns

There’s also some intensely unpredictable things:

Unpredictable disruption of places where people congregate. I do consider the existence of spaces serving as, essentially, town squares or village greens, a positive, and just as private cars significantly changed their location and availability, robot cars would too. I don’t know how and to what effect.

Humans become cargo. This is already more or less true in aviation, and it hasn’t played out so comfortably, with comfort tending to diminish in return for cheaper seats. But flying is fuel intensive (and the distances sometimes simply ridiculous). So it’s difficult to work out what Sydney to Canberra or Sydney to Melbourne would look like and it depends partly on power costs. Would a vehicle drive me to Canberra on its own? Would it go and hitch itself to some kind of road-train? Would a pod containing me be loaded onto a truck? Would I board a robot bus but otherwise have more or less the current bus experience? What would the effects on fuel use be of various models? (See I Spent 28 Hours on a Bus. I Loved It.) What would rich people do? What would poor people do?

Cargo becomes atomised. Conversely, when is the preference for large cargo loads (B-double trucks, trains full of coal) partly an issue of needing a human operator? Would those loads split into smaller units to be delivered on an as needed basis? When and at what point in the delivery process?

Increased drug use. Maybe? One inhibiting factor on drug use (the need to drive home) will be lifted. I’m not anti-drug use as a matter of principle, so this is not necessarily a negative to me.