Don’t trust Livejournal? Delete your account.

In December 2016, LiveJournal moved their servers to Russia and in April 2017 they updated their terms of use in a way many users find unacceptable.

In January 2017, as people considered the implications of the server move to Russia, I saw a number of people hesitant to delete their accounts as they were hoping to overwrite their data on Livejournal before deleting, by, eg, replacing their entries with Shakespeare’s plays, or with random nonsense, so that Livejournal didn’t have the entry any more. This won’t work and you might as well just delete your Livejournal account.

Here’s a loose analogy for the way that data on a site like Livejournal may be stored:

There’s a journalling website. It stores its entries on vast reams of paper in a giant library and new entries are scribed onto paper and filed.

The “overwrite with nonsense” strategy assumes that any journal entry you make is at a fixed location on a fixed bit of paper for all time. When you update the entry, the scribe goes to the existing bits of paper and writes on top of them. While this is technically possible with hard drives and similar, in a way that it isn’t with literal paper, here’s what more likely actually happens:

You update the entry, replacing it a Shakespearean play. The new version is written on entirely random empty paper (maybe blank, maybe where someone else’s deleted entry once was), and an index in a different part of the library is also updated. It used to say that your entry of January 7 was on floor 6, shelf 216, and now it says that your entry of January 7 was on floor 12, shelf 16.

But the contents of floor 6, shelf 216 are likely not overwritten for some time. Perhaps they’re marked as available to be overwritten, to be reused whenever it seems sensible, but you won’t know when that is. On the other hand, perhaps they are deliberately marked in the index as “an old version of the January 7 entry” for the sake of users having an edit history, or to have an audit trail, or because a lawsuit demands it, or because a government demands it. This may or may not be visible to you.

Even if floor 6, shelf 216 is marked available to be overwritten, it may not be actively erased, and if it isn’t actively erased, it’s available to be searched by a sufficiently determined or empowered person. (And searching unindexed digital storage is a lot faster and cheaper than searching paper, so not one thousandth as determined or empowered as you need to be to search a library full of unindexed paper.)

And even if floor 6, shelf 216 is no longer marked as “an old version of the entry of January 7”, on any moderately well-run website, floor 6, shelf 216 was never the only copy of your entry anyway. What if there was an accident with fire or water or whiteout? There are backups of your entry, probably at least two in the same library and at least one in a different library. These backups are usually moments in time, ie, the state of the entire journalling website as of New Years. The state of the entire journalling website as of New Years the previous year.

These backups are almost certainly never wiped of entries that are simply edited, and without adding a system that searches back through backups and selectively deletes backups of deleted accounts, they most likely contain the complete contents of deleted accounts as well.

So what you’ve ended up with is a situation where floor 12, shelf 16 contains a Shakespearean play, floor 6, shelf 216 likely contains your original entry, and there are several backups around that almost certainly contain your original entry and are designed in such a way as to be found and restored relatively quickly. This is not a much more secure situation than before you replaced the entry with a Shakespearean play; probably not worth the work you did.

All that said, it’s important to know that there are trade-offs in adding secure, permanent deletion. People quite often edit or delete their data accidentally, or temporarily — for example it is quite common to disable social media accounts temporarily to enforce a social media break — and it’s also common to be hacked and have your data deleted by the hacker. Enthusiastic data scubbing will actively harm you in all these cases. On top of that, storage systems fail (in my analogy, the library burns down, except hard drives fail more often than paper does), and backups are especially important then. And any system that goes back in time and edit backups has risks; what if it has a bug (all software has bugs) and deletes things it shouldn’t? System design to balance securely deleting data that users want to permanently delete with rarely or never deleting data they expect to keep is not easy.

So Livejournal or another site has your personal data, what should you do? I suggest that when you no longer use an online service, or you no longer trust in its management, that you take a personal backup of the data if possible and if you want it, and then delete your account.

You cannot usefully take any additional steps like overwriting your account with nonsense to ensure that actual data scrubbing took place and you should assume that it wasn’t scrubbed unless you can find some written guarantee otherwise. However, over time, backups will get selectively pruned, outages will happen, the business may eventually fail and your data will most likely become slowly less available and complete. That’s the best you can do.

For online services you actively use and where you do trust the management enough to keep your account, ask for written descriptions of their data scrubbing practices to be developed for deleted data and deleted accounts, including deletion from backups and handling of disused hard drives.

Elsewhere:

Tim Chevalier, PSA: Switching to Dreamwidth? (January 2017).

Disclosure: I am an employee of Google. This post does not describe Google’s data deletion practices, in which I’m not an expert in any case; it’s a general description of easy, sometimes harmful, defaults that systems designers could fall into. For Google-specific information, you can view privacy.google.com and Google Infrastructure Security Design Overview.

Creative Commons License
Don’t trust Livejournal? Delete your account. by Mary Gardiner is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Flickr features I’ve known and loved

As the planning for the sale of Yahoo!/Altaba to Verizon continues, I’m not the only person worried about the fate of Flickr, which has been owned by Yahoo since 2005:

I’ve got a tediously backed-up local copy of my photos and won’t have to kiss them goodbye, but as a happy Pro user of Flickr I’m really worried about its future and beginning an active search for replacements. I’m going to start evaluating possible replacements on the basis of these specific features, roughly in order of importance:

My favourite Flickr features

Embedding my foremost use of Flickr is as a photo host for my parenting blog and, increasingly, to show off my best photos. The ability to embed photographs in third-party websites is essential to me.

Locking at the photo level and guest access. It’s not easy to find non-recent photographs of my children on my Flickr account. That’s because I have a script that marks photos as private once they’re a certain age. Some other types of photos (for example, photos of other children) I often mark as private immediately.

Much of my web life runs this way: just because you can find my recent stuff doesn’t mean you get to casually browse everything I’ve done on the Internet since the beginning of time (circa 1999). I’ve taken full advantage of websites with individual locking every time I’ve used one, including WordPress sites, LiveJournal&Dreamwidth, Pinboard, and, yes, Flickr, and strongly prefer it.

At the same time, the chance of people who care about me obtaining a login to Flickr, or to social-photos-site-of-the-month in order to view pictures of a party we were at is basically nil, so the ability to share links to photos via Flickr’s guest pass system has made it useful to me for semi-private events and photos.

API access. I’m not locking all this stuff on all these sites down by hand! It’s all scripted and done via APIs.

Multiple albums for a single photo I look at my photos through several different types of, uh, “lenses”. There’s events, there’s individuals in the photos (mostly my children), and there’s my show-off albums for my favourite photos or ones most I’m likely to want to share with other people if only they’d ask to see more of my photos. I use albums for all three ways of looking at photos, and thus many of my photos are in both a “my kid at age 3” album and a “visit to the beach in November” album.

I also use tags and I might be able to modify my workflow to use tags to replace some of these features, although the result of a tag search would need to be viewable as a first class album, rarely true in my experience so far.

Creative Commons licencing. I like easily dropping my photos into a big pool of photos that might someday find good uses elsewhere and licence a lot of my non-portraits CC BY for (nearly) maximum re-usability. I fear that even sites that support CC licencing won’t end up being searched by anyone in practice, and if I note a CC licence myself in the description, it’s never going to happen.

Features I’d reluctantly sacrifice

Chromecast support. It’s been really enchanting having our TVs display great photos of our kids throughout their lives, travel we’ve done, and a lot of clouds, all via Chromecast’s support for using Flickr photos for background images, but I’m willing to give it up for my core set of features.

An app. Don’t get me wrong, I do like being able to peruse my photos on my phone, but I’d give it up if I had to. Because I do about half my photography with a DSLR, and edit essentially all my photographs, I don’t upload photos via apps in any case.

Less important

The social ecosystem. I started using Flickr regularly after a lot of people stopped, and I’m indifferent to the social features, eg favourites, comments, following other folks, putting my photos in group albums. I do use some of these, but I won’t be looking for them in a replacement.

Locking to different sets of people. I do use Flickr’s “friends” and “family” distinction a little, but in giving up social, I’m also happy to give up locking other than “locked” and “not locked”.

And now, I’m afraid, it’s well and truly time to go shopping for a new photo host. My favourite. Only not.

If you’re still maintaining a LiveJournal, your journal’s now in Russia

Signal-boosting this news as I know a few people still maintaining a LiveJournal who might choose to delete it, or change their use of LiveJournal after learning about this.

LiveJournal is now hosted in Russia

As of late December 2016, the LiveJournal servers (computers) are now hosted in Russia. While LiveJournal has been owned by Russian company SUP since 2007, the servers had until now been hosted in the US and access to them somewhat controlled by Californian law.

SUP has, to the best of my knowledge, not announced or commented on this themselves, but there’s more information at rahirah’s Dreamwidth journal with links to different evidence of the new location of the servers.

A Russian-language LiveJournaller appears to report that Russian law then allows that all the confidential information of [LiveJournal] users is available for [Russian] domestic security services in real time [note though that that’s a automated translation].

A BBC report on Russian law regarding social media in 2014 seems to confirm this:

A new law imposing restrictions on users of social media has come into effect in Russia.

It means bloggers with more than 3,000 daily readers must register with the mass media regulator, Roskomnadzor, and conform to the regulations that govern the country’s larger media outlets.

Internet companies will also be required to allow Russian authorities access to users’ information.

Thanks to my original source of information about this (found via @anatsuno on Twitter).

siderea expresses several important concerns with this:

  1. if you’re in Russia or vulnerable to Russia, and a political opponent, you could now be more easily identified by Russian security
  2. regardless of where you are, your LiveJournal could be possibly be deleted without notice for expressing opposition to Russia or its interests or for other content censored in Russia (eg LGBT-related content)
  3. the flight of LiveJournal users from LiveJournal following this news could simply kill the business and cause everyone’s journals to disappear without notice (Archive Team is storing public entries, regarding it as an at-risk site)

Readers’ connections to LiveJournal aren’t private

LiveJournal redirects secure https links back to insecure http. For example, if you visit https://ohnotheydidnt.livejournal.com/ your browser will connect, but it will be instructed to head to http://ohnotheydidnt.livejournal.com/ before loading the page. (Info from this Dreamwidth comment by mme_hardy, confirming my personal experiences with LiveJournal RSS feeds over the last several months.)

What this means is that the content of any entries you read, including locked ones by both you and other people, are trivially visible to anyone who can eavesdrop on your net connection, including (often) other people on your local network, and anyone on the path between you and LiveJournal such as your ISP and anyone with access to the data flowing across international cables or access to the data as it enters the Russian hosting facility, whereas https connections are encrypted in a way such that those people can see that data is flowing but can’t read it absent considerably more niche and intensive technical measures. (Even if HTTPS were turned on by LiveJournal, you wouldn’t be safe from the Russian law, since they can ask LiveJournal itself to turn over your data in addition to whatever nation-state attacker level techniques they can employ.)

Given my experience with LiveJournal RSS feeds, I’m fairly sure this has been true for some time, predating the move of the servers to Russia. (Here’s one other report that this was already true as of September 2016.) Regardless of timing, this speaks of, at best, disregard for the privacy of their users’ explicitly private (because friends-locked!) information. It’s 2017, mandatory HTTPS for transmission of any data that is sensitive or might, conceivably, somehow, maybe, be sensitive is an absolute minimum standard for user safety. LiveJournal doesn’t even have optional “if you have HTTPS Everywhere installed” or “if you remember to stick the s into the URL yourself” HTTPS (which would still be insufficient as you cannot control whether your readers use HTTPS when reading your journal).

Getting your content out of LiveJournal

If based on this you choose to delete your LiveJournal, here are some options to keep your entries. This list isn’t comprehensive.

If you want to move the content to another website, here’s some blogging platforms that provide imports from LiveJournal:

If you want to download your entries for private use, you can:

  • use LiveJournal’s own export tool but rather painfully (you’ll have to do one download per month), and without comments
  • use ljdump on the command line, which worked for me as of 2015 when I deleted my LiveJournal, but will require that you’re an experienced command line user
  • use BlogBooker to export it to a Word or PDF file (disclosure: I haven’t used this site in quite some time, and would appreciate hearing if it works, but I suggest people at least try it because it exports to a non-programmer friendly format that people could keep as a private archive, and claims to include comments and images)
  • Archive Team lists other backup tools

If your LiveJournal made use of their photo hosting, I am not sure which backup solutions will import your photos or how they will be stored. I am also not aware of any import tool that replaces LiveJournal entries with a “this entry has moved to URL” message or similar. If anyone is working on a competing LiveJournal import/export tool, photo export and redirection text are both features that my friends and I would have found useful at various times.

Creative Commons License
If you’re still maintaining a LiveJournal, your journal’s now in Russia by Mary Gardiner is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright hell: larrakins and astrologers

This article originally appeared on Hoyden About Town.

People who support a reasonable balance between encouraging creation of artistic works by allowing creators to profit from them, and the interests of wider society in benefiting from the free availability of creative works (or even of facts) aren’t having a good day.

Larrikin vs Australian Music

Skud has covered this over at Save Aussie Music:

Today EMI Australia lost their High Court appeal against Larrikin Music in the Kookaburra/Land Down Under case…

Leaving aside the problems with the copyright system, let’s just take a moment to look at Larrikin, the folk music label that holds the rights to “Kookaburra”. Larrikin was founded in 1974 by Warren Fahey, and sold to Festival Records in 1995. Festival, owned by Murdoch, was shut down and its assets sold to Warner Music Australia in 2005, for a mere $12 million.

Larrikin was home to a number of Australian artists, among them Kev Carmody, Eric Bogle, and Redgum

Kev Carmody, one of Australia’s foremost indigenous musicians, released four albums on Larrikin and Festival between 1988 and 1995, none of which are available on iTunes nor readily available as CDs (based on a search of online retailers). …

Warner bought Larrikin Records’ assets — two decades of Australian music — not because they want to share the music with the public, but to bolster their intellectual property portfolio, in the hope that one day they’ll be able to sue someone for using a riff or a line of lyrics that sounds somewhat like something Redgum or Kev Carmody once wrote. They do this at the expense of Australian music, history, and culture.

Lauredhel covered the case earlier at Hoyden too, focussing on whether the claim of infringement stands up to a legal layperson’s listen test and musical analysis: You better run, you better take cover.

Astrologers versus software creators and users

Have you ever selected your timezone from a list which lists them like this: “Australia/Sydney”, “Europe/London”? Then you’ve used the zoneinfo database.

Timezones are complicated. You can’t work out what timezone someone is in based purely on their longitude, have a look at this map to see why. Timezones are highly dependent on political boundaries. On top of that, daylight savings transitions are all over the map (as it were). Some countries transition in an unpredictable fashion set by their legislature each year. Sometimes a sufficiently large event (such as the Sydney Olympics in 2000) causes a local daylight savings transition to happen earlier or later than that government’s usually predictable algorithm.

Therefore computer programs rely heavily on having a giant lookup table of timezones and daylight saving transitions. Data is needed both for the present, so that your clock can be updated, and for the past, so that the time of events ranging from blog entries to bank transactions can be correctly reported.

A great deal of software, including almost all open source software, relies on the freely available database variously called the tz database, the zoneinfo database or the Olson database.

Arthur David Olson (the “Olson” in “Olson database”) announced yesterday:

A civil suit was filed on September 30 in federal court in Boston; I’m a defendant; the case involves the time zone database.

The ftp server at elsie.nci.nih.gov has been shut down.

The mailing list will be shut down after this message.

The basis of the suit is that the zoneinfo database credits The American Atlas as a source of data, and The American Atlas has been purchased by astrology company Astrolabe Inc, who assert that the use of the data is an infringement of their copyright. Whether this is true is apparently highly arguable (in the US it seems to hinge on whether it’s a list of facts, which aren’t copyrightable) but in the meantime the central distribution point of the data is gone. And it could be a long meantime.

Now, people still have copies of the database (if you run Linux you probably do yourself). However, the source of updates has been removed, which means it will be out of date within a few weeks, and the community that created the updates has been fractured. Various people are doing various things, including a defence fund, a fork of the mailing list, and discussions about re-creating or resurrecting the data in other places. All a great waste of many creative people’s time and money, gain to society from Astrolabe’s action yet to be shown.

More information:

Update (Oct 17): ICANN takes over zoneinfo database

On 14th October the Internet Corporation for Assigned Names and Numbers (ICANN), which manages key Internet resources (notably, the global pool of IPv4 and IPv6 addresses) on behalf of the US government, put out a press release (PDF) announcing that they were taking over the zoneinfo database:

The Internet Corporation for Assigned Names and Numbers (ICANN) today took over operation of an Internet Time Zone Database that is used by a number of major computer systems.

ICANN agreed to manage the database after receiving a request from the Internet Engineering Task Force (IETF).

The database contains time zone code and data that computer programs and operating systems such as Unix, Linux, Java, and Oracle rely on to determine the correct time for a given location. Modifications to the database occur frequently throughout the year…

“The Time Zone Database provides an essential service on the Internet and keeping it operational falls within ICANN’s mission of maintaining a stable and dependable Internet,” said Akram Atallah, ICANN’s Chief Operating Officer.

I wonder if ICANN’s not-for-profit status is useful here. Just as Project Gutenberg can make United States public domain texts available globally, even though texts published prior to 1923 are not public domain world-wide, ICANN may present a less tempting target for lawsuits than other possible homes for the zoneinfo database.

Self-hosted photos, line, end of

It’s quite probably that sometime in the next few days I will hand my public photo hosting entirely over to Flickr or Picasa, quite belatedly compared to most people I know. I wanted to document some self-hosting problems:

  • Self-hosted photos are isolated, because there’s no subscription standard for photos. RSS and so on aren’t ideal because each application decides how to embed the thumbnail and so on in the content, meaning that you can’t cleanly aggregate images from all different software for easy viewing. I really think this is a big deal in uptake of self-hosting: you can’t let people pull your new photos into their own app of choice for viewing. (Same problem for blog comments, by the way, although I realise people would then want a ‘make a comment’ API and the spamming gets even worse.)
  • Gallery 2, which I started using in 2007, has been end-of-lifed in favour of Gallery 3, which is effectively a new project. Gallery 3 is still under-themed (and all of the themes look like a big ad for Gallery 3) and its new upload API isn’t supported by local clients yet. Plus, importing and re-linking everywhere (sure, they support redirects… if you weren’t using clean URLs and never hotlinked the thumbnails anywhere) is enough hassle that I’m not willing to do it every three years, and a project that has had a clean re-write twice is likely to do it again.
  • None of the big free gallery software projects (Gallery 3, Zenphoto, Plogger, probably Coppermine but I haven’t looked) support importing from the others well. You’re supposed to start afresh, or use a Perl script that has the stability status of "worked for me, more or less."
  • If you do switch, there’s terrible support for HTTP redirects of either photo pages or thumbnail hotlinks.
  • The usual web-app sysadmin problems, in which you’re supposed to make everything world-writable, and you upgrade by downloading a ZIP and unpacking it and opening a special file, and just copying a few things, and and and.
  • If photo locking is available at all, it’s way complicated, it requires people subscribe to your (well, my) dinky little photo site, there are no guest passes, etc etc. This is important to me as the parent of a baby.
  • They are feature-chasing, not leaders in features, especially usability.

I stopped enjoying doing this kind of thing for fun many years back. (You know what software thing I’m looking forward to? Learning R. You know what doesn’t resemble learning R? Writing web apps.)

PS: if you email me to suggest that I try self-hosting apps I haven’t tried, I very likely will not try it for lack of time. You would need to put serious work into the sales pitch, up to and including describing a great workflow for my particular needs and offering to migrate my database for me. I could be surprised, I guess.

larch: migrating mail between IMAP accounts

I recently had to move several gigabytes of email (not my own, work-related) into Google Apps (Gmail). As best I can tell, the way most people do this is that they grit their teeth and they open up a graphical email client and drag folders one-by-one. It’s a one-off job for most people.

There were a couple of reasons I didn’t want to do that. One is that I was on my parents’ DSL connection at the time and pushing gigabytes of data through someone’s DSL is a violation of good guest principles, at least in Australia. The other is that we have over 500 folders in the account I’m talking about: that’s a lot of mouse pain.

Anyway, here’s your answer, if you are in the same position as me. After substantial searching, at least for this kind of tool, I came across larch, which is a Ruby command-line tool for IMAP-to-IMAP moves, most tested on Gmail and essentially designed for the “move my mail archives into Gmail” use-case. It’s much more mature than most of the one-off scripts people have thrown up on the ‘net. It certainly seemed robust over this volume of mail, although I did have to run it a couple of times to get past a few errors (it does not re-copy already copied mail, so re-runs are fast). It deserves more search juice.

If you wanted to keep two accounts permanently in sync, offlineimap would be the tool of choice, although the manual still seems to regard IMAP-to-IMAP syncing as not as robustly tested as its core mode of operation, which is IMAP-to-Maildir.

Viewing attachments when using mutt remotely

Yes, that’s right, I’m still in the dark ages and do not yet use Gmail for my email. Even though it has IMAP and everything. I still use Mutt.

I almost always use Mutt locally, using offlineimap to sync IMAP folders to local maildirs. This means I don’t usually have the problem of being unable to view non-text attachments. However, for the next little while I’ll be using Mutt on a remote connection.

Don Marti has one solution to this, which assumes that you are accessing the server with Mutt on it via SSH (probably true) and are easily able to create a tunnel to your local machine, which is trivial if you are using a commandline ssh client, but while you can do it with PuTTY I figured it was just annoying enough that I might not bother. (And I doubt you can do it at all with those web-based SSH clients.)

My alternative assumes instead that you have a webserver on the remote machine that has mutt on it. It then just copies the attachment to a web-accessible directory, and tells you the URL where you’ll be able to find the attachment. It’s thus a very trivial script (and I doubt very much it’s the only one out there), but perhaps using mine might save you fifteen minutes over coming up with your own, so here it is:

copy-to-dir.sh (in a bzr git repo)

Sample output is along these lines when you try to view an attachment in Mutt:

View attachment SOMEPDF.pdf at http://example.com/~user/SOMEPDF.pdf Press any key to continue, and delete viewable attachment

In order to use it, you need to:

  1. copy the script to the remote machine where you use mutt;
  2. make it executable;
  3. edit it to set the OUTPUTDIR and VIEWINGDIR variables to something that works for your setup;
  4. set up a custom mailcap file much like the one Don Marti gives, that is, put something like this in your ~/.mutt-mailcap:
     text/*; PATH-TO-SCRIPT/copy-to-dir.sh %s application/*; PATH-TO-SCRIPT/copy-to-dir.sh %s image/*; PATH-TO-SCRIPT/copy-to-dir.sh %s audio/*; PATH-TO-SCRIPT/copy-to-dir.sh %s 
  5. set mailcap_path = ~/.mutt-mailcap in your ~/.muttrc file.

Something like this probably could work for Pine and other text-based email clients used remotely too, but I’m not sure howi because I don’t use them. And if someone wants to document this in a way that assumes less pre-existing knowledge, go ahead.

Also, making your attachments web-accessible means that they are, well, web-accessible. I’ve set up a HTTP Auth-protected directory using https for this, you should think about your own setup too.

Creative Commons License
Viewing attachments when using mutt remotely by Mary Gardiner is licensed under a Creative Commons Attribution 4.0 International License.