Don’t trust Livejournal? Delete your account.

In December 2016, LiveJournal moved their servers to Russia and in April 2017 they updated their terms of use in a way many users find unacceptable.

In January 2017, as people considered the implications of the server move to Russia, I saw a number of people hesitant to delete their accounts as they were hoping to overwrite their data on Livejournal before deleting, by, eg, replacing their entries with Shakespeare’s plays, or with random nonsense, so that Livejournal didn’t have the entry any more. This won’t work and you might as well just delete your Livejournal account.

Here’s a loose analogy for the way that data on a site like Livejournal may be stored:

There’s a journalling website. It stores its entries on vast reams of paper in a giant library and new entries are scribed onto paper and filed.

The “overwrite with nonsense” strategy assumes that any journal entry you make is at a fixed location on a fixed bit of paper for all time. When you update the entry, the scribe goes to the existing bits of paper and writes on top of them. While this is technically possible with hard drives and similar, in a way that it isn’t with literal paper, here’s what more likely actually happens:

You update the entry, replacing it a Shakespearean play. The new version is written on entirely random empty paper (maybe blank, maybe where someone else’s deleted entry once was), and an index in a different part of the library is also updated. It used to say that your entry of January 7 was on floor 6, shelf 216, and now it says that your entry of January 7 was on floor 12, shelf 16.

But the contents of floor 6, shelf 216 are likely not overwritten for some time. Perhaps they’re marked as available to be overwritten, to be reused whenever it seems sensible, but you won’t know when that is. On the other hand, perhaps they are deliberately marked in the index as “an old version of the January 7 entry” for the sake of users having an edit history, or to have an audit trail, or because a lawsuit demands it, or because a government demands it. This may or may not be visible to you.

Even if floor 6, shelf 216 is marked available to be overwritten, it may not be actively erased, and if it isn’t actively erased, it’s available to be searched by a sufficiently determined or empowered person. (And searching unindexed digital storage is a lot faster and cheaper than searching paper, so not one thousandth as determined or empowered as you need to be to search a library full of unindexed paper.)

And even if floor 6, shelf 216 is no longer marked as “an old version of the entry of January 7”, on any moderately well-run website, floor 6, shelf 216 was never the only copy of your entry anyway. What if there was an accident with fire or water or whiteout? There are backups of your entry, probably at least two in the same library and at least one in a different library. These backups are usually moments in time, ie, the state of the entire journalling website as of New Years. The state of the entire journalling website as of New Years the previous year.

These backups are almost certainly never wiped of entries that are simply edited, and without adding a system that searches back through backups and selectively deletes backups of deleted accounts, they most likely contain the complete contents of deleted accounts as well.

So what you’ve ended up with is a situation where floor 12, shelf 16 contains a Shakespearean play, floor 6, shelf 216 likely contains your original entry, and there are several backups around that almost certainly contain your original entry and are designed in such a way as to be found and restored relatively quickly. This is not a much more secure situation than before you replaced the entry with a Shakespearean play; probably not worth the work you did.

All that said, it’s important to know that there are trade-offs in adding secure, permanent deletion. People quite often edit or delete their data accidentally, or temporarily — for example it is quite common to disable social media accounts temporarily to enforce a social media break — and it’s also common to be hacked and have your data deleted by the hacker. Enthusiastic data scubbing will actively harm you in all these cases. On top of that, storage systems fail (in my analogy, the library burns down, except hard drives fail more often than paper does), and backups are especially important then. And any system that goes back in time and edit backups has risks; what if it has a bug (all software has bugs) and deletes things it shouldn’t? System design to balance securely deleting data that users want to permanently delete with rarely or never deleting data they expect to keep is not easy.

So Livejournal or another site has your personal data, what should you do? I suggest that when you no longer use an online service, or you no longer trust in its management, that you take a personal backup of the data if possible and if you want it, and then delete your account.

You cannot usefully take any additional steps like overwriting your account with nonsense to ensure that actual data scrubbing took place and you should assume that it wasn’t scrubbed unless you can find some written guarantee otherwise. However, over time, backups will get selectively pruned, outages will happen, the business may eventually fail and your data will most likely become slowly less available and complete. That’s the best you can do.

For online services you actively use and where you do trust the management enough to keep your account, ask for written descriptions of their data scrubbing practices to be developed for deleted data and deleted accounts, including deletion from backups and handling of disused hard drives.

Elsewhere:

Tim Chevalier, PSA: Switching to Dreamwidth? (January 2017).

Disclosure: I am an employee of Google. This post does not describe Google’s data deletion practices, in which I’m not an expert in any case; it’s a general description of easy, sometimes harmful, defaults that systems designers could fall into. For Google-specific information, you can view privacy.google.com and Google Infrastructure Security Design Overview.

Creative Commons License
Don’t trust Livejournal? Delete your account. by Mary Gardiner is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Flickr features I’ve known and loved

As the planning for the sale of Yahoo!/Altaba to Verizon continues, I’m not the only person worried about the fate of Flickr, which has been owned by Yahoo since 2005:

I’ve got a tediously backed-up local copy of my photos and won’t have to kiss them goodbye, but as a happy Pro user of Flickr I’m really worried about its future and beginning an active search for replacements. I’m going to start evaluating possible replacements on the basis of these specific features, roughly in order of importance:

My favourite Flickr features

Embedding my foremost use of Flickr is as a photo host for my parenting blog and, increasingly, to show off my best photos. The ability to embed photographs in third-party websites is essential to me.

Locking at the photo level and guest access. It’s not easy to find non-recent photographs of my children on my Flickr account. That’s because I have a script that marks photos as private once they’re a certain age. Some other types of photos (for example, photos of other children) I often mark as private immediately.

Much of my web life runs this way: just because you can find my recent stuff doesn’t mean you get to casually browse everything I’ve done on the Internet since the beginning of time (circa 1999). I’ve taken full advantage of websites with individual locking every time I’ve used one, including WordPress sites, LiveJournal&Dreamwidth, Pinboard, and, yes, Flickr, and strongly prefer it.

At the same time, the chance of people who care about me obtaining a login to Flickr, or to social-photos-site-of-the-month in order to view pictures of a party we were at is basically nil, so the ability to share links to photos via Flickr’s guest pass system has made it useful to me for semi-private events and photos.

API access. I’m not locking all this stuff on all these sites down by hand! It’s all scripted and done via APIs.

Multiple albums for a single photo I look at my photos through several different types of, uh, “lenses”. There’s events, there’s individuals in the photos (mostly my children), and there’s my show-off albums for my favourite photos or ones most I’m likely to want to share with other people if only they’d ask to see more of my photos. I use albums for all three ways of looking at photos, and thus many of my photos are in both a “my kid at age 3” album and a “visit to the beach in November” album.

I also use tags and I might be able to modify my workflow to use tags to replace some of these features, although the result of a tag search would need to be viewable as a first class album, rarely true in my experience so far.

Creative Commons licencing. I like easily dropping my photos into a big pool of photos that might someday find good uses elsewhere and licence a lot of my non-portraits CC BY for (nearly) maximum re-usability. I fear that even sites that support CC licencing won’t end up being searched by anyone in practice, and if I note a CC licence myself in the description, it’s never going to happen.

Features I’d reluctantly sacrifice

Chromecast support. It’s been really enchanting having our TVs display great photos of our kids throughout their lives, travel we’ve done, and a lot of clouds, all via Chromecast’s support for using Flickr photos for background images, but I’m willing to give it up for my core set of features.

An app. Don’t get me wrong, I do like being able to peruse my photos on my phone, but I’d give it up if I had to. Because I do about half my photography with a DSLR, and edit essentially all my photographs, I don’t upload photos via apps in any case.

Less important

The social ecosystem. I started using Flickr regularly after a lot of people stopped, and I’m indifferent to the social features, eg favourites, comments, following other folks, putting my photos in group albums. I do use some of these, but I won’t be looking for them in a replacement.

Locking to different sets of people. I do use Flickr’s “friends” and “family” distinction a little, but in giving up social, I’m also happy to give up locking other than “locked” and “not locked”.

And now, I’m afraid, it’s well and truly time to go shopping for a new photo host. My favourite. Only not.

If you’re still maintaining a LiveJournal, your journal’s now in Russia

Signal-boosting this news as I know a few people still maintaining a LiveJournal who might choose to delete it, or change their use of LiveJournal after learning about this.

LiveJournal is now hosted in Russia

As of late December 2016, the LiveJournal servers (computers) are now hosted in Russia. While LiveJournal has been owned by Russian company SUP since 2007, the servers had until now been hosted in the US and access to them somewhat controlled by Californian law.

SUP has, to the best of my knowledge, not announced or commented on this themselves, but there’s more information at rahirah’s Dreamwidth journal with links to different evidence of the new location of the servers.

A Russian-language LiveJournaller appears to report that Russian law then allows that all the confidential information of [LiveJournal] users is available for [Russian] domestic security services in real time [note though that that’s a automated translation].

A BBC report on Russian law regarding social media in 2014 seems to confirm this:

A new law imposing restrictions on users of social media has come into effect in Russia.

It means bloggers with more than 3,000 daily readers must register with the mass media regulator, Roskomnadzor, and conform to the regulations that govern the country’s larger media outlets.

Internet companies will also be required to allow Russian authorities access to users’ information.

Thanks to my original source of information about this (found via @anatsuno on Twitter).

siderea expresses several important concerns with this:

  1. if you’re in Russia or vulnerable to Russia, and a political opponent, you could now be more easily identified by Russian security
  2. regardless of where you are, your LiveJournal could be possibly be deleted without notice for expressing opposition to Russia or its interests or for other content censored in Russia (eg LGBT-related content)
  3. the flight of LiveJournal users from LiveJournal following this news could simply kill the business and cause everyone’s journals to disappear without notice (Archive Team is storing public entries, regarding it as an at-risk site)

Readers’ connections to LiveJournal aren’t private

LiveJournal redirects secure https links back to insecure http. For example, if you visit https://ohnotheydidnt.livejournal.com/ your browser will connect, but it will be instructed to head to http://ohnotheydidnt.livejournal.com/ before loading the page. (Info from this Dreamwidth comment by mme_hardy, confirming my personal experiences with LiveJournal RSS feeds over the last several months.)

What this means is that the content of any entries you read, including locked ones by both you and other people, are trivially visible to anyone who can eavesdrop on your net connection, including (often) other people on your local network, and anyone on the path between you and LiveJournal such as your ISP and anyone with access to the data flowing across international cables or access to the data as it enters the Russian hosting facility, whereas https connections are encrypted in a way such that those people can see that data is flowing but can’t read it absent considerably more niche and intensive technical measures. (Even if HTTPS were turned on by LiveJournal, you wouldn’t be safe from the Russian law, since they can ask LiveJournal itself to turn over your data in addition to whatever nation-state attacker level techniques they can employ.)

Given my experience with LiveJournal RSS feeds, I’m fairly sure this has been true for some time, predating the move of the servers to Russia. (Here’s one other report that this was already true as of September 2016.) Regardless of timing, this speaks of, at best, disregard for the privacy of their users’ explicitly private (because friends-locked!) information. It’s 2017, mandatory HTTPS for transmission of any data that is sensitive or might, conceivably, somehow, maybe, be sensitive is an absolute minimum standard for user safety. LiveJournal doesn’t even have optional “if you have HTTPS Everywhere installed” or “if you remember to stick the s into the URL yourself” HTTPS (which would still be insufficient as you cannot control whether your readers use HTTPS when reading your journal).

Getting your content out of LiveJournal

If based on this you choose to delete your LiveJournal, here are some options to keep your entries. This list isn’t comprehensive.

If you want to move the content to another website, here’s some blogging platforms that provide imports from LiveJournal:

If you want to download your entries for private use, you can:

  • use LiveJournal’s own export tool but rather painfully (you’ll have to do one download per month), and without comments
  • use ljdump on the command line, which worked for me as of 2015 when I deleted my LiveJournal, but will require that you’re an experienced command line user
  • use BlogBooker to export it to a Word or PDF file (disclosure: I haven’t used this site in quite some time, and would appreciate hearing if it works, but I suggest people at least try it because it exports to a non-programmer friendly format that people could keep as a private archive, and claims to include comments and images)
  • Archive Team lists other backup tools

If your LiveJournal made use of their photo hosting, I am not sure which backup solutions will import your photos or how they will be stored. I am also not aware of any import tool that replaces LiveJournal entries with a “this entry has moved to URL” message or similar. If anyone is working on a competing LiveJournal import/export tool, photo export and redirection text are both features that my friends and I would have found useful at various times.

Creative Commons License
If you’re still maintaining a LiveJournal, your journal’s now in Russia by Mary Gardiner is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Make your Dreamhost site HTTPS-only

Encrypt all the traffic!

Some of the archival Ada Initiative web content is hosted on Dreamhost, and today I re-enabled HTTPS for it now that Let’s Encrypt certificates are available both on Dreamhost and WordPress.com.

Update October 2017: official Dreamhost documentation on adding an SSL certificate and forcing your site to load securely with an .htaccess file is available.

Here’s how to enable, and insist on, HTTPS connections to sites hosted on Dreamhost:

  1. Log into the panel
  2. Go to Secure Hosting
  3. Click ‘Add Secure Hosting’
  4. Select the domain you want from the dropdown, check the box next to ‘By checking this option you agree to the Let’s Encrypt Terms of Service.’, leave ‘Unique IP’ unchecked, and press ‘Add now’.
  5. Important: wait for an email from Dreamhost telling you the certificate is ready. This seems to take about fifteen minutes or so. The email contains a copy of the certificate but you don’t need to do anything with it, they configure the webserver automatically at about the same time as they send the email.
  6. Once you have received the email, check that your site is available at https://YOUR-URL and that your browswer does not report errors. (If it does, wait around 15 minutes, try again, and if you’re still seeing errors, screenshot them and contact Dreamhost support.)

Now that HTTPS is working on your site, you can then force all HTTP requests to redirect to HTTPS by placing this in the ~/YOUR-URL/.htaccess file:


<IfModule mod_rewrite.c>
# Redirect all insecure requests
RewriteCond %{HTTPS} !=on
RewriteRule ^/?(.*) https://%{SERVER_NAME}/$1 [R=301,L]
</IfModule>

# tell web browsers not to even try HTTP on this domain for the next year
# uncomment this only after you've confirmed your site is working on HTTPS, since it commits you to providing it over HTTPS
# Header set Strict-Transport-Security "max-age=31536000" env=HTTPS

Check that visiting http://YOUR-URL now redirects to https://YOUR-URL, and the same should be true of pages underneath http://YOUR-URL.

Feature request for Dreamhost: make a “force HTTPS” option in your standard config.

For more information on Strict-Transport-Security see HTTP Strict Transport Security for Apache, NGINX and Lighttpd and Stack Overflow: How to set HSTS header from .htaccess only on HTTPS.

Bonus round: update absolute links

If your site is a bunch of static HTML files, and you have done a lot of absolute linking to your own webpages, here’s a possible command you could run, replacing example.com with your own domain. I don’t recommend running it unless you know the UNIX command line, and you have a fairly good idea of what find and sed both do:


DOMAIN=example.com
cp -a ~/$DOMAIN ~/$DOMAIN-backup-before-https-edit
cd ~/$DOMAIN
find -type f -name "*.html" -exec sed -i "s/http:\/\/$DOMAIN/https:\/\/$DOMAIN/g" {} \;

Creative Commons License
Make your Dreamhost site HTTPS-only by Mary Gardiner is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Unhappy data retention day

This article originally appeared on Hoyden About Town.

This morning, Australia’s mandatory 2 year data retention regime began. Internet activity through Australian ISPs (including mobile phone providers) is now recorded. Australians, according to Crikey, here is what is likely to be retained about your accessing this link today:

  • your name and similar identifying details on your Internet account
  • the Internet address of where you accessed Hoyden About Town from
  • the Internet address of Hoyden About Town itself
  • the date and time you accessed this site
  • how long you accessed it for (quickly, in the case of websites, no doubt, but what if you were Skyping with us?)
  • what technical services you used (HTTP over ADSL or mobile or cable or …)

If you are accessing this over a mobile device, your location is also stored, to quite a high degree of accuracy. This data is also by far the hardest to conceal using any method, since it’s revealed as a core part of your phone’s communication with cell towers.

At least the actual specific page you accessed would not (or at least need not) be retained, if I am interpreting the information at Allens and Crikey correctly.

Surveillance cameras attached to a building exterior
Surveillance, by Jonathan McIntosh CC BY-SA

Further reading:


Image credit: Surveillance by Jonathan McIntosh, Creative Commons Attribution-Sharealike.

Quick links: nothing to hide

This article originally appeared on Hoyden About Town.

Data retention is coming to Australia very soon.

[Data retained] includes your name, address and other identifying information, your contract details, billing and payment information. In relation to each communication, it includes the date, start and finish times, and the identities of the other parties to the communication. And it includes the location data, such as the mobile cell towers or Wi-Fi hotspots you were accessing at the time…

But surely they’ve included special protections for communications between doctors and patients, and lawyers and clients? No. Never even discussed…

The Joint Committee recommended that the Act be amended to ensure that the metadata can’t be obtained by parties in civil litigation cases (I’ve mentioned before how excited litigation lawyers will be about all this lovely new data), and George Brandis said that would be fixed in the final amendments. But it isn’t there. The final Bill being bulldozed through Parliament right now contains no such protection. The fact remains that, under the Telecommunications Act, one of the situations in which a service provider cannot resist handing over stored data is when a court has required it by issuing a subpoena. In practice, that means that your ex-spouse, former business partners, suspicious insurance company or employer can get hold of a complete digital history of your movements and communications for the past two years, and use it against you in court.

Michael Bradley, Our privacy is about to be serially infringed, The Drum, March 19 2015

Surveillance cameras attached to a building exterior
Surveillance, by Jonathan McIntosh@Flickr CC BY-SA

Noted elsewhere: all this data will be stored by various companies with varying degrees of security awareness, so in practice it will sometimes be available to some criminals too.

Elsewhere:


Image credit: Surveillance by Jonathan McIntosh, Creative Commons Atttribution-Sharealike

Importing a large blog to WordPress.com: WXR splitting tools

I am about to import a very large WordPress blog (not this one) to WordPress.com.

There’s two issues:

1. The WXR (WordPress eXtended RSS) export from the site is 105MB uncompressed and 22MB compressed (with gzip -9). This is too large to upload to WordPress.com, which only accepts uploads of 15MB at most.

2. This site has 4000 media file uploads (and 6000 posts). The original host is going away: those 4000 media files (mostly images) must also be imported into WordPress.com.

The obvious solution to is to split the upload into multiple files, but I have just tested on WordPress.com, and in order to get it to change the post contents to refer to the imported copy of the media files, rather than the original externally hosted copy which is about to go away, the media file and the post must be uploaded in the same XML file. The scripts that I’ve found that will split WXR files into multiple XML files do not attempt to put media files and the posts that refer to them in the same XML file (eg mainSplit.py doesn’t do this), they just split the contents of the export file up in the order they appear.

Anyone got leads on this one?

Connecting a Debian/Ubuntu server to the Macquarie University OneNetAnywhere VPN

I realise that this is a rather specific problem, but hopefully the links I provide here will be useful for anyone wanting to access a PPTP VPN themselves.

I have to say that this is one of those entries more likely to be useful if you ever have this specific problem (eg, you can here via a search engine query for “argh pptp mppe errors argh argh argh”) and less for a casual reader. Apologies loyal fans!

Continue reading “Connecting a Debian/Ubuntu server to the Macquarie University OneNetAnywhere VPN”

IPv6: encore

In which Mary does a lot of work on a comments policy in order to talk to herself about IPv6. True story.

Anyway, where we last left our heroine, she had found one unpromising (because unanswered) complaint describing her IPv6 problem. She tried updating the router firmware but it said it was the newest available firmware.

Some time later, our heroine found another account of the problem over on Server Fault where it was less likely to be lonely, and our heroine became convinced that she ought to install DD-WRT on her wireless router. Hey, maybe it would have worked, too. But our heroine’s husband likes his Internet to work, and gave her a sidelong look, whereupon our heroine at least deferring bricking her router until the weekend.

However! Our heroine is slightly bored of one of her day jobs, so today she idly searched for updated firmware and updated her D-Link DIR-615 router (C2 hardware edition) from firmware version 3.01 to 3.03WW (WW? I don’t get it either) and now she has a wireless router that does not send rogue IPv6 router advertisements to the network.

The end.

IPv6: finale in the key of D-Link

Background knowledge: this post requires some knowledge of networking, at least to the point of knowing what IPv4 and IPv6 are, and what is meant by subnet notation like “/60” and “/64”.

I believed for a very brief time that I’d beaten IPv6 into shape but soon my husband started complaining that sometimes it worked, sometimes it didn’t, and basically questioning whether it was worth any more late nights. (I would poke things, we would jointly debug them, IPv6 involved us skipping dinner two nights in a row in the end.)

Basically what would happen was that anything we tried to connect to over IPv6, most noticeably Google itself (because they trust Internode’s IPv6 routing enough to have turned on IPv6 access for their customers) would either work or just hang. I vaguely suspected some kind of routing error.

Here’s something to try if you have mysterious intermittent IPv6 dropouts or hangs: watch the output of radvdump closely. What you are looking for is any router advertisements coming from a second source: rogue RAs was the search term I was using somewhat in vain.

Unfortunately, if you find such a thing, there are essentially two options (much as you do if someone has put a rogue DHCP server on a network). One is to remove the rogue device from the network, the other is to firewall its announcements away from your clients. Unfortunate in my case, that is, because it emerged that the source of the announcements was our D-Link wireless router (which, per the previous entry, we run as a switch). Removing a wireless switch from our network would have the unacceptable side-effect of re-introducing strings of blue cable to our home, and it’s pretty hard to firewall your switch itself. So in our case, the answer for the present time is to give up on home IPv6.

Overall, although the reason we gave up on IPv6 was not a Linux problem, I have to say that I was really surprised how immature Linux’s tools are at this point. The fundamentals exist: kernel support, DHCPv6 and stateless configuration servers and clients. As an IPv6 client, Linux is doing OK. If you connect a Linux machine to a network that happens to be using IPv6, it’ll likely Just Work. But at the tools and packaging level there’s still loads of gaps along the lines of:

  • iptables and ip6tables are entirely separate programs, so you get to have your firewall configuration fun twice! (However, UFW handles this fairly nicely, if you’re in the market for a thin-ish wrapper around iptables.)
  • configuring ppp for IPv6 is like ppp for IPv4 circa 1999 or 2000 or so. Things like the “oh yeah, for a reason no one knows, you won’t get a default route, so here’s a little script that will bring one up for you” (see Shane Short’s blog entry)
  • radvd is a fairly crucial tool, but there aren’t a lot of example config files for different situations that I could find, and the man page assumes that you know a lot about router advertisements already
  • if you want to use Ubuntu’s supported DHCP server (isc-dhcp-server) for DHCPv6, you need to write it a second init script and config file yourself

So after all that you might be tempted to use a dedicated router for IPv6 and I’d sympathise except that the D-Link device does it even worse than Linux. Not promising. I can’t see that moving many ADSL users over to IPv6 is going to happen any time soon.