In January 2017, as people considered the implications of the server move to Russia, I saw a number of people hesitant to delete their accounts as they were hoping to overwrite their data on Livejournal before deleting, by, eg, replacing their entries with Shakespeare’s plays, or with random nonsense, so that Livejournal didn’t have the entry any more. This won’t work and you might as well just delete your Livejournal account.
Here’s a loose analogy for the way that data on a site like Livejournal may be stored:
There’s a journalling website. It stores its entries on vast reams of paper in a giant library and new entries are scribed onto paper and filed.
The “overwrite with nonsense” strategy assumes that any journal entry you make is at a fixed location on a fixed bit of paper for all time. When you update the entry, the scribe goes to the existing bits of paper and writes on top of them. While this is technically possible with hard drives and similar, in a way that it isn’t with literal paper, here’s what more likely actually happens:
You update the entry, replacing it a Shakespearean play. The new version is written on entirely random empty paper (maybe blank, maybe where someone else’s deleted entry once was), and an index in a different part of the library is also updated. It used to say that your entry of January 7 was on floor 6, shelf 216, and now it says that your entry of January 7 was on floor 12, shelf 16.
But the contents of floor 6, shelf 216 are likely not overwritten for some time. Perhaps they’re marked as available to be overwritten, to be reused whenever it seems sensible, but you won’t know when that is. On the other hand, perhaps they are deliberately marked in the index as “an old version of the January 7 entry” for the sake of users having an edit history, or to have an audit trail, or because a lawsuit demands it, or because a government demands it. This may or may not be visible to you.
Even if floor 6, shelf 216 is marked available to be overwritten, it may not be actively erased, and if it isn’t actively erased, it’s available to be searched by a sufficiently determined or empowered person. (And searching unindexed digital storage is a lot faster and cheaper than searching paper, so not one thousandth as determined or empowered as you need to be to search a library full of unindexed paper.)
And even if floor 6, shelf 216 is no longer marked as “an old version of the entry of January 7”, on any moderately well-run website, floor 6, shelf 216 was never the only copy of your entry anyway. What if there was an accident with fire or water or whiteout? There are backups of your entry, probably at least two in the same library and at least one in a different library. These backups are usually moments in time, ie, the state of the entire journalling website as of New Years. The state of the entire journalling website as of New Years the previous year.
These backups are almost certainly never wiped of entries that are simply edited, and without adding a system that searches back through backups and selectively deletes backups of deleted accounts, they most likely contain the complete contents of deleted accounts as well.
So what you’ve ended up with is a situation where floor 12, shelf 16 contains a Shakespearean play, floor 6, shelf 216 likely contains your original entry, and there are several backups around that almost certainly contain your original entry and are designed in such a way as to be found and restored relatively quickly. This is not a much more secure situation than before you replaced the entry with a Shakespearean play; probably not worth the work you did.
All that said, it’s important to know that there are trade-offs in adding secure, permanent deletion. People quite often edit or delete their data accidentally, or temporarily — for example it is quite common to disable social media accounts temporarily to enforce a social media break — and it’s also common to be hacked and have your data deleted by the hacker. Enthusiastic data scubbing will actively harm you in all these cases. On top of that, storage systems fail (in my analogy, the library burns down, except hard drives fail more often than paper does), and backups are especially important then. And any system that goes back in time and edit backups has risks; what if it has a bug (all software has bugs) and deletes things it shouldn’t? System design to balance securely deleting data that users want to permanently delete with rarely or never deleting data they expect to keep is not easy.
So Livejournal or another site has your personal data, what should you do? I suggest that when you no longer use an online service, or you no longer trust in its management, that you take a personal backup of the data if possible and if you want it, and then delete your account.
You cannot usefully take any additional steps like overwriting your account with nonsense to ensure that actual data scrubbing took place and you should assume that it wasn’t scrubbed unless you can find some written guarantee otherwise. However, over time, backups will get selectively pruned, outages will happen, the business may eventually fail and your data will most likely become slowly less available and complete. That’s the best you can do.
For online services you actively use and where you do trust the management enough to keep your account, ask for written descriptions of their data scrubbing practices to be developed for deleted data and deleted accounts, including deletion from backups and handling of disused hard drives.
Tim Chevalier, PSA: Switching to Dreamwidth? (January 2017).
Disclosure: I am an employee of Google. This post does not describe Google’s data deletion practices, in which I’m not an expert in any case; it’s a general description of easy, sometimes harmful, defaults that systems designers could fall into. For Google-specific information, you can view privacy.google.com and Google Infrastructure Security Design Overview.
Don’t trust Livejournal? Delete your account. by Mary Gardiner is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.