5 Principles of Backup Software

Everyone agrees that backups are generally a good thing. But it seems that there is a lot less agreement about how backups should work. Here is a list of 5 principles of backup software that seem to get ignored most of the time:

(1/5) Backups should not be Application Specific

It’s quite reasonable for people to want to extract data from a backup on a different platform. Maybe someone will want to extract data a few decades after the platform becomes obsolete. I believe that vendors of backup software have an ethical obligation to make it possible for customers to get their data out with minimal effort regardless of the circumstances.

Often when writing a backup application there will be good reasons for not using the existing formats for data storage (tar, cpio, zip, etc). But ideally any data store which involves something conceptually similar to a collection of files in one larger file will use one of those formats. There have been backward compatible extensions to tar and zip for SE Linux contexts and for OS/2 EAs – the possibility of extending archive file formats with no consequence other than warnings on extraction with an unpatched utility has been demonstrated.
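As a rough illustration of that kind of backward compatible extension, here is a minimal Python sketch using the standard `tarfile` module and the pax variant of the tar format. The header key `EXAMPLE.security.context` and the helper names are made up for this example; they are not the keys used by the real SE Linux or OS/2 patches. The extra metadata travels in a pax extended header, and a reader which doesn't know the key still gets the file contents.

```python
import io
import tarfile


def make_archive(files, extra_headers):
    """Build a pax-format tar in memory.

    files: dict mapping member name -> bytes
    extra_headers: dict of str -> str attached to every member
                   (hypothetical vendor keys, for illustration only)
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            # Extra key/value pairs are stored in the pax extended header.
            info.pax_headers = dict(extra_headers)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_archive(blob):
    """Return {name: (data, pax_headers)} for every regular file in the archive."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                data = tar.extractfile(member).read()
                result[member.name] = (data, dict(member.pax_headers))
    return result
```

A restore tool that cares about the extra key can look it up in the second element of each tuple, and one that doesn't can simply ignore it.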

A backup which doesn’t involve source files (EG the contents of some sort of database) should be in a format that can be easily understood and parsed. Well designed XML is generally a reasonable option. Generally the format should involve plain text that is readable and easy to understand, optionally compressed with a common compression utility (pkzip is a reasonable choice).
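To make this concrete, here is a minimal sketch in Python that exports one database table as plain XML and stores it in a zip file. It assumes an SQLite database purely for convenience; the XML layout (`table`, `row` and `field` elements), the `contacts` table in the usage note and the function names are all invented for this example and are not any kind of standard.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET
import zipfile


def dump_table_to_xml(conn, table):
    """Return one table as UTF-8 XML bytes (hypothetical layout, not a standard schema)."""
    # The table name is interpolated directly, so only pass trusted names.
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    root = ET.Element("table", name=table)
    for row in cur:
        row_el = ET.SubElement(root, "row")
        for col, value in zip(columns, row):
            field = ET.SubElement(row_el, "field", name=col)
            if value is None:
                # Mark NULL explicitly so it is not confused with an empty string.
                field.set("null", "true")
            else:
                field.text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def write_zip(member_name, payload):
    """Store payload as a single deflate-compressed member of an in-memory zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member_name, payload)
    return buf.getvalue()
```

Calling `write_zip("contacts.xml", dump_table_to_xml(conn, "contacts"))` gives a file that any common unzip tool can open and that a person can read in a text editor, which is the property that matters decades later.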

(2/5) Data Store Formats should be Published

For every data store there should be public documentation about its format to allow future developers to write support for it. It really isn’t difficult to release some commented header files so that people can easily determine the data structures. This applies to all data stores, including databases and filesystems. If I suddenly find myself with a 15yo image of an NTFS filesystem containing a proprietary database I should be able to find official header files for the version of NTFS and the database server in question so I can decode the data if it’s important enough.

When an application vendor hides the data formats it creates a risk of substantial data loss at some future time. Imposing such a risk on customers to try and prevent them from migrating to a rival product is unethical.

(3/5) Backups should be forward and backward compatible

It is entirely unreasonable for a vendor to demand that all their users install the latest versions of their software. There are lots of good reasons for not upgrading, which include hardware not supporting new versions of the OS, lack of Internet access to perform the upgrade, application compatibility, and just liking the way the old version works. Even for the case of a critical security fix it should be possible to restore data without applying the fix.

For any pair of versions of software that are only separated by a few versions it should be possible to back up data from one and restore to the other. Even if the data can’t be used directly (EG a backup of AMD64 programs that is restored on an i386 system) it should still be accessible. If a new version of the software doesn’t support the ancient file formats then it should be possible for the users to get a slightly older version which talks to both the old and new versions.

Backups made on 64bit systems running the latest development version of Linux and on 10yo 32bit proprietary Unix systems are interchangeable. Admittedly Unix is really good at preserving file format compatibility, but there is no technical reason why other systems can’t do the same. Source code to cpio, tar, and gzip is freely available!
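A small sketch of what that compatibility looks like in practice, using Python's standard `tarfile` module as the reader. It writes the same file in the three tar dialects that module knows about (ustar, GNU and pax), gzip compressed, and reads each one back with the same code. The file name and helper names are invented for the example, and this only demonstrates one implementation, not every tar ever shipped.

```python
import io
import tarfile

# The three tar dialects that the standard library can write.
FORMATS = {
    "ustar": tarfile.USTAR_FORMAT,
    "gnu": tarfile.GNU_FORMAT,
    "pax": tarfile.PAX_FORMAT,
}


def write_tar_gz(name, data, fmt):
    """Return a gzip-compressed tar holding one file, written in the given dialect."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz", format=fmt) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_first_member(blob):
    """Read the first member back; "r:*" detects the compression transparently."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*") as tar:
        member = tar.next()
        return member.name, tar.extractfile(member).read()
```

The reader never needs to be told which dialect or compression was used, which is the behaviour I would like to see from every backup format.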

Apple Time Machine fails badly in this regard: even a slightly older version of Mac OS can’t do a restore. It is however nice that most of the Time Machine data is a tree of files which could be just copied to another system.

(4/5) Backup Software should not be Dropped

Sony Ericsson has made me hate them even more by putting the following message on their update web site:

The Backup and Restore app will be overwritten and cannot be used to restore data. Check out Android Market for alternative apps to back up and restore your data, such as MyBackup.

So if you own a Sony Ericsson phone and it is lost, stolen, or completely destroyed and all you have is a backup made by the Sony Ericsson tool then the one thing you absolutely can’t do is to buy a new Sony Ericsson phone to restore the data.

I believe that anyone who releases backup software has an ethical obligation to support restoring to all equivalent systems. How difficult would it be to put a new free app in the Google Market that has as its sole purpose recovering old Sony Ericsson backups onto newer phones? It really can’t be that difficult, so even if they don’t want to waste critical ROM space by putting the feature in all new phones they can make it available to everyone who needs it. When compared to the cost of developing a new Android release for a series of phones the cost of writing such a restore program would be almost nothing.

It is simply mind-boggling that Sony Ericsson go against their own commercial interests in this regard. Surely it would make good business sense to be able to sell replacements for all the lost and broken Sony Ericsson phones, but instead customers who get burned by broken backups are given an incentive to buy a product from any other vendor.

(5/5) The greater the control over data the greater the obligation for protecting it

If you have data stored in a simple and standard manner (EG the /DCIM directory containing MP4 and JPEG files that is on the USB accessible storage in every modern phone) then IMHO it’s quite OK to leave customers to their own devices in terms of backups. Typical users can work out that if they don’t back up their pictures then they risk losing them, and they can work out how to do it.

My Sony Ericsson phones have data stored under /data (settings for Android applications) which is apparently only accessible as root. Sony Ericsson have denied me root access, which prevents me from running backup programs such as Titanium Backup. Therefore I believe that they have a great obligation to provide a way of making a backup of this data and restoring it on a new phone or a phone that has been updated. To just provide phone upgrade instructions which tell me that my phone will be entirely wiped and that I should search the App Market for backup programs is unacceptable.

I believe that there are two ethical options available to Sony Ericsson at this time, one is to make it easy to root phones so that Titanium Backup and similar programs can be used, and the other option is to release a suitable backup program for older phones. Based on experience I don’t expect Sony Ericsson to choose either option.

Now it is also a bad thing for the Android application developers to make it difficult or impossible to back up their data. For example the Wiki for one Android game gives instructions for moving the saved game files to a new phone which start with “root your phone”. The developers of that game should have read the Wiki, realised that rooting a phone for the mundane task of transferring saved game files is totally unreasonable, and developed a better alternative.

The best thing for developers to do is to allow the users to access their own data in the most convenient manner. Then it becomes the user’s responsibility to manage it and they can concentrate on improving their application.

Why Freedom is Important

Installing CyanogenMod on my Galaxy S was painful, but having root access so I can do anything I want is a great benefit. If phone vendors would do the right thing then I could recommend that other people use the vendor release, but it seems that vendors can be expected to act unethically. So I can’t recommend that anyone use an un-modded Android phone at any time. I also can’t recommend ever buying a Sony Ericsson product, not even when it’s really cheap.

Google have done a great thing with their Data Liberation Front [1]. Not only are they providing access to the data they store on our behalf (which is a good thing) but they have a mission statement that demands the same behavior from other companies – they make it an issue of competitive advantage! So while Sony Ericsson and other companies might not see a benefit in making people like me stop hating them, failing to be as effective in marketing as Google is a real issue. Data Liberation is something that should be discussed at board elections of IT companies.

Keep in mind the fact that ethics are not just about doing nice things, they are about establishing expectations of conduct that will be used by people who deal with you in future. Sony Ericsson has shown that I should expect that they will treat the integrity of my data with contempt and I will keep this in mind every time I decline an opportunity to purchase their products. Google has shown that they consider the protection of my data as an important issue and therefore I can be confident when using and recommending their services that I won’t get stuck with data that is locked away.

While Google has demonstrated that corporations can do the right thing, the vast majority of evidence suggests that we should never trust a corporation with anything that we might want to retrieve when it’s not immediately profitable for the corporation. Therefore avoiding commercial services for storing important data is the sensible thing to do.

4 comments to 5 Principles of Backup Software

  • neonsignal

    On point three I agree strongly, it pays to be very suspicious of backup systems that are not forward compatible with new versions. If they can’t be bothered with defining an extendable format that is forward compatible, then there is a fair chance that at some point in the future they’ll break their format enough to not be backward compatible either. No-one wants to be digging around for ancient and unmaintained software just to examine a single file in a historical backup.

    I’m finding more and more that plain old filesystems (with hard links to manage incrementals) are the safest for backups (eg using rsync and the like). If the files are small (eg source), I don’t care if they get compressed, and if they are large (eg media), then they are probably already compressed. And the beauty is that in the worst case it can be restored with plain old cp. The drawback is that it doesn’t work so well for “embedded” filesystems, such as databases and virtual machine images, since their contents are to some extent opaque (eg, as you suggest, something like an XML export needs to be part of the process).

    Another point I’d want to add to backup software requirements is the need for good logging and testing facilities; it is vital to know that a backup has had errors, and to be able to test that the backup can be restored without necessarily restoring the entire content.

  • Brendan

    I once tried to recover a tar file, but was told (by tar) that the format was too old. The tar was made in 1991 ish, and I think I was trying to recover it 10-12 years or so later. I don’t know if the archive was corrupted.

  • Brendan

    PS: on Android an app’s data is private to that app (including if it’s connected for debugging). The app must, itself, have a function for exporting it to a public area on the SD card and a second function for importing it again.

    Maybe what needed to be backed up was too tedious for them to be bothered coding it in Java?

  • etbe

    neonsignal: Yes filesystems can be good, but they can also become outdated. It’s fortunate that my backups of early Linux systems weren’t in the Minix or Xiafs filesystem formats. It’s unfortunate that some of my backups of OS/2 systems are in the form of HPFS images…

    Brendan: In at least one case the app developer might consider that they have a vested interest in making it difficult to backup data – to avoid cheating in an online game.

    http://www.reddit.com/r/linux/comments/pgkea/5_principles_of_backup_software/

    A comment on Reddit says “How is it ethical to deny root to permanent users of* devices”. I agree that when we consider the ethical issues related to using devices then a good case can be made for allowing the purchaser to have root access. But in terms of backup it’s sufficient to merely preserve the user’s data – something Sony Ericsson shows no sign of ever doing.