I’ve just had to do yet another backup/format/restore operation on my workstation due to a BTRFS corruption problem, but as usual I didn’t lose any data. The BTRFS data integrity features work reasonably well even when the filesystem gets into a state where the kernel will only accept a read-only mount.
Given that the BTRFS tag on my blog is mostly about problems with BTRFS I think it’s time that I explain why I use it in spite of the problems before people start to worry about my sanity or competence.
The first thing to note is that BTRFS is fairly resilient toward errors when mounted in read-only mode. When mounting a filesystem read-write there are a number of ways in which things can break, often because kernel code can’t handle corrupt metadata. I don’t know how much of this is inherent to the design of BTRFS and how much is simply missing features in filesystem error handling. Some of the errors that I have had weren’t entirely the fault of BTRFS: I twice had to do a backup/format/restore of my workstation due to a faulty DIMM corrupting memory (which has the potential to mess up any filesystem), but I still didn’t lose any data AFAIK.
The next thing to note is that I don’t use BTRFS when doing paid sysadmin work. ZFS is a solid and reliable filesystem and it is working really well for my clients while BTRFS has too many issues at the moment. As an aside I’m not interested in any comments about the ZFS license situation from anyone who’s not officially representing Oracle.
I also don’t use BTRFS on systems that I can’t access easily. The servers I have running BTRFS are all within an hour’s drive from home. While driving for an hour on account of a kernel or filesystem error is really annoying, it’s not as bad as dealing with a remote server where I have no direct access.
Reasons to Use BTRFS
The benefits of BTRFS right now are snapshots (which are good for a first-line backup) and the basic data integrity features. I’ve found these features to work well in real use.
According to the Wikipedia page “Comparison of file systems”, ZFS and BTRFS are the only general purpose filesystems (i.e. for disks, not tapes, NVRAM, or a cluster) that support checksums for all data and compression. Given that ZFS license issues will never allow it to be included in the Linux kernel tree, it seems clear that BTRFS is the next significant filesystem for Linux. More testing of BTRFS is a good thing. While there are a number of known problems that the developers are working on, it seems that more testing is needed now to find corner cases. Also we need a lot of testing to find bugs related to interactions with other software.
I’ve recently filed bug reports against the Debian installer because it can’t install to a BTRFS RAID-1 (fortunately BTRFS supports changing to RAID-1 after installation) and because it doesn’t support formatting an existing BTRFS filesystem (the mkfs program needs a -f option in that case). I also sent in a patch for the magic database used by file(1) to provide more information on BTRFS filesystems (which is in Debian/testing but not Debian/Wheezy). These are the sorts of things that you encounter when routinely using software but don’t necessarily notice in basic testing.
As an aside the Debian installation process failed at the GRUB step when I manually balanced a filesystem to use RAID-1 while the Debian installation was in progress. I didn’t file a bug report because the best advice is to not mess with filesystems while the installer is running. I’ll do a lot more testing of this when the Debian installer supports a BTRFS RAID-1 installation.
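For readers who haven’t done the post-install conversion to RAID-1, here is a minimal Python sketch of the commands involved. It is an illustration rather than the exact procedure used above: the device name /dev/sdb2 and the mount point / are placeholders, the helper function names are made up for this example, and the script only prints the commands unless dry_run is turned off. It should be run on an installed system, not while the installer is still working.

```python
import subprocess


def raid1_conversion_commands(mount_point, new_device):
    """Return the commands that add a second device to a mounted BTRFS
    filesystem and rebalance data and metadata to the RAID-1 profile."""
    return [
        ["btrfs", "device", "add", new_device, mount_point],
        ["btrfs", "balance", "start",
         "-dconvert=raid1", "-mconvert=raid1", mount_point],
    ]


def reformat_command(device):
    """Return the mkfs command for a device that already holds a BTRFS
    filesystem; -f tells mkfs.btrfs to overwrite the existing one."""
    return ["mkfs.btrfs", "-f", device]


def run_commands(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing command
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder device and mount point; adjust before a real run.
    run_commands(raid1_conversion_commands("/", "/dev/sdb2"))
```

Keeping the command construction separate from execution makes it easy to review what will be run before letting it touch a real filesystem.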
A final thing that we need to work on is developing sysadmin best practices and scripts for managing BTRFS filesystems. I’ve done some work on scripts to create snapshots for online backups but there are still issues such as managing free space. Working out how to best manage a new filesystem is something that takes years because there are many corner cases you may only encounter after a system has been running for a long time. So I really wouldn’t want to be in the situation of using a new filesystem on an important server without having practice running it on less important systems. I did that with ZFS and now have a hacky first install that I have to support for years.
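To give an idea of what a snapshot script for online backups might look like, here is a minimal Python sketch. It is not the script mentioned above. It assumes snapshots live in a directory such as /home/.snapshots and are named by timestamp. Both the path and the naming scheme are choices made for this example, as are the function names. Pruning by count is only one simple policy and does not by itself solve the free space problem.

```python
from datetime import datetime

# Assumed naming scheme for snapshots, e.g. 2013-10-01-120000
SNAPSHOT_FORMAT = "%Y-%m-%d-%H%M%S"


def snapshot_command(subvolume, snapshot_dir, now):
    """Return the command that creates a read-only snapshot named
    after the given time."""
    dest = f"{snapshot_dir}/{now.strftime(SNAPSHOT_FORMAT)}"
    return ["btrfs", "subvolume", "snapshot", "-r", subvolume, dest]


def snapshots_to_delete(names, keep):
    """Return the snapshot names to remove, oldest first, so that only
    the newest `keep` remain. Names that don't match the timestamp
    format are ignored rather than deleted."""
    dated = []
    for name in names:
        try:
            stamp = datetime.strptime(name, SNAPSHOT_FORMAT)
        except ValueError:
            continue
        dated.append((stamp, name))
    dated.sort()
    ordered = [name for _, name in dated]
    if keep <= 0:
        return ordered
    return ordered[:-keep]


def delete_command(snapshot_dir, name):
    """Return the command that deletes one snapshot subvolume."""
    return ["btrfs", "subvolume", "delete", f"{snapshot_dir}/{name}"]


if __name__ == "__main__":
    # Print the commands instead of running them.
    print(" ".join(snapshot_command("/home", "/home/.snapshots",
                                    datetime.now())))
    existing = ["2013-10-01-120000", "2013-10-02-120000",
                "2013-10-03-120000"]
    for old in snapshots_to_delete(existing, keep=2):
        print(" ".join(delete_command("/home/.snapshots", old)))
```

A real script would list the snapshot directory instead of using a hard-coded list, and would probably want to check free space before deciding how many snapshots to keep.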