BTRFS Status April 2014

Since my blog post about BTRFS in March [1], not much has changed for me. Until yesterday I was using 3.13 kernels on all my systems and dealing with the occasional kmail index file corruption problem.

Yesterday my main workstation ran out of disk space and went read-only. I started a BTRFS balance, which didn’t seem to be doing any good because most of the space was actually in use, so I deleted a bunch of snapshots. Then my X session aborted (some problem with KDE or the X server – I’ll never know as the logs couldn’t be written to disk). I rebooted the system and had kernel threads go into infinite loops with repeated messages about a lack of response for 22 seconds (I should have photographed the screen). When it got into that state the ALT-Fn keys to change virtual console sometimes worked, but nothing else did – the terminal usually didn’t respond to input.
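For reference, the recovery attempt involved commands along these lines (a sketch only; the mount point and the snapshot path are examples, not my actual layout):

    # start a balance on the root filesystem (this is the operation that later kept resuming on every boot)
    btrfs balance start /

    # delete an old snapshot to reclaim space; the path is a made-up example
    btrfs subvolume delete /backup/root-20140401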

To try and stop the kernel from entering an infinite loop on every boot I used “rootflags=skip_balance” on the kernel command line to stop it from continuing the balance, which made the system usable for a little longer. Unfortunately the skip_balance mount option doesn’t apply permanently: the kernel will keep trying to balance the filesystem on every mount until a “btrfs balance cancel” operation succeeds. But my attempts to cancel the balance always failed.
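In concrete terms the workaround and the attempted fix look something like this (a sketch; how you edit the kernel command line depends on your boot loader, and the mount point is an example):

    # append to the kernel command line for a one-off boot, e.g. by pressing 'e'
    # on the GRUB menu entry and adding this to the end of the 'linux' line:
    rootflags=skip_balance

    # once the filesystem is mounted, try to cancel the interrupted balance:
    btrfs balance cancel /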

When I booted my system with skip_balance it would sometimes free some space from the deleted snapshots, and after two good runs I got to 17G free. But after that every time I rebooted it would report another Gig or two free (according to “btrfs filesystem df”) and then hang without committing the changes to disk.
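For anyone unfamiliar with it, that command is the BTRFS-specific free space report, which shows per-type allocation rather than the single figure that plain df gives:

    # show how much space is allocated and used for data, metadata, and system chunks
    btrfs filesystem df /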

I solved this problem by upgrading my USB rescue image to kernel 3.14 from Debian/Experimental and mounting the filesystem from the rescue image. After letting kernel 3.14 work on the filesystem for a while it was in a state where I could use it with kernel 3.13 and then boot the system normally to upgrade it to kernel 3.14.
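Mounting from the rescue image was nothing special; roughly the following, with the device name as an example only:

    # from the rescue image running kernel 3.14, mount the damaged filesystem
    # skip_balance stops the interrupted balance from resuming immediately
    mount -o skip_balance /dev/sda1 /mnt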

I had a minor extra complication due to the fact that I was running “apt-get dist-upgrade” at the time the filesystem went read-only, so the dpkg records of which packages were installed were a bit messed up. But that was easy to fix by running a diff against /var/lib/dpkg/info on a recent snapshot. In retrospect I should have copied from an old snapshot of the root filesystem, but I fixed the problems faster than I could think of better ways to fix them.
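The comparison was along these lines (a sketch; the snapshot path is an example, not my actual naming scheme):

    # compare the current dpkg state files against the copies in a recent snapshot
    diff -r /backup/root-snapshot/var/lib/dpkg/info /var/lib/dpkg/info | less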

When running a balance the system had a peak IO rate of about 30MB/s reads and 30MB/s writes. That compares to a maximum contiguous file IO speed of 260MB/s for reads and 320MB/s for writes. During that time about 50% of the CPU time on my Q8400 quad-core CPU was in use. So far the only tasks I do regularly which have CPU speed as a significant bottleneck are BTRFS filesystem balancing and recoding MP4 files. Compiling hasn’t been an issue because recently I haven’t been compiling many programs that are particularly big.

Lessons Learned

I should photograph the screen regularly when doing things that won’t be logged; those kernel error messages might have been useful to me or someone else.

The fact that the only kernel that runs BTRFS the way I need comes from the Experimental repository in Debian stands in contrast to the recent kernel patch that stops describing BTRFS as experimental. While I have a high opinion of the people who provide support for the kernel in commercial distributions and their ability to back-port fixes from newer kernels, I’m concerned about their decision to support BTRFS. I’m also dubious about whether we can offer BTRFS support in Debian/Jessie (the next version of Debian) without a significant warning. OTOH if you find yourself with a BTRFS system that isn’t working well you could always hire me to fix it. I accept payment via Paypal, bank transfer, or Bitcoin. If you want to pay me in Grange then I assure you I will never forget about it. ;)

I thought that I wouldn’t have CPU speed issues when I started using the AMD64 architecture, and for most tasks that’s been the case. But for systems where storage is important I’ll look at getting faster CPUs because of BTRFS. Using faster CPUs for storage isn’t that uncommon (I used to work for SGI and dealt with some significant CPU power used for file serving), but needing a fast quad-core CPU to drive a single SSD is a little disappointing. While recovery from file system corner cases isn’t going to be particularly common it’s something that you want completed quickly: for personal systems you want to be doing something else, and for work systems you don’t want down-time.

The BTRFS problems with running out of disk space are really serious. It seems that even workstations used at home can’t survive without monitoring. For any other filesystem used at home you can just let it get full and then delete stuff.
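A trivial cron job gives some warning before this happens. The following is a minimal sketch only: the 90% threshold and mailing root are arbitrary choices, it assumes a working local mail setup, and a more thorough check would also look at the allocation figures from “btrfs filesystem df”, since BTRFS can run out of space for metadata while plain df still shows free space:

    #!/bin/sh
    # minimal hourly check: warn when the root filesystem is nearly full
    USAGE=$(df -P / | awk 'NR==2 {print $5}' | tr -d '%')
    if [ "$USAGE" -ge 90 ]; then
        echo "root filesystem is ${USAGE}% full" | mail -s "disk space warning" root
    fi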

Include “rootflags=skip_balance” in the boot loader configuration for every system with a BTRFS root filesystem, and the “skip_balance” mount option in /etc/fstab for every non-root BTRFS filesystem (see the sketch below). I haven’t yet encountered a single situation where continuing the balance did any good, or one where it didn’t do harm.
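On a Debian system that amounts to something like the following (a sketch; merge the option with whatever is already in your GRUB configuration, and the device name and mount point in the fstab line are examples). In /etc/default/grub:

    GRUB_CMDLINE_LINUX="rootflags=skip_balance"

Then run “update-grub” to regenerate the boot configuration. For a non-root BTRFS filesystem the corresponding /etc/fstab entry would be along the lines of:

    /dev/sdb1  /data  btrfs  defaults,skip_balance  0  0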

6 comments to BTRFS Status April 2014

  • Roger Leigh

    Until a Btrfs filesystem can no longer be reduced to a state of unusability by getting unbalanced, it’s simply not suitable for any serious use.

  • etbe

    If you have a filesystem that’s significantly larger than the space required then it can work reasonably well.

    But I agree that for most people it’s not something that they want to use yet. It will be interesting to see what happens with the commercial distributions, their users seem to have a higher expectation of things working and a lower ability to fix them when they don’t.

  • Blissex

    The impression I get from your experiences, mine, and those of others is that BTRFS is a reliable production-ready file-system, as long as it is used as just a file-system.

    It is the volume-management functionality that needs to be cleaned up, as there are several annoying corner cases.

    Other file-systems like ‘ext3’ also have had infrequently occurring but very serious issues for many many years and most people just put up with them…

  • Roger Leigh

    etbe, my experience is that it can fail horribly even when there *is* significant free space. A few weeks ago, I did several whole-archive rebuilds using sbuild, with schroot btrfs-snapshot chroots. I used a pristine Btrfs volume of several hundred gigabytes for the scratch space. Completely empty. During the builds, space utilisation was around 2%, peaking at ~10% transiently for exceptionally big packages. So the free space was 98% for most of the time, falling to 90% on occasion. And yet, Btrfs still got completely unbalanced every ~48 hours and ceased to function. I was shocked firstly that this could happen, and secondly that it required manual intervention to restore to a working state.

    I have never, ever, encountered a filesystem which could completely ruin itself to the point of complete unusability simply through intensive use. I’ve had ext2/3 filesystems survive nearly a decade with zero care or attention despite thrashing them. They are solid filesystems. Even FAT might become fragmented and slow, but even then will continue to function. This just fails.

    I can’t conceive how this can, in its current state, be production-worthy for *any* purpose if it will at some indeterminate point simply cease to function, and there’s no way to tell when that will be. Even if you rebalance e.g. daily, you might still have a busy day where it’s pushed past the failure point.

    Why it even gets to this state in the first place seems stupid. Why allocate all of one extent type and exhaust the free extents? In my scenario, free space was *never an issue*. It seemingly didn’t reallocate previously used extents, but allocated new ones until it had wasted over 200GB of free space. There was never a need for it to get into this state.

    I can accept that I’m pushing it harder than most people would. 8 parallel builds with a continuous read/write load and continuous creation/deletion of several thousand snapshots per day is atypical. However, this is only decreasing the failure time. If it’s 48 hours from creation to failure for me, what’s the lifetime for a casual user? Or a busy server? And how does that interact with the real space utilisation? This uncertainty simply isn’t acceptable for me.

    I ran btrfs on my rootfs for a couple of years, but I switched back to ext4. I’ve had unrecoverable data loss with it due to bugs in the raid code (transient SATA glitch due to loose cable -> both the failed and mirror drive unrecoverable, and both panicked the kernel when I tried to mount them). Thankfully I had a backup. I’ve had a couple of other incidents as well, but that’s the worst. I’ve since lost nearly all my faith in it. It’s simply not ready yet.

    Regards,
    Roger

  • Could you start tagging those posts with BTRFS, please?

    I really appreciate these posts and will likely revisit them several times in the coming years.

    I _want_ BTRFS to work and succeed, but your experience is so negative that it will be years until I trust even one copy of my data with it.

    Thanks for taking the burden of both testing and blogging!

  • etbe

    http://etbe.coker.com.au/tag/btrfs/

    Done.

    Also please note that in software development things can change quite rapidly. BTRFS works significantly better for me now than it did a few weeks ago.