Using BTRFS

I’ve just installed BTRFS on some systems that matter to me. It is still regarded as experimental, but Oracle supports it in their kernel so it can’t be too bad, and it’s almost guaranteed that anything other than BTRFS or ZFS will lose data if you run as many systems as I do. Also I run lots of systems that don’t have enough RAM for ZFS (4G isn’t enough for ZFS in my tests), so I have to use BTRFS.

BTRFS and Virtual Machines

I’m running BTRFS for the DomUs on a virtual server which has 4G of RAM (and thus can’t run ZFS). The way I have done this is to use ext4 on Linux software RAID-1 for the root filesystem on the Dom0 and BTRFS for the rest. For BTRFS and virtual machines there seem to be two good options, given that I want BTRFS to use its own RAID-1 so that it can correct errors from a corrupted disk. One is to use a single BTRFS RAID-1 filesystem for all the storage and have each VM use a file on that big filesystem for all its storage. The other is to have each virtual machine run BTRFS RAID-1 itself.

I chose the latter: I’ve created two LVM Volume Groups (VGs) named diska and diskb, and each DomU has a Logical Volume (LV) from each VG and runs BTRFS RAID-1 across the pair. So if a disk becomes corrupted the DomU will have to figure out what the problem is and fix it.
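
A minimal sketch of how one DomU’s storage might be provisioned under this scheme (the LV name and sizes are hypothetical; the VG names are the ones above):

# On the Dom0: one LV from each VG so the DomU sees two independent disks.
lvcreate -L 20G -n demo diska
lvcreate -L 20G -n demo diskb

# Inside the DomU: BTRFS RAID-1 across the two virtual disks, so BTRFS
# can repair corruption on one disk from the copy on the other.
mkfs.btrfs -m raid1 -d raid1 /dev/xvda /dev/xvdb

The following script, run from the Dom0, scrubs the BTRFS filesystems in all the DomUs: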

#!/bin/bash
# Scrub the BTRFS root filesystem of every running DomU, one at a time.
for n in $(xm list|cut -f1 -d' '|egrep -v '^Name|^Domain-0') ; do
  echo "$n"
  # -B blocks until the scrub finishes (so DomUs don't scrub in parallel),
  # -d prints per-device statistics.
  ssh "$n" "btrfs scrub start -B -d /"
done

I run the above script from a cron job on the Dom0. I use the -B option so that I will receive email about any errors and so that there won’t be multiple DomUs scrubbing at the same time (which would be really bad for performance).
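
A weekly entry in /etc/cron.d along these lines would do it; the schedule and script path are my assumptions, not from the original setup:

0 3 * * 0 root /usr/local/sbin/btrfs-scrub-domus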

BTRFS and Workstations

The first workstation installs of BTRFS that I did were similar to installations of Ext3/4 in that I had multiple filesystems on LVM block devices. This caused all the usual problems of filesystem sizes and also significantly hurt performance (sync seems to perform very badly on a BTRFS filesystem, and it gets really bad with lots of BTRFS filesystems). BTRFS supports subvolumes for snapshots and is designed to handle large filesystems, so IMHO there’s no reason to have more than one filesystem.
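
As a sketch of the single-filesystem approach (the device name and subvolume name are illustrative, not from my actual systems):

# One BTRFS filesystem for everything; /home becomes a subvolume
# instead of a separate LVM logical volume.
mkfs.btrfs /dev/sda2
mount /dev/sda2 /mnt
btrfs subvolume create /mnt/home
umount /mnt

# /etc/fstab then mounts the subvolume where a filesystem used to be:
# /dev/sda2  /      btrfs  defaults     0  0
# /dev/sda2  /home  btrfs  subvol=home  0  0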

It seems to me that the only benefit in using multiple BTRFS filesystems on a system is if you want to use different RAID options. I presume that eventually the BTRFS developers will support different RAID options on a per-subvolume basis (they seem to want to copy all ZFS features). I would like to be able to configure /home to use 3 copies of all data and metadata on a workstation that only has a single disk.

Currently I have some workstations using BTRFS with a BTRFS RAID-1 configuration for /home and a regular non-RAID configuration for everything else. But now it seems that this is a bad idea. I would be better off just using a single copy of all data on workstations (as I did for everything on workstations for the previous 15 years of running Linux desktop systems) and making backups that are frequent enough to not have a great risk.

BTRFS and Servers

One server that I run is used primarily as an NFS server and as a workstation. I have a pair of 3TB SATA disks in a BTRFS RAID-1 configuration mounted as /big, with subvolumes under /big for the various NFS exports. The system also has a 120G Intel SSD for /boot (Ext4) and the root filesystem, which is BTRFS and also includes /home. The SSD gives really good read performance which is largely independent of what is done with the disks, so booting and workstation use are very fast even when cron jobs are hitting the file server hard.
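
A rough sketch of that layout (the device names, subvolume names, and export options are assumptions, not the server’s actual configuration):

# BTRFS RAID-1 across the pair of 3TB disks, mounted as /big.
mkfs.btrfs -m raid1 -d raid1 /dev/sdb /dev/sdc
mount /dev/sdb /big

# One subvolume per NFS export.
btrfs subvolume create /big/archive
btrfs subvolume create /big/media

# /etc/exports entries then export the subvolumes individually:
# /big/archive  192.168.0.0/24(ro,no_subtree_check)
# /big/media    192.168.0.0/24(ro,no_subtree_check)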

The system had used a RAID-1 array of 1TB SATA disks for all its storage ever since the days when a 1TB disk was big. So moving to a single storage device for /home is a decrease in theoretical reliability (in addition to the fact that an SSD might be less reliable than a traditional disk). The next thing that I am going to do is to install cron jobs that back up the root filesystem to something under /big. The server in question isn’t used for anything that requires high uptime, so if the SSD dies entirely and I need to replace it with another boot device then it will be really annoying, but it won’t be a great problem.
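
A minimal sketch of such a backup job (the paths and schedule are my assumptions): rsync mirrors the root filesystem onto the RAID-1 array nightly, and -x keeps it on the root filesystem so it doesn’t recurse into /big:

# /etc/cron.d/root-backup (hypothetical path)
30 2 * * * root rsync -aHx --delete / /big/root-backup/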

Snapshot Backups

One of the most important uses of backups is to recover from basic user mistakes such as deleting the wrong file. To deal with this I wrote some scripts to create backups from a cron job. I put the snapshots of a subvolume under a subvolume named "backup". A common use is to have everything on the root filesystem, /home as a subvolume, /home/backup as another subvolume, and then subvolumes for backups such as /home/backup/2012-12-17, /home/backup/2012-12-17:00:15, and /home/backup/2012-12-17:00:30. I make /home/backup world readable so every user can access their own backups without involving me. Of course this means that if they make a mistake related to security then I would have to help them correct it, but I don’t expect my users to deal with security issues; if they accidentally grant inappropriate access to their files then I will be the one to notice and correct it.
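
Setting up that layout on an existing filesystem might look like this (a sketch; my actual systems were set up by hand):

# /home is already a subvolume; create the subvolume that will hold
# the snapshots and make it world readable.
btrfs subvolume create /home/backup
chmod 755 /home/backup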

Here is a script I name btrfs-make-snapshot which has an optional first parameter "-d" to cause it to just display the btrfs commands it would run and not actually do anything. The second parameter is either "minutes" or "days" depending on whether you want to create a snapshot on a short interval (I use 15 minutes) or a daily snapshot. All other parameters are paths for subvolumes that are to be backed up:

#!/bin/bash
set -e

# usage:
# btrfs-make-snapshot [-d] minutes|days paths
# example:
# btrfs-make-snapshot minutes /home /mail

# With -d, just echo the btrfs commands instead of running them.
if [ "$1" == "-d" ]; then
  BTRFS="echo btrfs"
  shift
else
  BTRFS=/sbin/btrfs
fi

# Minute-based snapshots are named with the time as well as the date;
# daily snapshots are named with the date alone.
if [ "$1" == "minutes" ]; then
  DATE=$(date +%Y-%m-%d:%H:%M)
else
  DATE=$(date +%Y-%m-%d)
fi
shift

# Create a read-only snapshot of each subvolume under its backup subvolume.
for n in "$@" ; do
  $BTRFS subvol snapshot -r "$n" "$n/backup/$DATE"
done
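
To drive it from cron, entries along these lines would give 15 minute snapshots plus a daily one (the install path is my assumption):

*/15 * * * * root /usr/local/sbin/btrfs-make-snapshot minutes /home
0 0 * * * root /usr/local/sbin/btrfs-make-snapshot days /home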

Here is a script I name btrfs-remove-snapshots which removes old snapshots to free space. It has an optional first parameter "-d" to cause it to just display the btrfs commands it would run and not actually do anything. The next two parameters are the number of minute-based and day-based snapshots to keep (I am currently experimenting with "100 100" for /home, which keeps 15 minute snapshots for 25 hours and daily snapshots for 100 days). After that is a list of filesystems to remove snapshots from; the removal will be from under the backup subvolume of the path in question.

#!/bin/bash
set -e

# usage:
# btrfs-remove-snapshots [-d] MINSNAPS DAYSNAPS paths
# example:
# btrfs-remove-snapshots 100 100 /home /mail

# With -d, just echo the btrfs commands instead of running them.
if [ "$1" == "-d" ]; then
  BTRFS="echo btrfs"
  shift
else
  BTRFS=/sbin/btrfs
fi

MINSNAPS=$1
shift
DAYSNAPS=$1
shift

for DIR in "$@" ; do
  # Strip the leading / to match the paths printed by "btrfs subvol list".
  BASE=$(echo $DIR | cut -c 2-200)
  # Minute-based snapshot names contain a ":".  The list is oldest first,
  # so "head -n -N" selects everything except the newest N for deletion.
  for n in $(btrfs subvol list $DIR|grep $BASE/backup/.*:|head -n -$MINSNAPS|sed -e "s/^.* //"); do
    $BTRFS subvol delete /$n
  done
  # Daily snapshot names contain no ":"; keep the newest $DAYSNAPS of them.
  for n in $(btrfs subvol list $DIR|grep $BASE/backup/|grep -v :|head -n -$DAYSNAPS|sed -e "s/^.* //"); do
    $BTRFS subvol delete /$n
  done
done
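
A matching daily cron entry (again assuming the install path) would be:

15 1 * * * root /usr/local/sbin/btrfs-remove-snapshots 100 100 /home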

A Warning

The Debian/Wheezy kernel (based on the upstream kernel 3.2.32) doesn’t seem to cope well when you run out of space by making snapshots. I have a filesystem that I am still trying to recover after doing that.

I’ve just been buying larger storage devices for systems while migrating them to BTRFS, so I should be able to avoid running out of disk space again until I can upgrade to a kernel that fixes such bugs.
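
Until then a cron job that warns before a filesystem fills up seems prudent. This is only a rough sketch: the 90% threshold is arbitrary and plain df only approximates BTRFS space usage (btrfs filesystem df gives more detail):

#!/bin/bash
# Email a warning (via cron) for any BTRFS filesystem over 90% full.
df -P -t btrfs | awk 'NR > 1 { sub("%", "", $5); if ($5+0 >= 90) print $6 " is " $5 "% full" }'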

2 comments to Using BTRFS

  • Martin Leben

You decided to use BTRFS in the DomUs rather than in the Dom0. I’d like to know why.

    By the way, I read most of your blog posts with great interest. Thank you!

  • etbe

Martin: If I ran BTRFS in the Dom0 then each DomU would need to have its own filesystem using a file on the Dom0 filesystem as the block device. That means you have a filesystem within a filesystem, which can’t be good for performance.

The way I do it is a bit hacky and has some downsides; I may use the filesystem within a filesystem approach in future.

    I’m glad you like my posts.