BTRFS Training

0 Comment

Some years ago Barwon South Water gave LUV 3 old 1RU Sun servers for any use related to free software. We gave one of those servers to the Canberra makerlab and another is used as the server for the LUV mailing lists and web site and the 3rd server was put aside for training. The servers have hot-swap 15,000rpm SAS disks – IE disks that have a replacement cost greater than the budget we have for hardware. As we were given a spare 70G disk (and a 140G disk can replace a 70G disk) the LUV server has 2*70G disks and the 140G disks (which can’t be replaced) are in the server for training.

On Saturday I ran a BTRFS and ZFS training session for the LUV Beginners’ SIG. This was inspired by the amount of discussion of those filesystems on the mailing list and the amount of interest when we have lectures on those topics.

The training went well, the meeting was better attended than most Beginners’ SIG meetings and the people who attended it seemed to enjoy it. One thing that I will do better in future is clearly documenting commands that are expected to fail and documenting how to login to the system. The users all logged in to accounts on a Xen server and then ssh’d to root at their DomU. I think that it would have saved a bit of time if I had aliased commands like “btrfs” to “echo you must login to your virtual server first” or made the shell prompt at the Dom0 include instructions to login to the DomU.

Each user or group had a virtual machine. The server has 32G of RAM and I ran 14 virtual servers that each had 2G of RAM. In retrospect I should have configured fewer servers and asked people to work in groups, that would allow more RAM for each virtual server and also more RAM for the Dom0. The Dom0 was running a BTRFS RAID-1 filesystem and each virtual machine had a snapshot of the block devices from my master image for the training. Performance was quite good initially as the OS image was shared and fit into cache. But when many users were corrupting and scrubbing filesystems performance became very poor. The disks performed well (sustaining over 100 writes per second) but that’s not much when shared between 14 active users.

The ZFS part of the tutorial was based on RAID-Z (I didn’t use RAID-5/6 in BTRFS because it’s not ready to use and didn’t use RAID-1 in ZFS because most people want RAID-Z). Each user had 5*4G virtual disks (2 for the OS and 3 for BTRFS and ZFS testing). By the end of the training session there was about 76G of storage used in the filesystem (including the space used by the OS for the Dom0), so each user had something like 5G of unique data.

We are now considering what other training we can run on that server. I’m thinking of running training on DNS and email. Suggestions for other topics would be appreciated. For training that’s not disk intensive we could run many more than 14 virtual machines, 60 or more should be possible.

Below are the notes from the BTRFS part of the training, anyone could do this on their own if they substitute 2 empty partitions for /dev/xvdd and /dev/xvde. On a Debian/Jessie system all that you need to do to get ready for this is to install the btrfs-tools package. Note that this does have some risk if you make a typo. An advantage of doing this sort of thing in a virtual machine is that there’s no possibility of breaking things that matter.

  1. Making the filesystem
    1. Make the filesystem, this makes a filesystem that spans 2 devices (note you must use the-f option if there was already a filesystem on those devices):
      mkfs.btrfs /dev/xvdd /dev/xvde
    2. Use file(1) to see basic data from the superblocks:
      file -s /dev/xvdd /dev/xvde
    3. Mount the filesystem (can mount either block device, the kernel knows they belong together):
      mount /dev/xvdd /mnt/tmp
    4. See a BTRFS df of the filesystem, shows what type of RAID is used:
      btrfs filesystem df /mnt/tmp
    5. See more information about FS device use:
      btrfs filesystem show /mnt/tmp
    6. Balance the filesystem to change it to RAID-1 and verify the change, note that some parts of the filesystem were single and RAID-0 before this change):
      btrfs balance start -dconvert=raid1 -mconvert=raid1 -sconvert=raid1 –force /mnt/tmp
      btrfs filesystem df /mnt/tmp
    7. See if there are any errors, shouldn’t be any (yet):
      btrfs device stats /mnt/tmp
    8. Copy some files to the filesystem:
      cp -r /usr /mnt/tmp
    9. Check the filesystem for basic consistency (only checks checksums):
      btrfs scrub start -B -d /mnt/tmp
  2. Online corruption
    1. Corrupt the filesystem:
      dd if=/dev/zero of=/dev/xvdd bs=1024k count=2000 seek=50
    2. Scrub again, should give a warning about errors:
      btrfs scrub start -B /mnt/tmp
    3. Check error count:
      btrfs device stats /mnt/tmp
    4. Corrupt it again:
      dd if=/dev/zero of=/dev/xvdd bs=1024k count=2000 seek=50
    5. Unmount it:
      umount /mnt/tmp
    6. In another terminal follow the kernel log:
      tail -f /var/log/kern.log
    7. Mount it again and observe it correcting errors on mount:
      mount /dev/xvdd /mnt/tmp
    8. Run a diff, observe kernel error messages and observe that diff reports no file differences:
      diff -ru /usr /mnt/tmp/usr/
    9. Run another scrub, this will probably correct some errors which weren’t discovered by diff:
      btrfs scrub start -B -d /mnt/tmp
  3. Offline corruption
    1. Umount the filesystem, corrupt the start, then try mounting it again which will fail because the superblocks were wiped:
      umount /mnt/tmp
      dd if=/dev/zero of=/dev/xvdd bs=1024k count=200
      mount /dev/xvdd /mnt/tmp
      mount /dev/xvde /mnt/tmp
    2. Note that the filesystem was not mountable due to a lack of a superblock. It might be possible to recover from this but that’s more advanced so we will restore the RAID.
      Mount the filesystem in a degraded RAID mode, this allows full operation.
      mount /dev/xvde /mnt/tmp -o degraded
    3. Add /dev/xvdd back to the RAID:
      btrfs device add /dev/xvdd /mnt/tmp
    4. Show the filesystem devices, observe that xvdd is listed twice, the missing device and the one that was just added:
      btrfs filesystem show /mnt/tmp
    5. Remove the missing device and observe the change:
      btrfs device delete missing /mnt/tmp
      btrfs filesystem show /mnt/tmp
    6. Balance the filesystem, not sure this is necessary but it’s good practice to do it when in doubt:
      btrfs balance start /mnt/tmp
    7. Umount and mount it, note that the degraded option is not needed:
      umount /mnt/tmp
      mount /dev/xvdd /mnt/tmp
  4. Experiment
    1. Experiment with the “btrfs subvolume create” and “btrfs subvolume delete” commands (which act like mkdir and rmdir).
    2. Experiment with “btrfs subvolume snapshot SOURCE DEST” and “btrfs subvolume snapshot -r SOURCE DEST” for creating regular and read-only snapshots of other subvolumes (including the root).