As storage capacities increase, the probability of data corruption increases, as does the amount of time required for a fsck on a traditional filesystem. Disk capacity is also increasing a lot faster than contiguous IO speed, which means that RAID rebuild times are increasing. For example my first hard disk was 70M and had a transfer rate of 500K/s, which meant that the entire contents could be read in a mere 140 seconds! The last time I tested a more recent disk, a 1TB SATA disk gave contiguous transfer rates ranging from 112MB/s to 52MB/s, which meant that reading the entire contents took 3 hours and 10 minutes, and the problem is worse with newer, bigger disks. The long rebuild times make greater redundancy more desirable.
Both BTRFS and ZFS checksum all data to cover the case where a disk returns corrupt data, they don’t need a fsck program, and the combination of checksums and built-in RAID means that they should have less risk of data loss due to a second failure during rebuild. ZFS supports RAID-Z, which is essentially RAID-5 with checksums on all blocks to handle the case of corrupt data, as well as RAID-Z2, which is the equivalent of RAID-6. RAID-Z is quite important if you don’t want to have half your disk space taken up by redundancy, and RAID-Z2 if you want your data to survive the loss of more than one disk, so until BTRFS has an equivalent feature ZFS offers significant benefits. Also BTRFS is still rather new, which is a concern for software that is critical to data integrity.
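For example creating a RAID-Z pool out of four disks is something like the following, with raidz2 substituted for raidz if you want double parity (the pool name “tank” and the device names are just examples):

zpool create tank raidz /dev/sda /dev/sdb /dev/sdc /dev/sdd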
I am about to install a system to be a file server and Xen server, which probably isn’t going to be upgraded much over the next few years. It will have 4 disks, so ZFS with RAID-Z offers a significant benefit over BTRFS for capacity and RAID-Z2 offers a significant benefit for redundancy. As it won’t be upgraded much I’ll start with Debian/Wheezy even though it isn’t released yet, because the system will be in use without much change well after Squeeze security updates end.
Getting ZFS to basically work isn’t particularly hard, the ZFSonLinux.org site has the code and reasonable instructions for doing it. The zfsonlinux code doesn’t compile out of the box on Wheezy although it works well on Squeeze. I found it easier to get the latest Ubuntu working with ZFS and then rebuild the Ubuntu packages for Debian/Wheezy, which worked. This wasn’t particularly difficult but it’s a pity that the zfsonlinux site didn’t support recent kernels.
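The rebuild is just the standard Debian source package routine, roughly the following for each of the spl and zfs source packages (the source package name and version here are placeholders, and build-dependencies such as dkms and debhelper need to be installed first):

dpkg-source -x zfs-linux_VERSION.dsc
cd zfs-linux-VERSION
dpkg-buildpackage -us -uc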
The complication with root on ZFS is that the ZFS FAQ recommends using whole disks for best performance, so you can avoid alignment problems on 4K sector disks (which is an issue for any disk large enough that you would want to use it with ZFS). This means you have to either put /boot on ZFS (which seems a little too experimental for me) or have a separate boot device.
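If you do use partitions rather than whole disks you can force 4K allocation at pool creation time with the ashift option (2^12=4096 byte blocks), something like the following (pool and device names are examples):

zpool create -o ashift=12 tank raidz /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2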
Currently I have one server running with 4*3TB disks in a RAID-Z array and a single smaller disk for the root filesystem. Having a fifth disk attached by duct-tape to a system that is only designed for four disks isn’t ideal, but when you have an OS image that is backed up (and not so important) and a data store that’s business critical (but not needed every day) then a failure on the root device can be fixed the next day without serious problems. But I want to fix this and avoid creating more systems like it.
There is some good documentation on using Ubuntu with root on ZFS. I considered using Ubuntu LTS for the server in question, but as I prefer Debian and I can recompile Ubuntu packages for Debian it seems that Debian is the best choice for me. I compiled those packages for Wheezy, did the install and DKMS build, and got ZFS basically working without much effort.
The problem then became getting ZFS to work for the root filesystem. The Ubuntu packages didn’t work with the Debian initramfs for some reason, and the ZFS modules failed to load. This wasn’t necessarily a show-stopper as I can modify such things myself, but it’s another painful thing to manage and another way that the system can potentially break on upgrade.
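A quick way of checking whether the ZFS modules even made it into the initramfs and whether they load on the running system is something like the following:

lsinitramfs /boot/initrd.img-$(uname -r) | grep -i zfs
modprobe zfs && zpool status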
The next issue is the unusual way that ZFS mounts filesystems. Instead of having block devices to mount and entries in /etc/fstab, ZFS manages the mounting itself. So if you want a ZFS volume to be mounted as root you configure the mountpoint via the “zfs set mountpoint” command. This of course means that it doesn’t get mounted if you boot with a different root filesystem, and it adds some needless pain to the process. When I encountered this I decided that root on ZFS isn’t a good option. So for this new server I’ll install it with an Ext4 filesystem on a RAID-1 device for root and /boot and use ZFS for everything else.
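For example making a ZFS filesystem the root filesystem means running something like the following instead of editing /etc/fstab (the pool and filesystem names are just examples):

zfs set mountpoint=/ tank/root
zfs get mountpoint tank/root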
After setting up the system with a 4 disk RAID-1 (or mirror for the pedants who insist that true RAID-1 has only two disks) for root and boot, I then created partitions for ZFS. According to fdisk output the partitions /dev/sda2, /dev/sdb2 etc had their first sector address as a multiple of 2048, IE 1MiB alignment with 512 byte sectors, which I presume addresses the alignment requirement for a disk that has 4K sectors.
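Creating such an array with mdadm and checking the partition alignment with fdisk looks something like the following (device names are examples, and the Debian installer can also set up the RAID-1 for you):

mdadm --create /dev/md0 --level=1 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
fdisk -l -u /dev/sda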
deb http://www.coker.com.au wheezy zfs
I created the above APT repository (AMD64 only) for ZFS packages based on Darik Horn’s Ubuntu packages (thanks for the good work Darik). Installing zfs-dkms, spl-dkms, and zfsutils gave a working ZFS system. I could probably have used Darik’s binary packages but I think it’s best to rebuild Ubuntu packages for use on Debian.
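To use the repository, add the above line to the APT configuration and install the packages, something like the following (the sources.list.d file name is just an example):

echo "deb http://www.coker.com.au wheezy zfs" > /etc/apt/sources.list.d/zfs.list
apt-get update
apt-get install zfs-dkms spl-dkms zfsutils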
The server in question hasn’t gone live in production yet (it turns out that we don’t have agreement on what the server will do). But so far it seems to be working OK.