How I Partition Disks

Having had a number of hard drives fail over the years, I use RAID whenever possible to reduce the probability of data loss caused by hardware failure. It’s unfortunate that some machines make it impractically difficult to install a second hard drive (my laptop and some small form factor desktop machines I have given to relatives). But when it’s possible, I have two hard drives running software RAID-1.

I use two partitions, one for /boot and one as an LVM physical volume (PV). When using RAID I make both /boot and the PV software RAID-1 devices (the vast majority of machines that I install don’t have budget available for hardware RAID). /boot is a small partition, approximately 100M. For a machine with only one disk I make the second partition take all the available space, as there is no benefit in doing otherwise (LVM allows me to create an arbitrary number of block devices out of the space at run-time).

When using software RAID I often make the PV take less than the full capacity of the disks. When the disks are 40G or more I usually use less than half the capacity. For most machines that I install or run, the full capacity of the disks is not required. One deficiency of Linux software RAID is that after a power failure the entire RAID device must be checked to ensure that all disks have matching data. Reading the entire contents of a pair of disks can take a long time if the disks are large, and as the size of disks is increasing at a greater rate than their speed, this problem is getting worse. See my ZCAV benchmark results page for graphs of the contiguous IO performance of some disks I’ve owned [1]. It seems that you may expect to read all of a 10G disk in less than 1,000 seconds, a 46G disk in something more than 1,500 seconds, and a 300G disk in something well in excess of 5,000 seconds.
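To put rough numbers on this, here is a minimal sketch that estimates a full-device read time from the capacity and an average sequential throughput. The function name resync_seconds and any throughput figure passed to it are assumptions made for illustration; they are not measurements taken from the benchmark page.

```shell
#!/bin/sh
# Rough estimate of how long reading a whole RAID-1 member takes.
# Usage: resync_seconds SIZE_GIB AVG_MIB_PER_SEC
# Both arguments are integers. The throughput is an assumed average
# over the region being read, not a measured value.
resync_seconds() {
    size_gib=$1
    mib_per_sec=$2
    # 1 GiB = 1024 MiB; shell integer division rounds down.
    echo $(( size_gib * 1024 / mib_per_sec ))
}
```

For example, resync_seconds 300 50 prints 6144, and resync_seconds 150 50 prints 3072. The estimate assumes constant throughput across the whole region, so it does not model the speed difference between outer and inner tracks.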

Almost all disks (and all IDE and SATA disks for which I’ve seen benchmark results) have the lower block numbers mapped to the outer tracks, which are longer and give a higher contiguous IO speed. So by using only the first half of the disk, the RAID synchronisation time is reduced to less than half of what it would otherwise be (in the absence of motherboard bottlenecks).

When there is no need for more than about 10G of storage space, there’s no benefit in making a large RAID device and wasting recovery time. Although the system can operate while the RAID is synchronising, performance will be significantly lower than normal.

If the usage pattern of the machine changes such that it needs more space, it’s easy to create new partitions, make a new software RAID device from them, and then add it to the LVM volume group (I’ve done this before). So the downside of starting with a small PV is minimal.
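The steps for growing the volume group can be sketched roughly as follows. This assumes the two new partitions have already been created with a partitioning tool. The device names, the volume group name, and the run and extend_vg helpers are all hypothetical. The script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 (and run as root) to actually execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: extend_vg MD_DEVICE PART_A PART_B VG_NAME
# All four names are placeholders chosen for this example.
extend_vg() {
    md=$1; part_a=$2; part_b=$3; vg=$4
    # Mirror the two new partitions as a new RAID-1 array.
    run mdadm --create "$md" --level=1 --raid-devices=2 "$part_a" "$part_b"
    # Label the new array as an LVM physical volume.
    run pvcreate "$md"
    # Add the new PV to the existing volume group.
    run vgextend "$vg" "$md"
}
```

Calling extend_vg /dev/md2 /dev/sda3 /dev/sdb3 vg0 in dry-run mode prints the three commands in order so they can be reviewed before anything touches the disks.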

When creating LVM logical volumes (LVs) I create volumes for the root filesystem and swap space when doing the install. This should result in the swap space being near enough to the start of the disk to get reasonable performance (but I haven’t verified this). I make the root filesystem reasonably large (e.g. 6G), as disk space is plentiful and cheap nowadays and the root filesystem is the only one that you can’t easily extend when running LVM (trying to do so deadlocks the disk IO). After the base install is done I create other LVs.
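As an illustration, the install-time volumes could be created with commands like the ones this sketch prints. The volume group name, the LV names root and swap, the swap size, and the initial_lvs helper are assumptions for the example; only the 6G root size comes from the text above. Whether creating these volumes first really places them near the start of the disk depends on the LVM allocation policy and, as noted, is unverified.

```shell
#!/bin/sh
# Print the lvcreate commands for the volumes made at install time.
# Usage: initial_lvs VG_NAME ROOT_SIZE SWAP_SIZE
# Nothing is executed; review the output before running it as root.
initial_lvs() {
    vg=$1; root_size=$2; swap_size=$3
    # Root and swap are created before any other LVs.
    echo "lvcreate -L $root_size -n root $vg"
    echo "lvcreate -L $swap_size -n swap $vg"
}
```

For example, initial_lvs vg0 6G 1G prints one lvcreate line for a 6G root volume followed by one for a 1G swap volume.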

4 comments to How I Partition Disks

  • n

    you wouldn’t have to wait that long after a power loss (or waste any disk space by reducing your usage) if you used write-intent bitmaps…

  • Next time you’re installing a machine, look at the bitmap options for mdadm. In particular, creating your RAID with something like “mdadm --create /dev/md0 --raid-devices=2 --level=1 --bitmap=internal /dev/disk0 /dev/disk1” will create an internal bitmap, which allows mdadm to only check recently changed bits of disk on a dirty reboot. If you’ve already got devices, “mdadm --grow /dev/md0 --bitmap=internal” will add a bitmap to an array.

    For details on what this does, look at the “BITMAP WRITE-INTENT LOGGING” section of md(4), but briefly, it causes md to log which bits of disk are being written to in an on-disk bitmap, so that when you have a restart, only those chunks need checking.

  • etbe

    Thanks for the advice. I’ve started a discussion on getting this support in Debian. The problem is that it’s extremely difficult to do this in Debian as the installer doesn’t support it.

    I’m considering adjusting my partitioning plans to have / (including /boot) on a separate RAID-1 device to the LVM stuff. Then I can create the RAID-1 for LVM after the main install and use -binternal.
