
Label vs UUID vs Device

Someone asked on a mailing list about the trade-offs between using a label, a UUID, or a device name in /etc/fstab.

The first thing to consider is where the names come from. The UUID is assigned automatically by mkfs or mkswap, so you have to discover it after the filesystem or swap space has been made (or note it during the mkfs/mkswap process). For the ext2/3 filesystems the command “tune2fs -l DEVICE” will display the UUID and label (strangely mke2fs uses the term “label” while the output of tune2fs uses the term “volume name”). For a swap space I don’t know of any tool that can extract the UUID and name. On Debian (Etch and Unstable) the file command does not display the UUID for swap spaces or ext2/3 filesystems and does not display the label for ext2/3 filesystems. After I complete this blog post I will file a bug report.
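Both values sit at fixed offsets on disk, so they can also be read directly. The following is a minimal sketch, not a replacement for tune2fs. It assumes the standard ext2/3 superblock layout (superblock at byte 1024, magic 0xEF53 at offset 56, UUID at offset 104, volume name at offset 120) and the version 2 swap header layout (UUID at byte 1036, label at byte 1052, the “SWAPSPACE2” signature in the last 10 bytes of the first page). The 4096-byte page size and all function names are assumptions made for illustration.

```python
import uuid

EXT_SB_OFFSET = 1024      # the ext2/3 superblock starts 1024 bytes into the device
SWAP_PAGE_SIZE = 4096     # assumption: 4 KiB pages (true on i386/amd64)


def _label(raw):
    """Decode a NUL-padded 16-byte label field."""
    return raw.split(b"\0", 1)[0].decode("utf-8", "replace")


def ext_uuid_label(image):
    """Return (uuid, label) for an ext2/3 image, or None if the magic is absent."""
    sb = image[EXT_SB_OFFSET:EXT_SB_OFFSET + 1024]
    if len(sb) < 136 or sb[56:58] != b"\x53\xef":   # 0xEF53 stored little-endian
        return None
    return str(uuid.UUID(bytes=sb[104:120])), _label(sb[120:136])


def swap_uuid_label(image, page_size=SWAP_PAGE_SIZE):
    """Return (uuid, label) for a version 2 swap space, or None if not swap."""
    if image[page_size - 10:page_size] != b"SWAPSPACE2":
        return None
    return str(uuid.UUID(bytes=image[1036:1052])), _label(image[1052:1068])


def read_header(path, length=8192):
    """Read the first bytes of a device or image file (needs read permission)."""
    with open(path, "rb") as f:
        return f.read(length)
```

Something like ext_uuid_label(read_header("/dev/sda1")) would then give the pair for a filesystem, where /dev/sda1 is only a placeholder device name.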

If you are using a version of Debian later than Lenny (or a version of Unstable with this bug fixed) then you will be able to easily determine the label and UUID of a filesystem or swap space. Until then the inconvenience of determining the UUID and label will be a reason for not using them in /etc/fstab (keep in mind that sys-admin work sometimes needs to be done at 3AM).

One problem with mounting by UUID or label is that it doesn’t work well with snapshots and block device backups, because a block-level copy carries the same UUID and label as the original. If you have a live filesystem on /dev/sdc and an image from a backup on /dev/sdd then there is a lot of potential for excitement when mounting by UUID or label. Snapshots can be made by a volume manager (such as LVM), a SAN, or an iSCSI server.
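A quick check for this situation is to look for identifiers that appear on more than one device before trusting a mount by UUID or label. The following is only a sketch: the function name is made up, the UUIDs are invented values, and how the device-to-UUID mapping gets collected is left open.

```python
from collections import defaultdict


def find_duplicates(ids_by_device):
    """Map each UUID (or label) seen on more than one device to those devices."""
    devices_by_id = defaultdict(list)
    for device, ident in sorted(ids_by_device.items()):
        if ident:                       # skip devices with no UUID/label
            devices_by_id[ident].append(device)
    return {i: d for i, d in devices_by_id.items() if len(d) > 1}


# Hypothetical inventory: /dev/sdd is a block-level copy of /dev/sdc.
seen = {
    "/dev/sdc": "0a1b2c3d-0000-4000-8000-000000000001",
    "/dev/sdd": "0a1b2c3d-0000-4000-8000-000000000001",
    "/dev/sde": "0a1b2c3d-0000-4000-8000-000000000002",
}
print(find_duplicates(seen))
```

Any identifier in the result is one that should not be used for mounting until one of the copies has been given a new UUID or label.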

Another problem is that if a file-based backup is made (e.g. with tar or cpio) then you lose the UUID and label. tune2fs allows setting the UUID, but that seems like a potential recipe for disaster. So if you mount by UUID you may need to change /etc/fstab after doing a full filesystem restore from a file-based backup. This is not impossible but might not be what you desire. Setting the label is not difficult, but it may be inconvenient.

When using old-style IDE disks the device names were of the form /dev/hda for the first disk on the first controller (cable) and /dev/hdd for the second disk on the second controller. This was quite unambiguous: adding an extra disk was never going to change the naming.

With SCSI disks the naming issue has always been more complex: which device gets the name /dev/sda is determined by the order in which the SCSI host adapters (HAs) are discovered. So if a SCSI HA which had no disks attached suddenly had a disk installed then the naming of all the other disks would change on the next boot! To make things more exciting, Fedora 9 is using the same naming scheme for IDE devices as for SCSI devices. I expect that other distributions will follow soon, and then permanent names will not be available even with IDE disks.

In this situation the use of UUIDs or labels is required for the use of partitions. However, a common trend is towards using LVM for all storage. In this case LVM manages labels and UUIDs internally (with some excitement if you do a block device backup of an LVM PV), so LV names such as /dev/vg0/root become persistent and there is no need for mounting via UUID or label.

The most difficult problem then becomes the situation where a FC SAN has the ability to create snapshots and make them visible to the same machine. UUID or label based mounting won’t work unless you can change them when creating the snapshot (which is not impossible but is rather difficult when you use a Windows GUI to create snapshots on a FC SAN for use by Linux systems). I have had some interesting challenges with this in the past when using a FC based SAN with Linux blade servers, and I never devised a good solution.

When using iSCSI I expect that it would be possible to force an association between SCSI disk naming and names on the server, but I’ve never had time to test it out.

Update: I have submitted Debian bug #489865 with a suggested change to the magic database.

Below are /etc/magic entries for displaying the UUID and label on swap spaces and ext2/3 filesystems:

Continue reading


New ZCAV Development

I have just been running some ZCAV tests on some new supposedly 1TB disks (10^12 bytes is about 931*2^30 so is about 931G according to almost everyone in the computer industry who doesn’t work for a hard disk vendor).
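The arithmetic behind the 931G figure is easy to check. This is just a worked example, with constant names chosen for illustration:

```python
TB_DECIMAL = 10**12          # 1TB as a disk vendor counts it
GIB = 2**30                  # 1G as most software counts it

size_in_gib = TB_DECIMAL / GIB
print(f"{size_in_gib:.2f}")  # 931.32
```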

I’ve added a new graph to my ZCAV results page [1] with the results.

One interesting thing that I discovered is that the faster disks can deliver contiguous data at a speed of more than 110MB/s; previously the best I’d seen from a single disk was about 90MB/s. When I first wrote ZCAV the best disks I had to test with all had a maximum speed of about 10MB/s so KB/s was a reasonable unit. Now I plan to change the units to MB/s to make it easier to read the graphs. Of course it’s not that difficult to munge the data before graphing it, but I think that it will give a better result for most users if I just change the units.

The next interesting thing I discovered is that GNUplot defaults to using exponential notation at the value of 1,000,000 (or 1e+06). I’m sure that I could override that but it would still make the graphs difficult for users to read. So I guess it’s time to change the units to GB.

I idly considered using the hard drive manufacturer’s definition of GB so that a 1TB disk would actually display as having 1000GB (the Wikipedia page for Gibibyte has the different definitions [2]). But of course having decimal and binary prefixes used in the X and Y axis of a graph would be a horror. Also the block and chunk sizes used have to be multiples of a reasonably large power of two (at least 2^14) to get reasonable performance from the OS.

The next implication of this is that it’s a bad idea to have a default block size that is not a power of two. The previous block sizes were 100M and 200M (for 1.0x and 1.9x branches respectively). Expressing these as 0.0976G and 0.1953G respectively would not be user-friendly. So I’m currently planning on 0.25G as the block size for both branches.
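The conversion of the old block sizes, and the reason 0.25G is a more comfortable choice, can be checked with a few lines. This is a sketch with made-up helper names, where 256 (binary megabytes) is simply 0.25G expressed in the old unit:

```python
def mib_to_gib(mib):
    """Convert a size in binary megabytes (2^20) to binary gigabytes (2^30)."""
    return mib / 1024


def is_power_of_two(n):
    """True if the positive integer n is an exact power of two."""
    return n > 0 and n & (n - 1) == 0


for block_mib in (100, 200, 256):
    print(block_mib, mib_to_gib(block_mib), is_power_of_two(block_mib))
```

Only the 256M block converts to a short decimal (0.25) and is a power of two.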

While changing the format it makes sense to change as many things as possible at once to reduce the number of incompatible file formats that are out there. The next thing I’m considering is the precision. In the past the speed in K/s was an integer. Obviously an integer for the speed in M/s is not going to work well for some of the slower devices that are still in use (EG a 4* CD-ROM drive maxes out at 600KB/s). Of course the accuracy of this is determined by the accuracy of the system clock. The gettimeofday() system call returns the time in micro-seconds, but I expect that most systems don’t approach micro-second accuracy. It’s not worth reporting with a precision that is greater than the accuracy, so there’s no point in making the precision of the speed any greater than the precision of the time.
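One way to reason about how many decimal places are justified is first-order error propagation: the relative error of the speed is about the same as the relative error of the elapsed time. The sketch below is not what ZCAV does. The function names are hypothetical and the clock error has to be supplied as an assumption, since the real figure depends on the system.

```python
import math


def speed_mib_per_s(nbytes, elapsed_s):
    """Transfer speed in binary megabytes per second."""
    return nbytes / elapsed_s / 2**20


def justified_decimals(nbytes, elapsed_s, clock_error_s):
    """Decimal places of the MiB/s figure that the timer accuracy can support."""
    if clock_error_s <= 0:
        raise ValueError("clock_error_s must be positive")
    speed = speed_mib_per_s(nbytes, elapsed_s)
    # First-order error propagation: d(speed)/speed = d(time)/time
    speed_error = speed * clock_error_s / elapsed_s
    if speed_error >= 1:
        return 0
    return max(0, math.floor(-math.log10(speed_error)))


# A 0.25G block read in 2.5 seconds, with an assumed 1ms clock error.
block = 256 * 2**20
print(speed_mib_per_s(block, 2.5), justified_decimals(block, 2.5, 1e-3))
```

With these assumed numbers one decimal place is all that the timing supports. A clock that really was accurate to a micro-second would support four.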

Things were easier with the Bonnie++ program when I just reduced the precision as needed to fit in an 80 column display. ;)

Finally I ran my tests on my new Dell T105 system. While I didn’t get time to do as many tests as I desired before putting the machine in production, I did get to do a quick test of two disks running at full speed. Previously when testing desktop systems I had not found a system which, when run with two disks of the same age as the machine, could extract full performance from both disks simultaneously. While the Dell T105 is a server-class system, it is a rather low-end server and I had anticipated that it would lack performance in this regard. I was pleased to note that I could run both 1TB disks at full speed at the same time. I didn’t get a chance to test three or four disks though (maybe for scheduled down-time in the future).