Bugs and User Practice

Wouter pointed out a mistake in one of my blog posts that was based on old data [1]. My original post was accurate for older Linux distributions, but the bug in question has since been fixed.

Normally, when writing blog posts or email, I do a quick test before committing the text to avoid making mistakes (it’s easy to misremember things). However, in this case the bug would dead-lock machines, which made me hesitant to test it (I didn’t have a machine that I wanted to dead-lock).

There are two lessons to be learned from this. The more obvious one is to test things thoroughly before writing about them (and to have test machines available so that tests which cause service interruption or data loss can be performed).

The second lesson is that when implementing software you should try to avoid limitations that will affect user habits in a bad way. In the case of LVM, if the lvextend tool had displayed a message such as “Resizing the root LV would dead-lock the system, until locking is fixed such requests will be rejected – please boot from a rescue disk to perform this operation” then I would have performed a test before writing the blog post (as it would have been a harmless test to perform). Also, on the occasions when I really wanted to resize a root device without a reboot, I would have attempted the command in the hope that LVM had been fixed.
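To make the “fail soft” idea concrete, here is a minimal sketch of a tool that refuses a known-dangerous request instead of attempting it. This is not lvextend’s actual logic: the names `RefusedOperation`, `backs_root_filesystem`, `check_resize_allowed` and the `root_resize_safe` flag are hypothetical, and the deadlock condition is modelled simply as “the LV is mounted at /”, read from /proc/mounts-style text. The device comparison is deliberately naive; the same LV can appear as both /dev/vg0/root and /dev/mapper/vg0-root, so a real tool would compare device numbers instead of strings.

```python
# Hypothetical sketch of a "fail soft" guard; not taken from LVM.


class RefusedOperation(Exception):
    """Raised instead of performing an operation known to hang the system."""


def backs_root_filesystem(lv_device: str, mounts_text: str) -> bool:
    """Return True if lv_device is mounted at / in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        # Each line: device, mount point, fs type, options, dump, pass.
        fields = line.split()
        if len(fields) >= 2 and fields[0] == lv_device and fields[1] == "/":
            return True
    return False


def check_resize_allowed(lv_device: str, mounts_text: str,
                         root_resize_safe: bool = False) -> None:
    """Raise RefusedOperation rather than risk a dead-lock.

    root_resize_safe is a hypothetical switch standing in for "this version
    has fixed the locking bug"; once it is True the request goes through.
    """
    if backs_root_filesystem(lv_device, mounts_text) and not root_resize_safe:
        raise RefusedOperation(
            "Resizing the root LV would dead-lock the system; until locking "
            "is fixed such requests will be rejected. Please boot from a "
            "rescue disk to perform this operation."
        )


if __name__ == "__main__":
    sample_mounts = (
        "/dev/mapper/vg0-root / ext3 rw 0 0\n"
        "/dev/mapper/vg0-home /home ext3 rw 0 0\n"
    )
    # A non-root LV is allowed: no exception is raised.
    check_resize_allowed("/dev/mapper/vg0-home", sample_mounts)
    # The root LV is refused with a readable message instead of a hang.
    try:
        check_resize_allowed("/dev/mapper/vg0-root", sample_mounts)
    except RefusedOperation as err:
        print("refused:", err)
```

The point of the design is that the dangerous case costs the user one error message rather than a dead machine, so trying the command again after an upgrade stays a harmless experiment.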

A bug that deadlocks a system will have a really adverse effect on users, both on their habits in future use and on the probability of them using the software in future. A bug (or missing feature) that merely displays a warning message causes far fewer problems.

From now on I will still be hesitant to use lvextend on an LV containing a root filesystem on any machine other than the very latest, for fear that it will break. The fact that lvextend sometimes works on the root filesystem and sometimes takes the machine offline is a serious problem that impacts my use of that feature.

Most people won’t be writing software in which a bug or missing feature can deadlock a system, but there are countless ways that software can interrupt service or destroy data. Having software fail in a soft way, such that data is not lost, is a significant benefit for users and an incentive to use such software.

I’ve put this post in the WTF category, because a dead-lock bug in a very obvious use case of commonly used software really makes me say WTF.

3 comments to Bugs and User Practice

  • Whenever I write “instructional” posts, I always try to use first person, and refrain from telling or recommending what others should do. For example, instead of writing:

    “You can re-size your monitor screen with a sledgehammer…”

    I try to phrase it like:

    “I was able to re-size my monitor with a sledgehammer, but the results were not optimal”

    It’s difficult to make sure that everything technical works in every situation for everyone. Most of my blog posts are notes I take for my own use which I then share with others. In situations where there is a lot at stake, I try to give some warning about the gravity of the steps.

  • “However in this case the bug would dead-lock machines which made me hesitant to test it (I didn’t have a machine that I wanted to dead-lock).”

    Qemu (or KVM if you have the right hardware) is your friend.

    (Currently debugging a kernel crasher in a multi-node cluster of Qemu images running on my desktop).

  • etbe

    John: You are correct. But the way I set things up is that I don’t use LVM in DomUs – the Dom0 manages the storage. So I would have had to create a DomU and install LVM in it. That’s not impossible (just as it’s not impossible for me to grab one of the spare machines in my computer room, take a blank hard drive from the pile, and do a fresh install), but it’s more effort and I got a bit lazy.