Linux, politics, and other interesting things
The last time I tried using a Debian 64-bit Xen kernel for Dom0 I was unable to get it to work correctly; it continually gave kernel panics when doing any serious disk IO. I’ve just tried to reproduce that problem on a test machine with a single SATA disk and it seems to be working correctly, so I guess that the problem might be related to using software RAID and LVM (LVM is really needed for Xen and RAID is necessary for every serious server IMHO).
To solve this I am now experimenting with using a CentOS kernel on Debian systems.
There are some relevant differences between the kernels. The most significant one is the choice of which modules are linked into the kernel and which ones have to be loaded with modprobe. The Debian choice is to have the drivers blktap, blkbk, and netbk linked in, while the Red Hat / CentOS choice is to have them as modules. Therefore the Debian Xen utilities don’t try to load those modules, and when you use the CentOS kernel without them loaded Xen simply doesn’t work.
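As a rough illustration of checking for this, the following Python sketch reports which of the three drivers do not appear in the text of /proc/modules. The function name missing_modules is made up for this example. It only sees loadable modules, so on a kernel that has the drivers linked in (the Debian choice) it would list all three as missing even though nothing is wrong.

```python
# Backend drivers named above; they are loadable modules in the CentOS kernel.
REQUIRED = ("blktap", "blkbk", "netbk")


def missing_modules(proc_modules_text, required=REQUIRED):
    """Return the required module names that are absent from /proc/modules text.

    Each line of /proc/modules starts with the module name, so the first
    whitespace-separated field is all that is needed here.
    """
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in required if name not in loaded]
```

On a Dom0 you could call missing_modules(open("/proc/modules").read()) and run modprobe for each name it returns before starting any DomU.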
Error: Device 0 (vif) could not be connected. Hotplug scripts not working.
You will get the above error (after a significant delay) from the command “xm create -c name” if you try and start a DomU that has networking when the driver netbk is not loaded.
XENBUS: Timeout connecting to device: device/vbd/768 (state 3)
You will get the above error (or something similar with a different device number) from the kernel of the DomU for every block device if it is one of the Debian 2.6.18 kernels. With a 2.6.26 kernel you get “XENBUS: Waiting for devices to initialise” instead.
One other issue to note is that when you use a file: block device (i.e. a regular file), Xen will use a loopback device (internally it seems to only like block devices). If you are having this problem and you destroy the DomU (or have it abort after trying for 300 seconds) then it will leave the loopback device enabled (it seems that the code for freeing resources in the error path is buggy). I have filed Debian bug report #503044 requesting that the Xen packages change the kernel configuration to allow more loopback devices, and Debian bug report #503046 requesting that the resources be freed correctly.
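To find loopback devices that were left behind this way, something like the following Python sketch could be used on the output of “losetup -a”. It assumes the output has lines of the form “/dev/loop0: [0803]:12345 (/path/to/image)”, which may differ between util-linux versions. The function names and the images_in_use parameter (the set of image files belonging to DomUs that are actually running) are made up for this example.

```python
import re

# Assumed line format of `losetup -a`:
#   /dev/loop0: [0803]:12345 (/path/to/image)
LINE = re.compile(r"^(/dev/loop\d+):.*\((.*)\)\s*$")


def attached_loops(losetup_output):
    """Map each attached loop device to the file that backs it."""
    result = {}
    for line in losetup_output.splitlines():
        match = LINE.match(line.strip())
        if match:
            result[match.group(1)] = match.group(2)
    return result


def stale_loops(losetup_output, images_in_use):
    """Return loop devices whose backing file is not in images_in_use."""
    loops = attached_loops(losetup_output)
    return sorted(dev for dev, path in loops.items() if path not in images_in_use)
```

Each device that stale_loops returns can then be detached with “losetup -d”, after checking by hand that nothing else is using it.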
Finally the following messages appear in /var/log/daemon.log if you don’t have the driver blktap installed:
BLKTAPCTRL: couldn't find device number for 'blktap0'
BLKTAPCTRL: Unable to start blktapctrl
It doesn’t seem to cause a problem (in my tests I couldn’t find anything I want to do with Xen that requires blktap), but I have loaded the driver anyway; even just removing the error messages is enough of a benefit.
Another issue is that the CentOS kernel packages include a copy of the Xen kernel, so you have a Linux kernel matching the Xen kernel. So of course it is tempting to try to run that CentOS Xen kernel on a Debian system. Unfortunately the Xen utilities in Debian/Lenny don’t match the Xen kernel used for CentOS 5, and you get messages such as the following in /var/log/xen/xend-debug.log:
sysctl operation failed -- need to rebuild the user-space tool set?
Exception starting xend: (13, 'Permission denied')
Update: Added a reference to another Debian bug report.