Xen shared storage

disk = [ 'phy:/dev/vg/xen1,hda,w', 'phy:/dev/vg/xen1-swap,hdb,w', 'phy:/dev/vg/xen1-drbd,hdc,w', 'phy:/dev/vg/san,hdd,w!' ]

For some work that I am doing I am trying to simulate a cluster that uses Fibre Channel SAN storage (among other things). The above is the disk line I’m using for one of my cluster nodes: hda and hdb are the root and swap disks for the node, hdc is a DRBD store (DRBD allows a RAID-1 to be run across the cluster nodes via TCP), and hdd is a SAN volume. The important thing to note is the “w!” mode for the SAN device, which means write access is granted even in situations where Xen thinks it’s unwise (i.e. the device is already in use by another domU or is mounted on the dom0). I’ve briefly tested this by making a filesystem on /dev/hdd on one node, copying data to it, unmounting it, and then mounting it on another node to read the data.
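Because xm reads its configuration files as Python, the disk line is just a list of strings of the form backend:path,domU-device,mode. The sketch below splits those strings and lists which devices are opened with “w!”. It assumes every entry has exactly three comma-separated fields, which holds for the lines in this post but is a simplification of everything Xen accepts.

```python
from typing import NamedTuple


class DiskSpec(NamedTuple):
    backend: str  # e.g. "phy" for a block device in the dom0
    path: str     # dom0 path of the device
    target: str   # device name seen by the domU, e.g. "hdd"
    mode: str     # "r", "w" or "w!"


def parse_disk(entry: str) -> DiskSpec:
    """Parse one disk entry such as 'phy:/dev/vg/san,hdd,w!'."""
    source, target, mode = (part.strip() for part in entry.split(","))
    backend, _, path = source.partition(":")
    return DiskSpec(backend, path, target, mode)


def shared_writable(entries) -> list:
    """Return the domU device names opened with 'w!' (forced shared write)."""
    return [d.target for d in map(parse_disk, entries) if d.mode == "w!"]


disk = ['phy:/dev/vg/xen1,hda,w', 'phy:/dev/vg/xen1-swap,hdb,w',
        'phy:/dev/vg/xen1-drbd,hdc,w', 'phy:/dev/vg/san,hdd,w!']
print(shared_writable(disk))  # ['hdd']
```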

Some filesystems, including CXFS, GFS, and probably others, support multiple nodes mounting the same device at the same time, and it would be possible to run one of them across the nodes of a Xen cluster. However that isn’t my aim at this time. I merely want one active node to mount the filesystem while the others are on standby.

One thing that needs to be solved for Xen clusters is fencing. When a node of a cluster is misbehaving it needs to be denied access to the hardware, in case it recovers some hours later and starts writing to a device that is now being used by another node. AFAIK the only way of doing this is via the xm destroy command, so probably the only way for the cluster to trigger it is to have a cluster node ssh to the dom0 and run a setuid program that calls xm destroy.
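A minimal sketch of the dom0 side of that fencing arrangement, assuming the cluster domUs are named xen1 to xen3 (hypothetical names) and that the ssh login used by the cluster nodes only ever runs this helper. The allowlist stops a confused or compromised node from destroying arbitrary domains. Note that Linux ignores the setuid bit on interpreted scripts, so in practice a helper like this would sit behind a small compiled setuid wrapper or a sudo rule.

```python
import subprocess

# Hypothetical names of the domUs that belong to the cluster.
CLUSTER_DOMUS = {"xen1", "xen2", "xen3"}


def fence_command(domain: str) -> list:
    """Build the argv that kills a misbehaving cluster domU.

    Anything that is not a known cluster member is refused, so a node
    cannot use this path to destroy unrelated domains (or the dom0).
    """
    if domain not in CLUSTER_DOMUS:
        raise ValueError(f"refusing to fence unknown domain {domain!r}")
    return ["xm", "destroy", domain]


def fence(domain: str, dry_run: bool = True) -> list:
    """Run (or, by default, just return) the xm destroy command."""
    argv = fence_command(domain)
    if not dry_run:
        # xm destroy terminates the domU immediately, without a clean shutdown.
        subprocess.run(argv, check=True)
    return argv


print(fence("xen2"))  # ['xm', 'destroy', 'xen2']
```

Defaulting to a dry run means the helper can be tested from a cluster node without actually killing anything.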


multiple ethernet devices in Xen

It seems that no-one has documented what needs to be done to correctly run multiple Ethernet devices in a Linux Xen configuration, with one always being eth0 and the other always being eth1 (or if it is documented then Google couldn’t find it for me).

vif = [ 'mac=00:16:3e:00:01:01', 'mac=00:16:3e:00:02:01, bridge=xenbr1' ]

Firstly I use a vif line such as the above in the Xen configuration. This gives one Ethernet device with the hardware address 00:16:3e:00:01:01 and another with the address 00:16:3e:00:02:01. The 00:16:3e prefix has officially been allocated to the Xen project for virtual machines, so on your Xen installation you can do whatever you like with MAC addresses in that range without risk of clashing with real hardware. If you don’t specify a MAC address, Xen generates a random one in that range.
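A sketch of one way to hand out addresses inside the Xen prefix. The numbering scheme (fifth octet = interface number plus one, sixth octet = domain number) is a hypothetical reading of the two addresses above, not a Xen convention. The random generator is an illustration, not Xen’s own code, and keeps its fourth octet at 0x80 or above so it can never collide with the hand-numbered 00:16:3e:00:xx:xx block.

```python
import random

# Prefix (OUI) allocated to the Xen project for virtual machines.
XEN_OUI = (0x00, 0x16, 0x3E)


def format_mac(octets) -> str:
    """Render six byte values as aa:bb:cc:dd:ee:ff."""
    return ":".join(f"{b:02x}" for b in octets)


def scheme_mac(domain_no: int, iface_no: int) -> str:
    """Hypothetical scheme: 00:16:3e:00:<iface+1>:<domain>."""
    if not (0 <= iface_no < 255 and 0 <= domain_no <= 255):
        raise ValueError("interface or domain number out of range")
    return format_mac(XEN_OUI + (0x00, iface_no + 1, domain_no))


def random_xen_mac(rng=random) -> str:
    """Random address in the Xen prefix; fourth octet 0x80-0xff so it never
    overlaps the hand-numbered 00:16:3e:00:xx:xx addresses."""
    return format_mac(XEN_OUI + (rng.randint(0x80, 0xFF),
                                 rng.randint(0x00, 0xFF),
                                 rng.randint(0x00, 0xFF)))


vif = [f"mac={scheme_mac(1, 0)}", f"mac={scheme_mac(1, 1)}, bridge=xenbr1"]
print(vif)  # ['mac=00:16:3e:00:01:01', 'mac=00:16:3e:00:02:01, bridge=xenbr1']
```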

I have two bridge devices, xenbr0 and xenbr1. I only need to specify the bridge for the second interface, as Xen attaches the first one to its default bridge.

Now when my domUs boot they assign Ethernet device names in the range eth0 to eth8. If there is only one virtual Ethernet device then it is always eth0 and things are easy, but with multiple devices I need to rename the interfaces.

eth0 mac 00:16:3e:00:01:01
eth1 mac 00:16:3e:00:02:01

This is done through the ifrename program (package name ifrename in Debian). I create a file named /etc/iftab with the above contents and then early in the boot process (before the interfaces are brought up) the devices will be renamed.
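Since /etc/iftab has to agree with the vif line, it can be generated from it. A minimal sketch, assuming every vif entry is a comma-separated list of key=value fields that includes mac=, as in the example above; it writes one “ethN mac ADDRESS” line per interface, in the order the entries appear.

```python
def macs_from_vif(vif) -> list:
    """Return the mac= value of each Xen vif entry, in order."""
    macs = []
    for entry in vif:
        for field in entry.split(","):
            key, _, value = field.strip().partition("=")
            if key == "mac":
                macs.append(value)
    return macs


def render_iftab(macs) -> str:
    """One 'ethN mac ADDRESS' line per interface, eth0 first."""
    return "".join(f"eth{i} mac {mac}\n" for i, mac in enumerate(macs))


vif = ["mac=00:16:3e:00:01:01", "mac=00:16:3e:00:02:01, bridge=xenbr1"]
print(render_iftab(macs_from_vif(vif)), end="")
# eth0 mac 00:16:3e:00:01:01
# eth1 mac 00:16:3e:00:02:01
```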

In the Red Hat model you instead edit files such as /etc/sysconfig/networking/devices/ifcfg-eth0 and set the line that starts with HWADDR to the device’s MAC address, which causes the device to be renamed on boot.
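The equivalent for the Red Hat layout, as a sketch: one ifcfg file per interface with its HWADDR pinned. The BOOTPROTO=dhcp default is an assumption of the sketch; a statically addressed domU would need IPADDR and NETMASK lines instead.

```python
def ifcfg_files(macs, bootproto: str = "dhcp") -> dict:
    """Map ifcfg-ethN file names to contents that pin each name to a MAC."""
    files = {}
    for i, mac in enumerate(macs):
        dev = f"eth{i}"
        files[f"ifcfg-{dev}"] = (
            f"DEVICE={dev}\n"
            f"HWADDR={mac}\n"
            f"BOOTPROTO={bootproto}\n"
            "ONBOOT=yes\n"
        )
    return files


for name, text in ifcfg_files(["00:16:3e:00:01:01", "00:16:3e:00:02:01"]).items():
    print(f"== {name}\n{text}", end="")
```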

Update: the original version of this post used MAC addresses with a prefix of 00:00:00; the officially allocated prefix for Xen is 00:16:3e, which I now use. Thanks to the person who commented about this.


installing Xen domU on Debian Etch

I have just been installing a Xen domU on Debian Etch. I’ll blog about installing dom0 later when I have a test system that I can re-install on (my production Xen machines have the dom0 set up already). The following documents a basic Xen domU (virtual machine) installation that has an IP address in the 10.0.0.0/8 private network address space and masquerades outbound network data. It is as general as possible.

lvcreate -n xen1 -L 2G /dev/vg

Firstly use the above command to create a block device for the domU. This could be a regular file, but an LVM block device gives better performance. The above command creates a 2GB LV named xen1 on an LVM Volume Group named vg.

mke2fs -j /dev/vg/xen1

Then create the filesystem with the above command.

mount /dev/vg/xen1 /mnt/tmp
mount -o loop /tmp/debian-testing-i386-netinst.iso /mnt/cd
cd /mnt/tmp
debootstrap etch . file:///mnt/cd/
chroot . bin/bash
vi /etc/apt/sources.list /etc/hosts /etc/hostname
apt-get update
apt-get install libc6-xen linux-image-xen-686 openssh-server
apt-get dist-upgrade

Then perform the basic Debian install with the above commands. Make sure that you change to the correct directory before running the debootstrap command. The /etc/hosts and /etc/hostname files need to be edited to have the correct contents for the Xen image (by default /etc/hosts is empty and /etc/hostname contains the name of the parent machine). The file /etc/apt/sources.list needs the appropriate configuration for the version of Debian you use and for your preferred mirror. libc6-xen is needed to stop a large number of kernel warning messages on boot. It takes a little work to get the virtual machine onto the network, so it’s best to run these commands (and any other package installs) before the following steps. When you are done, type exit to leave the chroot and run umount /mnt/tmp.
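If you prefer to script the identity files rather than edit them in vi, a sketch along these lines would write them into the mounted image. The address 10.1.0.2 and the optional domain example.com are hypothetical values, chosen to fit the 10.1.0.0/24 network used later in this post; root would be /mnt/tmp while the image is mounted.

```python
from pathlib import Path


def render_hosts(hostname: str, address: str, domain: str = "") -> str:
    """Loopback entry plus the domU's own address and name(s)."""
    names = f"{hostname}.{domain} {hostname}" if domain else hostname
    return f"127.0.0.1\tlocalhost\n{address}\t{names}\n"


def write_identity(root: str, hostname: str, address: str, domain: str = "") -> None:
    """Write etc/hostname and etc/hosts under root (e.g. /mnt/tmp)."""
    etc = Path(root) / "etc"
    (etc / "hostname").write_text(hostname + "\n")
    (etc / "hosts").write_text(render_hosts(hostname, address, domain))


print(render_hosts("xen1", "10.1.0.2"), end="")
```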

lvcreate -n xen1-swap -L 128M /dev/vg
mkswap /dev/vg/xen1-swap

Create a swap device with the above commands.

auto xenbr0
iface xenbr0 inet static
pre-up brctl addbr xenbr0
post-down brctl delbr xenbr0
post-up iptables -t nat -F
post-up iptables -t nat -A POSTROUTING -o eth0 -s 10.1.0.0/24 -j MASQUERADE
address 10.1.0.1
netmask 255.255.255.0
bridge_fd 0
bridge_hello 0
bridge_stp off

Add the above to /etc/network/interfaces and use the command ifup xenbr0 to enable it. Note that this masquerades all traffic leaving the machine via eth0 that has a source address in the 10.1.0.0/24 range, and that the first post-up line flushes any existing rules in the NAT table.
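The MASQUERADE rule only matches on two conditions, so its effect can be checked with Python’s ipaddress module. The sketch below mirrors the rule (outgoing interface eth0, source in 10.1.0.0/24) and also checks that the bridge address configured above lies inside that network; it models the rule, it does not query iptables.

```python
import ipaddress

# Conditions of the POSTROUTING rule in the interfaces stanza above.
MASQ_SOURCE = ipaddress.ip_network("10.1.0.0/24")
MASQ_OUT_IFACE = "eth0"


def is_masqueraded(src: str, out_iface: str) -> bool:
    """True if a packet with this source leaving via out_iface matches the rule."""
    return out_iface == MASQ_OUT_IFACE and ipaddress.ip_address(src) in MASQ_SOURCE


# The bridge's own address/netmask should describe the same network.
bridge = ipaddress.ip_interface("10.1.0.1/255.255.255.0")
print(bridge.network == MASQ_SOURCE)          # True
print(is_masqueraded("10.1.0.2", "eth0"))     # True: domU traffic to the outside
print(is_masqueraded("10.1.0.2", "xenbr0"))   # False: traffic staying on the bridge
print(is_masqueraded("192.168.0.5", "eth0"))  # False: not from the domU network
```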

net.ipv4.conf.default.forwarding=1

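Put the above in /etc/sysctl.conf and run sysctl -p. The default setting only applies to interfaces created later, so also run echo 1 > /proc/sys/net/ipv4/conf/all/forwarding to enable forwarding on the existing interfaces.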
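To check what the file actually sets, and what the kernel is currently doing, a small sketch is enough. The parser follows the simple key = value format with # or ; comments and does not attempt every detail of the real sysctl.conf syntax; runtime_forwarding reads the same /proc file that the echo command writes.

```python
from pathlib import Path


def parse_sysctl_conf(text: str) -> dict:
    """Parse 'key = value' lines, skipping blanks and # or ; comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def runtime_forwarding(path: str = "/proc/sys/net/ipv4/conf/all/forwarding") -> bool:
    """Read the live value that the echo command above sets."""
    return Path(path).read_text().strip() == "1"


conf = "# enable routing for the domUs\nnet.ipv4.conf.default.forwarding=1\n"
print(parse_sysctl_conf(conf))  # {'net.ipv4.conf.default.forwarding': '1'}
```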

cp /boot/initrd.img-2.6.18-5-xen-686 /boot/xen-initrd-18-5.gz

Set up an initial initrd (actually an initramfs) for the domU with a command such as the above. Once the Xen domU is working you can create the initrd from within it, which gives a smaller image.

kernel = "/boot/vmlinuz-2.6.18-5-xen-686"
ramdisk = "/boot/xen-initrd-18-5.gz"
memory = 64
name = "xen1"
vif = [ "" ]
disk = [ "phy:/dev/vg/xen1,hda,w", "phy:/dev/vg/xen1-swap,hdb,w" ]
root = "/dev/hda ro"
extra = "2 selinux=1 enforcing=0"

The above is a sample Xen config file that can go in /etc/xen/xen1. Note that the empty vif entry makes Xen pick an appropriate bridge device by default; if you only plan to have one bridge then this is quite safe, but with multiple bridges things get a little more complex. Also note that two block devices are created, /dev/hda and /dev/hdb. If we wanted a dozen block devices it would make more sense to use a virtual disk with a partition table and make them separate partitions, but in most cases a domU will be a simple install and won’t need more than two block devices.
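Since xm config files are Python, a config for another domU can be generated from a template with repr(), and the result can be checked by executing it as Python. A sketch, where the function name and the second domU xen2 (with LVs xen2 and xen2-swap) are hypothetical:

```python
def render_domu_config(name, kernel, ramdisk, memory_mb, disks,
                       root="/dev/hda ro", extra="2 selinux=1 enforcing=0"):
    """Return xm config text; repr() produces correctly quoted Python literals."""
    settings = [
        ("kernel", kernel),
        ("ramdisk", ramdisk),
        ("memory", memory_mb),
        ("name", name),
        ("vif", [""]),  # one interface on the default bridge
        ("disk", disks),
        ("root", root),
        ("extra", extra),
    ]
    return "".join(f"{key} = {value!r}\n" for key, value in settings)


cfg = render_domu_config(
    name="xen2",
    kernel="/boot/vmlinuz-2.6.18-5-xen-686",
    ramdisk="/boot/xen-initrd-18-5.gz",
    memory_mb=64,
    disks=["phy:/dev/vg/xen2,hda,w", "phy:/dev/vg/xen2-swap,hdb,w"],
)

# Executing the text as Python recovers the same values xm would see.
ns = {}
exec(cfg, ns)
print(ns["name"], ns["memory"], ns["disk"])
```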

xm create -c xen1

Now start the Xen domU with the above command. The -c option attaches to the domU’s console (use ^] to detach). Once it has booted you can log in as root at the Xen console with no password, so now is a good time to set the password.

Run the command apt-get install udev; this could not be done in the chroot earlier as it might mess up the dom0 environment. Then edit /etc/inittab and disable the gettys on tty2 to tty6. I don’t know if it’s possible to use them (the default and only option for the Xen console commands is tty1), and in any case you would not want six of them; saving a few getty processes will save some memory.
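Commenting out the extra gettys can also be scripted. A sketch, assuming Debian’s usual inittab lines of the form 2:23:respawn:/sbin/getty 38400 tty2; it comments out every active getty line whose tty is not in keep and leaves everything else alone, so running it twice changes nothing.

```python
import re

# Matches active (uncommented) inittab getty entries and captures the tty number.
GETTY_RE = re.compile(r"^[^#:]*:[^:]*:respawn:\S*getty\b.*\btty(\d+)\s*$")


def disable_extra_gettys(inittab: str, keep=(1,)) -> str:
    """Comment out getty lines for every tty not listed in keep."""
    out = []
    for line in inittab.splitlines(keepends=True):
        match = GETTY_RE.match(line.rstrip("\n"))
        if match and int(match.group(1)) not in keep:
            line = "#" + line
        out.append(line)
    return "".join(out)


sample = (
    "id:2:initdefault:\n"
    "1:2345:respawn:/sbin/getty 38400 tty1\n"
    "2:23:respawn:/sbin/getty 38400 tty2\n"
    "6:23:respawn:/sbin/getty 38400 tty6\n"
)
print(disable_extra_gettys(sample), end="")
```

After writing the result back to /etc/inittab, running telinit q makes init re-read it.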

Now you should have a basically functional Xen domU. Of course a pre-requisite for this is having a machine with a working dom0 installation. But the dom0 part is easier (and I will document it in a future blog post).

free software liaison?

In my previous work as a sys-admin I have worked for a number of companies that depend heavily on free software. If you use a commercially supported distribution such as Red Hat Enterprise Linux then you get high quality technical support (much better than you would expect from closed-source companies), but this still doesn’t provide as much as you might desire, as it is reactive support (once you notice a problem you report it). Red Hat has a Technical Account Manager (TAM) offering that provides a higher level of support, and there is also a Professional Services organization (GPS) that can provide customised versions of the software. But the TAM and GPS offerings are mostly aimed at larger customers (they are quite expensive).

It seems to me that a viable option for companies with smaller budgets is to have an employee dedicated to enhancing free software and getting changes accepted upstream. For a company that has a team of 5+ sys-admins the cost of a developer dedicated to such software development tasks should be saved many times over by the greater productivity of the sys-admins and the greater reliability of the servers.

This is not to criticise commercial offerings such as Red Hat’s TAM and GPS services; a dedicated free software developer could work with the Red Hat TAM and GPS people, allowing the company to get the most value for money from the Red Hat consultants.

If using a free distribution such as Debian the case for a dedicated liaison with the free software community is even stronger, as there is no formal support organization that compares to Red Hat’s (there are a variety of small companies that provide commercial support, but I am not aware of a 24/7 help desk or anything similar). If you have someone employed full-time as a free software developer then they can provide most of your support. It would probably make sense for a company that has mission critical servers running Debian to employ a Debian developer (DD). A large number of Debian developers already work as sys-admins, so finding one who is looking for a new job should not be difficult. There are more companies that would benefit from having DDs as employees than there are DDs, but in practice this isn’t an obstacle to hiring them, as most hiring managers don’t realise the technical issues involved.

This is not to say that a company which can’t hire a DD should use a different distribution, merely that their operations will not be as efficient as they might be.