
Xen and EeePC

I’ve been considering the possibility of using Xen on an ASUS EeePC as a mobile test platform for an Internet service. While the real service uses some heavy hardware it seems that a small laptop could simulate it when running with a small data set (only a few dozen accounts) and everything tuned for small amounts of RAM (small buffers for database servers etc).

According to the Wikipedia page about the EeePC [1], the 70x and 900 versions of the EeePC use a Celeron-M CPU, which is based on the Pentium-M and therefore lacks PAE support and can’t run Xen.

The Fedora Tutorial about the EeePC has a copy of the /proc/cpuinfo data from an EeePC [2], which shows that the model in question (the exact model is not specified) lacks PAE. Are there any 70x or 90x variants that have PAE? Intel sometimes dramatically varies the features within a range of CPUs…

The 901 version and the 1000 series use an Intel “Atom” CPU. According to discussion on the Gentoo Forums some Atom CPUs have the “lm” (64bit) flag but no “vmx” flag for virtualisation [3], which means that they can run Xen paravirtualised but not KVM or hardware virtualisation under Xen. The Atom also has PAE. This is more than adequate.

According to the Wikipedia page the Atom comes in both 32bit and 64bit variants [4]. Hopefully the 901 version and the 1000 series EeePC will have the 64bit version.

The 90x versions have support for up to 4G of RAM but the 1000 series is only listed as supporting 2G; hopefully it will actually take 4G or more (although I wouldn’t be surprised if Intel had a chipset supporting only 4G of address space, with PCI reservations limiting the machine to 3G). But even 3G will be enough for a mobile test/development platform, which should make it easier to debug some problems remotely.

The 901 is available in Australia for just under $700. It’s a little more expensive than previous EeePC variants ($500 is a magic number below which things can be purchased with significantly less consideration), but it still might be something that one of my clients will pay for.

The prime aim is to have a mobile sys-admin platform that can be carried anywhere; running a Xen simulation of the target network is an added bonus.

Any suggestions for other laptops that should be considered will be welcome. It needs to be light (1.14Kg for a 901 EeePC is more than I desire), small (a reduced display size is not a problem), and not overly expensive ($700 is more than desired).

Update: JB HiFi is selling the 1000H model [5]. The 1000H has an 80G hard disk and weighs 1.45Kg. The extra 210g and slightly larger size are a down-side, as is the extra ~$50 in price.

A comment was made that OpenVZ could be used. If that avoids the need for PAE then a 702 series would do the job (with some USB flash devices as extras). The 702 is a mere 920g.

Update: This ZDNET review shows that the 901 can only handle 2G of RAM and has an Atom CPU that is only 32bit [6].


ISP Redundancy and Virtualisation

If you want a reliable network then you need to determine an appropriate level of redundancy. When servers were small and there was no well accepted virtual machine technology there were always many points at which redundancy could be employed.

A common example is a large mail server. You might have MX servers to receive mail from the Internet, front-end servers to send mail to the Internet, database or LDAP servers (of which there is one server for accepting writes and redundant slave servers for allowing clients to read data), and some back-end storage. The back-end storage is generally going to lack redundancy to some degree (all the common options involve mail being stored in one location). So the redundancy would start with the routers which direct traffic to redundant servers (typically a pair of routers in a failover configuration – I would use OpenBSD boxes running CARP if I was given a choice in how to implement this [1]; in the past I’ve used Cisco devices).

The next obvious place for redundancy is for the MX servers (it seems that most ISPs have machines with names such as mx01.example.net to receive mail from the Internet). The way that MX records are used in the DNS means that there is no need for a router to direct traffic to a pair of servers, and even a pair of redundant routers is another point of failure so it’s best to avoid them where possible. A smaller ISP might have two MX machines that are used for both sending outbound mail from their users (which needs to go through a load-balancing router) as well as inbound mail. A larger ISP will have two or more machines dedicated to receiving mail and two or more machines dedicated to sending mail (when you scan for viruses on both sent and received mail it can take a lot of compute power).
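
As an illustration (the host names follow the example above, and the zone data is hypothetical), inbound mail redundancy needs nothing more than MX records of equal preference; sending servers pick one at random and fall back to the other if it is unreachable:

example.net.  IN  MX  10  mx01.example.net.
example.net.  IN  MX  10  mx02.example.net.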

The database or LDAP servers used for storing user account data are another possible place for redundancy. While some database and LDAP servers support multi-master operation, a more common configuration is to have a single master and multiple slaves which are read-only. This means that you want to have more slaves than are really required so that you can lose one without impacting the service.

There are several ways of losing a server. The most obvious is a hardware failure. While server class machines will have redundant PSUs, RAID, ECC RAM, and a generally high quality of hardware design and manufacture, they still have hardware problems from time to time. Then there are a variety of software related ways of losing a server, most of which stem from operator error and bugs in software. Of course the problem with operator errors and software bugs is that they can easily take out all redundant machines. If an operator mistakenly decides that a certain command needs to be run on all machines they will often run it on all machines before realising that it causes things to go horribly wrong. A software bug will usually be triggered by the same thing on all machines (e.g. I’ve had bad data written to a master LDAP server cause all slaves to crash, and a mail loop between two big ISPs take out all front-end mail servers).

Now if you have a mail server running on a virtual platform such that the MX servers, the mail store, and the database servers all run on the same hardware then redundancy is very unlikely to alleviate hardware problems. It’s difficult to imagine a situation where a hardware failure takes out one DomU while leaving others running.

It seems to me that if you are running on a single virtual server there is no benefit in having redundancy. However there is benefit in having an infrastructure which supports redundancy. For example if you are going to install new software on one of the servers there is a possibility that the software will fail. Doing upgrades and then having to roll them back is one of the least pleasant parts of sys-admin work; not only is it difficult but it’s also unreliable (new software writes different data to shared files and you have to hope that the old version can cope with them).

To implement this you need to have a Dom0 that can direct traffic to multiple redundant servers for services which only have a single server. Then when you need to upgrade (be it the application or the OS) you can configure a server on the designated secondary address, get it running, and then disable traffic to the primary server. If there are any problems you can direct traffic back to the primary server (which can be done much more quickly than downgrading software). Also if configured correctly you could have the secondary server be accessible from certain IP addresses only. So you could test the new version of the software using employees as test users while customers use the old version.
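
As a rough sketch of how such a cut-over might be implemented (this assumes the Dom0 routes traffic to the DomU’s with iptables, and all addresses here are hypothetical examples):

# normal operation: send port 25 traffic for the service address to the primary DomU
iptables -t nat -A PREROUTING -d 192.0.2.10 -p tcp --dport 25 \
  -j DNAT --to-destination 10.1.1.1
# cut-over: replace that rule so traffic goes to the secondary DomU instead
# (and swap it back just as quickly if the new software fails)
iptables -t nat -R PREROUTING 1 -d 192.0.2.10 -p tcp --dport 25 \
  -j DNAT --to-destination 10.1.1.2
# testing: insert a rule so only the employee network reaches the secondary
# while customers still use the primary
iptables -t nat -I PREROUTING -s 203.0.113.0/24 -d 192.0.2.10 -p tcp --dport 25 \
  -j DNAT --to-destination 10.1.1.2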

One advantage of a virtual machine environment for load balancing is that you can have as many virtual Ethernet devices as you desire and you can configure them using software (without changing cables in the server room). A limitation on the use of load-balancing routers is that traffic needs to go through the router in both directions. This is easy for the path from the Internet to the server room and the path from the server room to the customer network. But when going between servers in the server room it’s a problem (which is not insurmountable, merely painful and expensive). Of course there will be a cost in CPU time for all the extra routing. If instead of having a single virtual Ethernet device for all redundant nodes you have a virtual Ethernet device for every type of server and use the Dom0 as a router, you will end up doubling the CPU requirements for networking without even considering the potential overhead of the load balancing router functionality.

Finally there is a significant benefit in virtual machines for reliability of services. That is the ability to perform snapshot backups. If you have sufficient disk space and IO capacity you could have a snapshot of your server taken every day and store several old snapshots. Of course doing this effectively would require some minor changes to the configuration of machines to avoid unnecessary writes, this would include not compressing old log files and using a ram disk for /tmp and any other filesystem with transient data. When you have snapshots you can then run filesystem analysis tools on the snapshots to detect any silent corruption that may be occurring and give the potential benefit of discovering corruption before it gets severe (but I have yet to see a confirmed report of this saving anyone). Of course similar snapshot facilities are available on almost every SAN and on many NAS devices, but there are many sites that don’t have the budget to use such equipment.
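
As an illustration (a minimal sketch using LVM in a Xen Dom0, with hypothetical volume names), a daily snapshot with an offline filesystem check could look like this:

# create a snapshot of the DomU's root device with 2G of space for changed blocks
lvcreate --snapshot --size 2G --name mail-snap /dev/VG0/mail
# check the snapshot for corruption without modifying it
e2fsck -fn /dev/VG0/mail-snap
# remove the snapshot when it is no longer needed
lvremove -f /dev/VG0/mail-snap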


CPU Capacity for Virtualisation

Today a client asked me to advise him on how to dramatically reduce the number of servers for his business. He needs to go from 18 active servers to 4. Some of the machines in the network are redundant servers; by reducing some of the redundancy I can remove four servers, so the task is now to go from 14 to 4.

To determine the hardware requirements I analyzed the sar output from all machines. The last 10 days of data were available, so I took the highest daily average numbers from each machine for user and system CPU load and added them up; the result was 221%. So for the average daily CPU use three servers would have enough power to run the entire network. Then I looked at the highest 5 minute averages for user and system CPU load from each machine, which added up to 582%. So if all machines were to have their peak usage times simultaneously (which doesn’t happen) then the CPU power of six machines would be needed. I conclude that the CPU power requirements are somewhere between 3 and 6 machines, so 4 machines may do an OK job.

The next issue is IO capacity. The current network has 2G of RAM in each machine and I plan to run it all on 4G Xen servers, so it’s a total of 16G of RAM instead of 36G. While some machines currently have unused memory I expect that the end result of this decrease in total RAM will be more cache misses and more swapping, so the total use of IO capacity will increase slightly. Four of the servers (which will eventually become Xen Dom0’s) have significant IO capacity (large RAIDs – they appear to have 10*72G disks in a RAID-5) and the rest have a smaller IO capacity (they appear to have 4*72G disks in a RAID-10). The other 14 machines have highest daily averages for iowait adding up to 9% and highest 5 minute averages adding up to 105%. I hope that spreading that 105% of the IO capacity of a four-disk RAID-10 across four sets of ten-disk RAID-5s won’t give overly bad performance.

I am concerned that there may be some flaw in the methodology that I am using to estimate capacity. The main issue is that I’m very doubtful about the utility of measuring iowait: iowait is the amount of idle CPU time while processes are blocked on IO. So if, for example, 100% of CPU time is being used then iowait will be zero regardless of how much disk IO is in progress! One check that I performed was to add the maximum CPU time used, the maximum iowait, and the minimum idle time. Most machines gave totals that were very close to 100% when those columns were added. If the maximum iowait for a 5 minute period plus the maximum CPU use plus the minimum idle time add up to 100%, and the minimum idle time was not very low, then it seems unlikely that there was any significant overlap between disk IO and CPU use to hide iowait. One machine had a total of 147% for those fields in the 5 minute averages, which suggests that the IO load may be higher than the 66% iowait figure indicates. But if I put that machine’s workload in a DomU on the server with the most unused IO capacity then it should be OK.
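
For reference, a rough implementation of that check for a single machine (this assumes the sar -u column layout of the sysstat version in use, so the field offsets are an assumption and may need adjusting):

# peak 5 minute %user+%system, peak %iowait, minimum %idle, and their sum
sar -u | awk '
  { for (i = 1; i <= NF; i++) if ($i == "all") {
      cpu = $(i+1) + $(i+3)                  # %user + %system
      if (cpu > maxcpu) maxcpu = cpu
      if ($(i+4) > maxio) maxio = $(i+4)     # %iowait
      if (minidle == "" || $NF + 0 < minidle) minidle = $NF + 0
  } }
  END { printf "cpu %.0f + iowait %.0f + idle %.0f = %.0f\n", maxcpu, maxio, minidle, maxcpu + maxio + minidle }'

A sum close to 100 suggests little overlap between CPU use and disk IO; a sum well over 100 (like the 147% machine) means iowait is understating the IO load.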

I will be interested to read any suggestions for how to proceed with this. But unfortunately it will probably be impossible to consider any suggestion which involves extra hardware or abandoning the plan due to excessive risk…

I will write about the results.


Hosting a Xen Server

Yesterday I wrote about my search for a hosting provider for a Xen DomU [1]. One response was the suggestion to run a Dom0 and sell DomU’s to other people [2], it was pointed out that Steve Kemp’s Xen-Hosting.org project is an example of how to do this well [3]. Unfortunately Steve’s service is full and he is not planning to expand.

I would be open to the idea of renting a physical machine and running the Xen server myself, but that might be a plan for some other time (of course if a bunch of people want to sign up to such a thing and start hassling me I might change my mind). But at the moment I need to get some services online soon and I don’t want to spend any significant amount of money (I want to keep what I spend on net access below my Adsense revenue).

Also if someone else in the free software community wants to follow Steve’s example then I would be interested in hosting my stuff with them.


Xen Hosting

I’m currently deciding where to get a Xen DomU hosted. It will be used for a new project that I’m about to start which will take more bandwidth than my current ISP is prepared to offer (or at least they would want me to start paying, and serious bandwidth is expensive in Australia). Below is a table of the options I’ve seriously considered so far (I rejected Dreamhost based on their reputation, and some other virtual hosts were obviously not able to compare with the prices of the ones in the table). For each ISP I listed the two cheapest options; as I want to save money I’ll probably go for the cheapest option at the ISP I choose, but want the option of upgrading if I need more.

I’m not sure how much storage I need; I think that 4.5G is probably not enough and even 6G might get tight. Also of course it depends on how many friends I share the server with.

Quantact has a reasonably cheap option for $15, but the $25 option is expensive and has little RAM. Probably 192M of RAM would be the minimum if I’m going to share the machine with two or more friends (to share the costs).

VPSLand would have rated well if it wasn’t for the fact that they once unexpectedly deleted a DomU belonging to a client (they claimed that the bill wasn’t paid) and had no backups. Disabling a service when a bill is not paid is fair, and charging extra for the “service” of re-enabling it is acceptable, but just deleting it with no backups is unacceptable. But as I’m planning on serving mostly static data this won’t necessarily rule them out of consideration.

It seems that Linode and Slicehost are the best options (Slicehost seems the most clueful and Linode might be the second most). Does anyone have suggestions about other Xen servers that I haven’t considered?

XenEurope seems interesting. One benefit that they have is being based in the Netherlands, which has a strong rule of law (unlike the increasingly corrupt US). A disadvantage is that the Euro is a strong currency and is expected to get even stronger. Services paid for in Euros should be expected to cost more in future when paid in Australian dollars, while services paid for in US dollars should be expected to cost less.

Gandi.net has an interesting approach: they divide a server into 64 “shares” and you can buy as many as you want (up to 16 shares for 1/4 of a server). If at any time you run out of bandwidth then you just buy more shares. They also limit bandwidth by guaranteed transfer rate (in multiples of 3Mb/s) instead of limiting the overall data transferred per month (as most providers do). They don’t mention whether you can burst above that 3Mb/s limit – while 3Mb/s for 24*7 is a significant amount of data transfer, it isn’t that much if you have a 200MB file that will be downloaded a few times a day while interactive tasks are also in progress (something that may be typical usage for my server). Of course other providers generally don’t provide any information on how fast data can be transferred, and the actual speed will often be less than 3Mb/s.
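
To put that guaranteed rate in perspective: 3Mb/s is 375KB/s, which comes to about 32GB per day or roughly 970GB per month if used continuously – far more than the monthly quotas of the other providers in the table below.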

Also if anyone who I know wants to share access to a server then please contact me via private mail.

ISP          RAM    Disk   Bandwidth (per month)   Price ($US)
Linode       360M   10G    200G                    $20
Linode       540M   15G    300G                    $30
Slicehost    256M   10G    100G                    $20
Slicehost    512M   20G    200G                    $38
VPSLand      192M   6G     150G                    $16
VPSLand      288M   8G     200G                    $22
Quantact     96M    4.5G   96G                     $15
Quantact     128M   6G     128G                    $25
RimuHosting  96M    4G     30G                     $20
XenEurope    128M   10G    100G                    $16 (E10)
XenEurope    256M   20G    150G                    $28 (E17.50)
Gandi.net    256M   5G     3Mb/s guaranteed rate   $7.50 (E6)

Installing a Red Hat based DomU on a Debian Dom0

The first step is to copy /images/xen/vmlinuz and /images/xen/initrd.img from the Fedora (or RHEL or CentOS) DVD somewhere convenient; I use /boot/OS/ (where OS is the name of the image) but other locations will do.

Now choose a suitable Ethernet MAC address for the interface (see my previous post on how I choose them [1]).

Create a temporary block device for the install. I use /dev/VG0/OS-install (where OS is replaced by the name of the distribution, “f8” or “cent5“), a logical volume in an LVM volume group named VG0. The device should be at least 2G in size for a basic Fedora install (512M for swap, 1G for files, and 512M free after the install). It is of course possible to use DOS partitions for the Xen block devices, but this would be unreasonably difficult to manage. An option for people who don’t like LVM would be to use files on an XFS filesystem (Ext3 performs poorly when creating and removing large files).
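
For example, creating the logical volume for a Fedora 8 install with the naming convention above is a single command:

lvcreate --size 2G --name f8-install VG0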

When configuring Xen on Debian systems I generally use /dev/hda type device names. The device name seems quite arbitrary and /dev/hda is a familiar name that people have been using for hard drives for 15+ years. But the Fedora install process doesn’t like it and I’m forced to use /dev/xvda etc.

I often install Fedora on a machine that only has 256M of RAM spare for the DomU. For recent versions of Fedora 256M of RAM is about the minimum for an install at the best of times, and a HTTP install takes even more because the root filesystem used for the install is copied via HTTP and stored in a RAM disk. It might be possible to use less RAM with a CD or DVD install or even an NFS install, but I couldn’t get CD/DVD installation working and I generally don’t give Xen DomU’s NFS access if I can avoid it. So I had to create a swap space (an attempt to do an install with 256M of RAM and no swap aborted when installing the kernel package). I expect that most serious use of Xen will have 256M of RAM or less for the DomU; part of the problem here is that Xen allocates RAM, not virtual memory. VMWare allocates virtual memory so the total memory for virtual machines can be greater than physical RAM, and thus this problem will be less common with VMWare.

I believe that the best way of configuring virtual machine images is to have the virtual machine manager (Xen in this case) provide block devices to the virtual machine and have the virtual machine implement no partitioning (no LVM or anything equivalent). The main reason is that DOS partition tables and LVM configuration on a block device used by Xen cannot be used easily in the host environment (the Dom0 for Xen). I am not aware of how to access DOS partition tables (although I’m sure it’s possible somehow) and while LVM can be used it’s a bad idea due to the fact that there is no way to deactivate an LVM volume group that is active, and the fact that there is no support for having multiple volume groups of the same name. The lack of support for multiple volume groups of the same name is a reasonable limitation, but an insurmountable problem when using a virtual machine environment. It’s quite reasonable to create several cloned instances of a virtual machine, and renaming an LVM volume group would require more changes inside the virtual machine than you would want. Also using snapshots of old versions of the virtual machine data is difficult if the same volume group name is used.

So for ease of management I want to have filesystems on block devices (such as /dev/xvda) instead of partitions (such as /dev/xvda1). Unfortunately Anaconda (the Fedora installer) doesn’t support this, so I had to do the initial install with DOS partitions and then fix it afterwards: use the manual partitioning option, create a primary partition for the root filesystem, and then create a non-primary partition for swap (when using small amounts of RAM such as 256M) so that swap can be used during the install. The root filesystem needs to be at the start of the disk to make it easier to sort this out later.

After installing Fedora and shutting the virtual machine down the next step is to copy the block device to the desired configuration (filesystem on an unpartitioned device). If the root filesystem is the first partition then the first 63 sectors will be the partition table and reserved space so dd can be used to copy the data with the following commands:

# copy the filesystem, skipping the 63 sectors of partition table and
# reserved space at the start of the device
dd if=/dev/VG0/OS-install of=/dev/VG0/OS bs=512 skip=63
# check the copied filesystem, then grow it to fill the new device
e2fsck -f /dev/VG0/OS
resize2fs /dev/VG0/OS

The next step is to mount the device /dev/VG0/OS in the Dom0 and edit /etc/fstab in it; I use /dev/xvda for the root device and /dev/xvdb for swap.
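
A minimal sketch of what the DomU’s /etc/fstab ends up looking like after this change (assuming ext3 for the root filesystem):

/dev/xvda  /     ext3  defaults  1 1
/dev/xvdb  swap  swap  defaults  0 0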

Now to remove the cruft:
  • Avahi is a network service discovery system, mainly used for laptops, and isn’t needed on a server. It is installed by default on all recent Fedora, RHEL, and CentOS releases but is not useful in a DomU (any unused network service is a security risk if you don’t disable or remove it).
  • Smartmontools is for detecting impending failure of a hard drive and does not do any good when using a virtual block device (you run it on the Dom0). It might be considered a bug that smartd doesn’t exit on startup when it sees a device such as /dev/xvda.
  • The pcsc-lite package for managing smart-cards is of no use to me and all the other people who don’t own smart-card readers, and it can therefore be removed.
  • Bluetooth networking support (in the package bluez-utils) is also only usable in a Dom0 (AFAIK), and the only bluetooth device I own is my mobile phone so I can’t use it on my computer.

The command “yum remove avahi smartmontools pcsc-lite bluez-utils” removes them all.

For almost all of my DomU’s I don’t use NFS or do any printing, so I remove the packages related to them. Also autofs is in most cases only useful for servers when mounting NFS filesystems. I remove them with the command “yum remove nfs-utils portmap cups autofs“.

The GPM daemon (which supports cut/paste operations with a mouse on virtual consoles) is of no use on a Xen DomU, unfortunately the vim-enhanced package depends on it. I could just disable the daemon, but as I like to run small images I remove it with “yum remove gpm“. I may have to reinstall it on some images as some of my clients like the extra VIM functionality.

It’s unfortunate that debootstrap doesn’t work on CentOS (and presumably Fedora), so installing a Debian DomU on a CentOS/Fedora Dom0 requires creating an image on a Debian machine or downloading an image from www.jailtime.org.

Sample Xen Config for the install:

kernel = "/boot/OS/vmlinuz"
ramdisk = "/boot/OS/initrd.img"
memory = 256
name = "OS"
vif = [ 'mac=00:16:3e:66:66:68, bridge=xenbr0' ]
disk = [ 'phy:/dev/VG0/OS-install,xvda,w' ]
extra = "askmethod text"

Sample Xen Config for operation:

kernel = "/boot/cent5/vmlinuz-2.6.18-53.el5xen"
ramdisk = "/boot/cent5/initrd-2.6.18-53.el5xen.img"
memory = 256
name = "cent5"
vif = [ 'mac=00:16:3e:66:66:68, bridge=xenbr0' ]
disk = [ 'phy:/dev/VG0/cent5,xvda,w', 'phy:/dev/VG0/cent5-swap,xvdb,w' ]
root = "/dev/xvda ro"
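
The cent5-swap volume referenced above also needs to be created in the Dom0 before the first boot of this configuration; a minimal sketch (the 512M size matches the install layout described earlier):

lvcreate --size 512M --name cent5-swap VG0
mkswap /dev/VG0/cent5-swap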


Xen and Swap

The way Xen works is that the RAM used by a virtual machine is not swappable, so the only swapping that happens is to the swap device used by the virtual machine. I wondered whether I could improve swap performance by using a tmpfs for that swap space. The idea is that as only one out of several virtual machines might be using swap space at a time, a tmpfs could cache the most recently used data, so the virtual machine that is swapping heavily takes less time to complete its memory-hungry job.

I decided to test this on Debian/Etch (both DomU and Dom0).
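
One way of providing a tmpfs backed swap device to a DomU (a sketch of a possible setup rather than the exact commands used, with example sizes and paths) is a loopback file on a tmpfs in the Dom0:

# in the Dom0: a tmpfs to back the swap, a file on it, and a loop device
mkdir -p /mnt/swapcache
mount -t tmpfs -o size=128m tmpfs /mnt/swapcache
dd if=/dev/zero of=/mnt/swapcache/swap bs=1M count=128
losetup /dev/loop0 /mnt/swapcache/swap
mkswap /dev/loop0
# then export the loop device to the DomU as its swap disk, with a line like:
# disk = [ 'phy:/dev/VG0/etch,xvda,w', 'phy:/dev/loop0,xvdb,w' ]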

[Graph: RAM size vs time]

Above is a graph of the time taken in seconds (on the Y axis) to complete the command “apt-get -y install psmisc openssh-client openssh-server linux-modules-2.6.18-5-xen-686 file less binutils strace ltrace bzip2 make m4 gcc g++“, while the X axis has the amount of RAM assigned to the DomU in megabytes.

The four plots are for using a real disk (in this case an LVM logical volume) and for using tmpfs with different amounts of RAM backing it; the numbers 128000, 196000, and 256000 are the numbers of kilobytes of RAM assigned to the Dom0 (which manages the tmpfs). As you can see it’s only below about 20M of RAM that tmpfs provides a benefit. I don’t know why it didn’t provide a benefit with larger amounts of RAM; below 48M the amount of time taken started increasing exponentially and I expected that there was the potential for a significant performance boost.

After finding that the benefits for a single active DomU were not that great I did some tests with three DomU’s running the same APT command. With 16M of RAM and swap on the hard drive it took an average of 408 seconds, but with swap on the tmpfs it took an average of 373 seconds – an improvement of 8.5%. With 32M of RAM the times were 225 and 221 seconds – a 1.8% improvement.

Incidentally to make the DomU boot successfully with less than 30M of RAM I had to use “MODULES=dep” in /etc/initramfs-tools/initramfs.conf. To get it to boot with less than 14M I had to manually hack the initramfs to remove LVM support (I create my initramfs in the Dom0 so it gets drivers that aren’t needed in the DomU). I was unable to get a DomU with 12M of RAM to boot with any reasonable amount of effort (I expect that compiling the kernel without an initramfs would have worked but couldn’t be bothered).
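
For reference, the initramfs can be regenerated in the Dom0 with something like the following (the output path is just a convention of mine, and MODULES=dep must already be set in /etc/initramfs-tools/initramfs.conf):

# build a minimal initramfs for the Xen kernel used by the DomU
mkinitramfs -o /boot/domu/initrd.img-2.6.18-5-xen-686 2.6.18-5-xen-686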

Future tests will have to be on another machine as the machine used for these tests caught fire – this is really annoying, as if someone suggests an extra test I can’t run it.

To plot the data I put the following in a file named “command” and then ran “gnuplot command“:
# autoscale only the axis maxima
unset autoscale x
set autoscale xmax
unset autoscale y
set autoscale ymax
set xlabel "MB"
set ylabel "seconds"
# plot the three tmpfs data sets (the file names are the Dom0 RAM in KB)
plot "128000"
replot "196000"
replot "256000"
# switch to PNG output and replot everything with the disk data added
set terminal png
set output "xen-cache.png"
replot "disk"

In future when doing such tests I will use “time -p” (for POSIX format), which displays the time as a plain count of seconds rather than minutes and seconds (and saves me from running sed and bc to fix things up).

I am idly considering writing a program to exercise virtual memory for the purpose of benchmarking swap on virtual machines.



The Future of Xen

I’m currently in Xen hell. My Thinkpad (which I won’t replace any time soon) has a Pentium-M CPU without PAE support. I think that Debian might re-introduce Xen support for CPUs without PAE in Lenny, but at the moment I have the choice of running without Xen or running an ancient kernel on my laptop. Due to this I’ve removed Xen from my laptop (I’m doing most of my development which needs Xen on servers anyway).

Now I’ve just replaced my main home server. It was a Pentium-D 2.8GHz machine with 1.5G of RAM and a couple of 300G SATA disks in a RAID-1. Now it’s a Pentium E2160 1.8GHz machine with 3G of RAM and the same disks. Incidentally Intel suck badly: they are producing CPUs with names that have no meaning, and most of their chipsets don’t support more than 4G of physical address space [1]. I wanted 4G of RAM but the machine I was offered only supported addressing 4G, and 700M of that was used for PCI devices. For computation tasks it’s about the same speed as the old Pentium-D, but it has faster RAM access, more RAM, uses less power, and makes less noise. If I was going to a shop to buy something I probably would have chosen something different to get support for more than 4G of RAM, but as I got the replacement machine for free as a favor I’m not complaining!

I expected that I could just install the new server and have things just work. There were some minor issues such as configuring X for the different video hardware and installing the 915resolution package (which is only needed in Etch) to get the desired 1650×1400 resolution. But for the core server tasks I expected that I could just move the hard drives across and have it work.

After the initial install the system crashed whenever I did any serious hard drive access from the Dom0: the Dom0 kernel Oopsed and network access was cut off from the DomU’s (I’m not sure whether the DomU’s died, but without any way of accessing them it doesn’t really matter much). As a test I installed the version of the Xen hypervisor from Unstable and it worked. But the Xen hypervisor from Unstable required the Xen tools from Unstable, which in turn required the latest libc6, and therefore the entire Dom0 had to be upgraded. Then in an unfortunate accident unrelated to Xen I lost the root filesystem before I finished the upgrade (cryptsetup in Debian/Unstable warns you if you try to use a non-LUKS option on a device that has been used for LUKS, and would have saved me).

So I did a fresh install of Debian/Unstable, this time it didn’t crash on heavy disk IO, instead it would lock up randomly when under no load.

I’ve now booted a non-Xen kernel and it’s working well. But this situation is not acceptable long-term; a large part of the purpose of the machine is to run virtualisation so that I can test various programs under multiple distributions. I think that I will have to try some other virtualisation technologies. The idea of running KVM on real servers (ones that serve data to the Internet) doesn’t thrill me; Tavis Ormandy’s paper about potential ways of exploiting virtual machine technologies [2] is a compelling argument for para-virtualisation. Fortunately, however, my old Pentium-3 machines running Xen seem quite reliable (replacing both software and hardware is a lot of pain that I don’t want).

In the near future I will rename the Xen category on my blog to Virtualisation. For older machines Xen is still working reasonably well, but for all new machines I expect that I will have to use something else – and I’ll be blogging about the new machines not the old. I expect that an increasing number of people will be moving away from Xen in the near future. It doesn’t seem to have the potential to give systems that are reliable when running on common hardware.

Ulrich Drepper doesn’t have a high opinion of Xen [3], the more I learn about it the more I agree with Ulrich.

Xen for Training

I’m setting up a training environment based on Xen. The configuration will probably be of use to some people so I’m including it below the fold. Please let me know if you have any ideas for improvements.

The interface for the user has the following documentation:

  • sudo -u root xen-manage create centos|debian [permissive]
    Create an image; the parameter debian or centos specifies which
    distribution you want to use and the optional parameter permissive
    specifies that you want to use Permissive mode (no SE Linux access controls
    enforced).
    Note that creating an image will leave you at its console. Press ^]
    to escape from the console.
  • sudo -u root xen-manage list
    Display the Xen information on your DomU. Note that it doesn’t tell you whether
    you are using Debian or CentOS, you have to access the console to do that.
  • sudo -u root xen-manage console
    Access the console.
  • sudo -u root xen-manage destroy
    Destroy your Xen image – if it’s crashed and you want to restart it.
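
The sudo access for the above commands can be granted with a sudoers entry along these lines (the group name and script path here are examples, not the actual configuration):

# /etc/sudoers: let members of the training group run only the wrapper as root
%training ALL = (root) NOPASSWD: /usr/local/sbin/xen-manage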



Xen and Security

I have previously posted about the difference between using a chroot and using SE Linux [1].

Theo de Raadt claims that virtualisation does not provide security benefits [2] based on the idea that the Xen hypervisor may have security related bugs.

From my understanding of Xen, a successful exploit of a Xen system with a Dom0 that is strictly used for running the DomU’s would usually start by gaining local root on one of the DomU instances. From there it is possible to launch an attack on the Xen Dom0. One example of this is the recent Xen exploit (CVE-2007-4993) [3], where hostile data in a grub.conf in a DomU could be used to execute privileged commands in the Dom0. Another possibility would be to gain root access to a DomU and then exploit a bug in the Xen API to take over the hypervisor (I am not aware of an example of this being implemented). A final possibility applies when QEMU code is used to provide virtual hardware, where an attacker could exploit QEMU bugs; an example of this is CVE-2007-0998, where a local user in a guest VM could read arbitrary files on the host [4] – it’s not clear from the advisory what level of access is required to exploit it (DomU-user, DomU-root, or remote VNC access). VNC is different from other virtual hardware in that the sys-admin of the virtual machine (who might be untrusted) needs to access it; virtual block devices etc are only accessed by the DomU and Xen manages the back-end.

The best reference in regard to these issues seems to be Tavis Ormandy’s paper about hostile virtualised environments [5]. Tavis found some vulnerabilities in the QEMU hardware emulation, and as QEMU code is used for a fully virtualised Xen installation it seems likely that Xen has some vulnerabilities in this regard. I think that it is generally recommended that for best security you don’t run fully virtualised systems.

The remote-console type management tools are another potential avenue of attack for virtualised servers in the case where multiple users run virtual machines on the same host (hardware). I don’t think that this is an inherent weakness of virtualisation systems. When security is most important you have one sys-admin running all virtual machines – which incidentally seems to be the case for most implementations of Xen at the moment (although for management not security reasons). In ISP hosting type environments I doubt that a remote console system based on managing Xen DomU’s is going to be inherently any less secure than a typical remote console system for managing multiple discrete computers or blades.

I have just scanned the Xen hypervisor source: the file include/asm-x86/hypercall.h has 18 entries for AMD64 and 17 for i386, while include/xen/hypercall.h has 18 entries. So it seems that there are 35 or 36 entry points for calling the hypervisor, compared to 296 system calls on the i386 version of Linux (which includes the sys_socketcall system call, which expands to many system calls). This seems to be one clear indication that the Linux kernel is inherently more complex (and therefore likely to have a higher incidence of security flaws) than the Xen hypervisor.

Theo’s main claim seems to be that Xen is written by people who aren’t OpenBSD developers and who therefore aren’t able to write secure code. While I don’t agree with his strong position I have to note the fact that OpenBSD seems to have a better security history than any other multi-user kernel for which data is available. But consider a system running Xen with Linux in Dom0 and multiple para-virtualised OpenBSD DomU’s. If the Linux Dom0 has OpenSSH as the only service being run then the risk of compromise would be from OpenSSH, IP based bugs in the Linux kernel (either through the IP address used for SSH connections or for routing/bridging to the OpenBSD instances), and from someone who has cracked root on one of the OpenBSD instances and is attacking the hypervisor directly.

Given that OpenSSH comes from the OpenBSD project, it seems that the above scenario would only add the additional risk of an IP based Linux kernel attack. A root compromise of an OpenBSD instance (consider that a typical OpenBSD system will run a lot of software that doesn’t come from the OpenBSD project – much of which won’t have a great security history) would only lose that instance, unless the attacker can also exploit the hypervisor (which would be a much more difficult task than merely cracking some random daemon running as root that the sys-admin is forced to install). Is the benefit of having only one instance of OpenBSD cracked due to a bad daemon enough to outweigh the risk of the Linux IP stack?

I’m sure that the OpenBSD people would consider that a better option would be OpenBSD in the Dom0 as well as the DomU’s, in which case the risk of damage from a root compromise due to one badly written daemon that didn’t come from OpenBSD is limited to a single DomU unless the attacker also compromises the hypervisor. When working as a sys-admin I have been forced by management to install some daemons as root which were great risks to the security of the system; if I had the ability to install them in separate DomU’s I would have been able to significantly improve the security of the system.

Another flaw in Theo’s position is that he seems to consider running a virtual machine as a replacement for multiple machines – which would be an obvious decrease in security. However in many cases the situation is that no more or less hardware is purchased, it is just used differently. If instead of a single server running several complex applications you have a Xen server running multiple DomU’s which each have a single application, then things become much simpler and more secure. Upgrades can be performed on one DomU at a time, which decreases the scope of failure (and often means that you only need one business unit to sign off on the upgrade), and upgrades can be performed on an LVM snapshot (and rolled back with ease if they don’t succeed). A major problem with computer security is when managers fear problems caused by upgrades and prohibit their staff from applying security fixes. This, combined with the fact that on a multiple DomU installation one application can be compromised without immediate loss of the others (which run in different DomU’s and require further effort by the attacker for a Xen compromise), provides a significant security benefit.

It would be nice for security if every application could run on separate hardware, but even with blades this is not economically viable – not even for the biggest companies.

I have converted several installations from a single overloaded and badly managed server to a Xen installation with multiple DomU’s. In all cases the DomU’s were easier to upgrade (and were upgraded more often) and the different applications and users were more isolated.

Finally there is the possibility of using virtualisation to monitor the integrity of the system, Bill Broadley’s presentation from the 2007 IT Security Symposium [6] provides some interesting ideas about what can be done. It seems that having a single OpenBSD DomU running under a hypervisor (maybe Xen audited by the OpenBSD people) with an OpenBSD Dom0 would offer some significant benefits over a single OpenBSD instance.