Xen CPU Use per Domain again

8 years ago I wrote a script to summarise Xen CPU use per domain [1]. Since then, changes to Xen have required changes to the script. I have new versions for Debian/Wheezy (Xen 4.1) and Debian/Jessie (Xen 4.4).

Here’s a new script for Debian/Wheezy:

#!/usr/bin/perl
use strict;

# parse the S-expression output of "xm list --long"
open(LIST, "xm list --long|") or die "Can't get list";

my $name = "Dom0";
my $uptime = 0.0;
my $cpu_time = 0.0;
my $total_percent = 0.0;
my $cur_time = time();

# the default uptime (used for Dom0) comes from /proc/uptime
open(UPTIME, "</proc/uptime") or die "Can't open /proc/uptime";
my @arr = split(/ /, <UPTIME>);
$uptime = $arr[0];
close(UPTIME);

my %all_cpu;

while(<LIST>)
{
  chomp;
  # a line starting with ")" closes a domain record, so summarise it
  if($_ =~ /^\)/)
  {
    my $cpu = $cpu_time / $uptime * 100.0;
    if($name =~ /Domain-0/)
    {
      printf("%s uses %.2f%% of one CPU\n", $name, $cpu);
    }
    else
    {
      $all_cpu{$name} = $cpu;
    }
    $total_percent += $cpu;
    next;
  }
  $_ =~ s/\).*$//;
  if($_ =~ /start_time /)
  {
    $_ =~ s/^.*start_time //;
    $uptime = $cur_time - $_;
    next;
  }
  if($_ =~ /cpu_time /)
  {
    $_ =~ s/^.*cpu_time //;
    $cpu_time = $_;
    next;
  }
  if($_ =~ /\(name /)
  {
    $_ =~ s/^.*name //;
    $name = $_;
    next;
  }
}
close(LIST);

sub hashValueDescendingNum {
  $all_cpu{$b} <=> $all_cpu{$a};
}

my $key;

foreach $key (sort hashValueDescendingNum (keys(%all_cpu)))
{
  printf("%s uses %.2f%% of one CPU\n", $key, $all_cpu{$key});
}

printf("Overall CPU use approximates %.1f%% of one CPU\n", $total_percent);

Here’s the script for Debian/Jessie:

#!/usr/bin/perl

use strict;

open(UPTIME, "xl uptime|") or die "Can't get uptime";
open(LIST, "xl list|") or die "Can't get list";

my %all_uptimes;

# get the uptime of each domain in seconds from "xl uptime"
while(<UPTIME>)
{
  chomp $_;

  next if($_ =~ /^Name/);
  $_ =~ s/ +/ /g;

  my @split1 = split(/ /, $_);
  my $dom = $split1[0];
  my $uptime = 0;
  my $time_ind = 2;
  if($split1[3] eq "days,")
  {
    $uptime = $split1[2] * 24 * 3600;
    $time_ind = 4;
  }
  my @split2 = split(/:/, $split1[$time_ind]);
  $uptime += $split2[0] * 3600 + $split2[1] * 60 + $split2[2];
  $all_uptimes{$dom} = $uptime;
}
close(UPTIME);

my $total_percent = 0;

# CPU use is the Time(s) column (the last field of "xl list") divided by the domain's uptime
while(<LIST>)
{
  chomp $_;

  my $dom = $_;
  $dom =~ s/ .*$//;

  if ( $_ =~ /(\d+)\.[0-9]$/ )
  {
    my $percent = $1 / $all_uptimes{$dom} * 100.0;
    $total_percent += $percent;
    printf("%s uses %.2f%% of one CPU\n", $dom, $percent);
  }
  else
  {
    next;
  }
}

printf("Overall CPU use approximates  %.1f%% of one CPU\n", $total_percent);

Dedicated vs Virtual Servers

A common question about hosting is whether to use a dedicated server or a virtual server.

Dedicated Servers

If you use a dedicated server then you will face the risk of problems which interrupt the boot process. It seems that all the affordable dedicated server offerings lack any good remote management, so when the server doesn’t boot you either have to raise a trouble ticket with the company running the Data-Center (DC) or use some sort of hack. Hetzner is a dedicated server company that I have had good experiences with [1]. When a server in their DC fails to boot you can use their web based interface (at no extra charge or delay) to boot a Linux recovery environment which can then be used to fix whatever the problem may be. They also charge extra for hands-on support, which could be used if the Linux recovery environment revealed no errors but the system just didn’t work. This isn’t nearly as good as something like IPMI, which permits remote console access to see error messages and more direct control of rebooting.

The up-side of a dedicated server is performance. Some people think that avoiding virtualisation improves performance, but in practice most virtual servers use virtualisation technologies that have little overhead. A bigger performance issue than the virtualisation overhead is the fact that most companies running DCs have a range of hardware, and your system (whether a virtual server or a dedicated server) will land on a random machine from that range. I have observed hosting companies giving out different CPU speeds, and for dedicated servers different amounts of RAM, for the same price. I expect that disk IO performance also varies a lot, but I have no evidence. As long as the hosting company provides everything that they offered before you signed the contract you can’t complain. It’s worth noting that CPU performance is either poorly specified or absent in most offers, and disk IO performance is almost never specified. One advantage of dedicated servers in this regard is that you get to know the details of the hardware and can therefore refuse certain low spec hardware.

The real performance benefit of a dedicated server is that disk IO performance won’t be hurt by other users of the same system. Disk IO is the real issue as CPU and RAM are easy to share fairly but disk performance is difficult to share and is also a significant bottleneck on many servers.

Dedicated servers also have a higher minimum price because a real server is being used, which involves hardware purchase and rack space. Hetzner’s offers, which start at E29 per month, are about as cheap as it’s possible to get. But it appears that the E29 offer is for an old server – new hardware starts at E49 per month, which is still quite cheap. Even so, no dedicated server compares on price to the virtual servers that can be rented for less than $10 per month.

Virtual Servers

A virtual server will typically have an effective management interface. You should expect to get web based access to the system console as well as ssh console access. If console access is not sufficient to recover the system then there is an option to boot from a recovery device. This allows you to avoid many situations that could potentially result in down-time and when things go wrong it allows you to recover faster. Linode is an example of a company that provides virtual servers and provides a great management interface [2]. It would take a lot of work with performance monitoring and graphing tools to give the performance overview that comes for free with the Linode interface.

Disk IO performance can suck badly on virtual servers and it can happen suddenly and semi-randomly. If someone else who is using a virtual server on the same hardware is the target of a DoS attack then your performance can disappear. Performance for CPU is generally fairly reliable though. So a CPU bound server would be a better fit on the typical virtual server options than a disk IO bound server.

Virtual servers are a lot cheaper at the low end so if you don’t need the hardware capabilities of a minimal dedicated server (with 1G of RAM for Hetzner and a minimum of 8G of RAM for some other providers) then you can save a lot of money by getting a virtual server.

Finally, the options for running a virtual machine under a virtual machine aren’t good. AFAIK the only options that would work on a commercial VPS offering are QEMU (an x86 CPU instruction emulator), Hercules (an S/370, S/390, and Z series IBM mainframe emulator), and similar CPU emulators. Please let me know if there are any other good options for running a virtual machine on a VPS. While these emulators are apparently good for debugging OS development they aren’t generally useful for running a virtual machine. I knew someone who ran his important servers under Hercules so that x86 exploits couldn’t be used for attacking them, but apart from that CPU emulation isn’t generally useful for servers.

Summary

If you want to have entire control of the hardware or if you want to run your own virtual machines that suit your needs (EG one with lots of RAM and another with lots of disk space) then a dedicated server is required. If you want to have minimal expense or the greatest ease of sysadmin use then a virtual server is a better option.

But the cheapest option for virtual hosting is to rent a server from Hetzner, run Xen on it, and then rent out DomUs to other people. Apart from the inevitable pain that you experience if anything goes wrong with the Dom0 this is a great option.

As an aside, if anyone knows of a reliable company that offers some benefits over Hetzner then please let me know.

What I would Like to See

There is no technical reason why a company like Linode couldn’t make an offer which was a single DomU on a server taking up all available RAM, CPU, and disk space. Such an offer would be really compelling if it wasn’t excessively expensive. That would give Linode ease of management and also a guarantee that no-one else could disrupt your system by doing a lot of disk IO. This would be really easy for Linode (or any virtual server provider) to implement.

There is also no technical reason why a company like Linode couldn’t allow their customers to rent all the capacity of a physical system and then subdivide it among DomUs as they wish. I have a few clients who would be better suited by Linode DomUs that are configured for their needs rather than stock Linode offerings (which never seem to offer the exact amounts of RAM, disk, or CPU that are required). Also if I had a Linode physical server that only had DomUs for my clients then I could make sure that none of them had excessive disk IO that affected the others. This would require many extra features in the Linode management web pages, so it seems unlikely that they will do it. Please let me know if there is someone doing this, it’s obvious enough that someone must be doing it.

Update:

A Rimuhosting employee pointed out that they offer virtual servers on dedicated hardware which meets these criteria [3]. Rimuhosting allows you to use their VPS management system for running all the resources in a single server (so no-one else can slow your VM down) and also allows custom partitioning of a server into as many VMs as you desire.

Server Costs vs Virtual Server Costs

The Claim

I have seen it claimed that renting a virtual server can be cheaper than paying for electricity on a server you own. So I’m going to analyse this with electricity costs from Melbourne, Australia and the costs of running virtual servers in the US and Europe as these are the options available to me.

The Costs

According to my last bill I’m paying 18.25 cents per kWh – that’s a domestic rate for electricity use and businesses pay different rates. For this post I’m interested in SOHO and hobbyist use so business rates aren’t relevant. I’ll assume that a year has 365.25 days as I really doubt that people will change their server arrangements to save some money on a leap year. A device that draws 1W of power if left on for 365.25 days will take 365.25*24/1000 = 8.766kWh which will cost 8.766*0.1825 = $1.5997950. I’ll round that off to $1.60 per Watt-year.

I’ve documented the power use of some systems that I own [1]. I’ll use the idle power use because most small servers spend so much time idling that the time they spend doing something useful doesn’t affect the average power use. I think it’s safe to assume that someone who really wants to save money on a small server isn’t going to buy a new system, so I’ll look at the older and cheaper systems. The lowest power use there is a Cobalt Qube: a 450MHz AMD K6 is a really small system, but at 20W when idling it costs only $32 per annum. My Thinkpad T41p is a powerful little system; a 1.7GHz Pentium-M with 1.5G of RAM, a 100G IDE disk and a Gig-E port should be quite useful as a server – which, now that the screen is broken, is a good use for it. That Thinkpad drew 23W at idle with the screen on last time I tested it, which means an annual cost of $36.80 – or a little less if I leave the screen turned off. A 1.8GHz Celeron with 3 IDE disks drew 58W when idling (but with the disks still spinning); let’s assume for the sake of discussion that a well configured system of that era would take 60W on average and cost $96 per annum.

So my cost for electricity would vary from as little as $36.80 to as much as $96 per year depending on the specs of the system I choose. That’s not considering the possibility of doing something crazy like ripping the IDE disk out of an old Thinkpad and using some spare USB flash devices for storage – I’ve been given enough USB flash devices to run a RAID array if I was really enthusiastic.

For virtual server hosting the cheapest I could find was Xen Europe, which charges E5 for a virtual server with 128M of RAM, 10G of storage and 1TB of data transfer [2], that is $AU7.38. The next best was Quantact, which charges $US15 for a virtual server with 256M of RAM [3], that is $AU16.41.

Really, for my own use, if I was paying I might choose Linode [4] or Slicehost [5]; they both charge $US20 ($AU21.89) for their cheapest virtual server, which has 360M or 256M of RAM respectively. I’ve done a lot of things with Linode and Slicehost and had good experiences; Xen Europe got some good reviews last time I checked but I haven’t used them.

The Conclusion

A Xen Europe virtual server at $88.56 per annum would be slightly cheaper than running my old Celeron system, but more expensive than buying electricity for my old Thinkpad. If I needed more than 128M of RAM (which seems likely) then the next cheapest option is a 256M Xen Europe server for $14.76 per month, which is $177.12 per annum and makes my old computers look very appealing. If I needed more than a Gig of RAM then my old Thinkpad would be a clear winner, and if I needed good disk IO capacity (something that always seems poor in virtual servers) then a local server would also win.

Virtual servers win when serious data transfer is needed. Even if you aren’t based in a country like Australia where data transfer quotas are small (see my previous post about why Internet access in Australia sucks [6]) you will probably find that any home Internet connection you can reasonably afford doesn’t allow the fast transfer of large quantities of data that you would desire from a server.

So I conclude that apart from strange and unusual corner cases it is cheaper in terms of ongoing expenses to run a small server in your own home than to rent a virtual server.

If you have to purchase a system to run as a server (let’s say $200 for something cheap) and assume hardware depreciation expenses (maybe another $200 every two years) then a virtual server might save you money. But this also seems like a corner case, as the vast majority of people who have the skills to run such servers also have plenty of old hardware: they replace their main desktop systems periodically and often receive gifts of old hardware.

One final fact that is worth considering is that if your time has a monetary value and you aren’t going to learn anything useful by running your own local server, then a managed virtual server such as those provided by Linode (who have a really good management console) will probably save enough time to make it worth the expense.

Xen and Debian/Squeeze

Ben Hutchings announced that the Debian kernel team are now building Xen flavoured kernels for Debian/Unstable [1]. Thanks to Max Attems and the rest of the kernel team for this and all their other great work! Thanks Ben for announcing it. The same release included OpenVZ, updated DRM, and the kernel mode part of Nouveau – but Xen is what interests me most.

I’ve upgraded the Xen server that I use for my SE Linux Play Machine [2] to test this out.

To get this working you first need to remove xen-tools, as the Testing version of bash-completion has an undeclared conflict with it (see Debian bug report #550590).

Then you need to upgrade to Unstable; this requires upgrading the kernel first as udev won’t upgrade without it.

If you have an existing system you need to install xen-hypervisor-3.4-i386 and purge xen-hypervisor-3.2-1-i386 as the older Xen hypervisor won’t boot the newer kernel. This also requires installing xen-utils-3.4 and removing xen-utils-3.2-1 as the utilities have to match the kernel. You don’t strictly need to remove the old hypervisor and utils packages as it should be possible to have dual-boot configured with old and new versions of Xen and matching Linux kernels. But this would be painful to manage as update-grub doesn’t know how to match Xen and Linux kernel versions so you will get Grub entries that are not bootable – it’s best to just do a clean break and keep a non-Xen version of the older kernel installed in case it doesn’t initially boot.
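
As a rough sketch, and using the package names given above, the swap amounts to something like:

apt-get install xen-hypervisor-3.4-i386 xen-utils-3.4
apt-get purge xen-hypervisor-3.2-1-i386 xen-utils-3.2-1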

An apt-get dist-upgrade operation will result in installing the grub-pc package, but the update-grub2 command doesn’t generate Xen entries. I’ve filed Debian bug report #574666 about this.

Because the Linux kernel doesn’t cope well with being shrunk to a small memory size after boot, I use “xenhopt=dom0_mem=142000” in my GRUB 0.98 configuration so that the kernel doesn’t allocate as much RAM to its internal data structures. In the past I’ve encountered a kernel memory management bug related to significantly reducing the size of the Dom0 memory after boot [3].

Before I upgraded I had the dom0_mem size set to 122880, but when running Testing that seems to give me a kernel Out Of Memory condition from udev in the early stages of boot, which prevents LVM volumes from being scanned and therefore prevents swap from being enabled, so the system doesn’t work correctly (if at all). I still had the problem with dom0_mem set to 138000, so I chose 142000 as a safe number. Now I admit that the system would probably boot with less RAM if I disabled SE Linux, but the SE Linux policy size of the configuration I’m using in the Dom0 has dropped from 692K to 619K, so it seems likely that the increase in required memory is not caused by SE Linux.
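
For anyone else using GRUB legacy, the dom0_mem setting goes in /boot/grub/menu.lst as a commented option line that update-grub reads when it generates the Xen boot entries, something like this (using the value that worked for me):

# xenhopt=dom0_mem=142000

After editing it you need to run update-grub for the change to appear in the generated entries.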

The Xen Dom0 support on i386 in Debian/Unstable seems to work quite well. I wouldn’t recommend it for any serious use, but for something that’s inherently designed for testing (such as a SE Linux Play Machine) then it works well. My Play Machine has been offline for the last few days while I’ve been working on it. It didn’t take much time to get Xen working, it took a bit of time to get the SE Linux policy for Unstable working well enough to run Xen utilities in enforcing mode, and it took three days because I had to take time off to work on other projects.

Starting with KVM

I’ve just bought a new Thinkpad that has hardware virtualisation support and I’ve got KVM running.

HugePages

The Linux-KVM site has some information on using hugetlbfs to allow the use of 2MB pages for KVM [1]. I put “vm.nr_hugepages = 1024” in /etc/sysctl.conf to reserve 2G of RAM for KVM use. The web page notes that it may be impossible to allocate enough pages if you set it some time after boot (the kernel can allocate memory that can’t be paged and it’s possible for RAM to become too fragmented to allow allocation). As a test I reduced my allocation to 296 pages and then increased it again to 1024; I was surprised to note that my system ran extremely slowly while reserving the pages – it seems that allocating such pages is efficient when done at boot time but not so efficient when done later.

hugetlbfs /hugepages hugetlbfs mode=1770,gid=121 0 0

I put the above line in /etc/fstab to mount the hugetlbfs filesystem. The mode of 1770 allows anyone in the group to create files but not unlink or rename each other’s files. The gid of 121 is for the kvm group.
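
With that fstab entry in place the remaining steps are just creating the mount point, mounting it, and checking that the kernel actually reserved the pages:

mkdir /hugepages
mount /hugepages
grep Huge /proc/meminfo

The last command shows the HugePages_Total and HugePages_Free counters mentioned below.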

I’m not sure how hugepages are used; they aren’t used in the most obvious way. I expected that allocating 1024 huge pages would allow allocating 2G of RAM to the virtual machine, but that’s not the case as “-m 2048” caused kvm to fail. I also expected that the number of HugePages free according to /proc/meminfo would reliably drop by an amount that approximately matches the size of the virtual machine – which doesn’t seem to be the case.

I have no idea why KVM with Hugepages would be significantly slower for user and system CPU time but still slightly faster for the overall build time (see the performance section below). I’ve been unable to find any documents explaining in which situations huge pages provide advantages and disadvantages or how they work with KVM virtualisation – the virtual machine allocates memory in 4K pages so how does that work with 2M pages provided to it by the OS?

But Hugepages does provide a slight benefit in performance, and if you have plenty of RAM (I have 5G and can afford to buy more if I need it) you should just enable it from the start.

I have filed Debian bug report #574073 about KVM displaying an error you normally can’t see when it can’t access the hugepages filesystem [6].

Permissions

open /dev/kvm: Permission denied
Could not initialize KVM, will disable KVM support

One thing that annoyed me about KVM is that the Debian/Lenny version will run QEMU emulation instead if it can’t run KVM. I discovered this when a routine rebuild of the SE Linux Policy packages in a Debian/Unstable virtual machine took an unreasonable amount of time. When I halted the virtual machine I noticed that it had displayed the above message on stderr before changing into curses mode (I’m not sure of the correct term for this), such that the message was obscured until the xterm was returned to non-curses mode at program exit. I had to add the user in question to the kvm group. I’ve filed Debian bug report #574063 about this [2].
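
Adding the user to the kvm group is a one line operation on Debian, though the user has to log in again for it to take effect:

adduser USERNAME kvm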

Performance

Below is a table showing the time taken for building the SE Linux reference policy on Debian/Unstable. It compares running QEMU emulation (using the kvm command but without permission to access /dev/kvm), KVM with and without hugepages, Xen, and a chroot. Xen is run on an Opteron 1212 Dell server system with 2*1TB SATA disks in a RAID-1 while the KVM/QEMU tests are on an Intel T7500 CPU in a Thinkpad T61 with a 100G SATA disk [4]. All virtual machines had 512M of RAM and 2 CPU cores. The Opteron 1212 system is running Debian/Lenny and the Thinkpad is running Debian/Lenny with a 2.6.32 kernel from Testing.

Test                                                Elapsed  User   System
QEMU on Opteron 1212 with Xen installed             126m54   39m36  8m1
QEMU on T7500                                       95m42    42m57  8m29
KVM on Opteron 1212                                 7m54     4m47   2m26
Xen on Opteron 1212                                 6m54     3m5    1m5
KVM on T7500                                        6m3      2m3    1m9
KVM Hugepages on T7500 with NCurses console         5m58     3m32   2m16
KVM Hugepages on T7500                              5m50     3m31   1m54
KVM Hugepages on T7500 with 1800M of RAM            5m39     3m30   1m48
KVM Hugepages on T7500 with 1800M and file output   5m7      3m28   1m38
Chroot on T7500                                     3m43     3m11   0m29

I was surprised to see how inefficient it is when compared with a chroot on the same hardware; it seems that the system time is the issue. Most of the tests were done with 512M of RAM for the virtual machine. I tried 1800M, which improved performance slightly (less IO means fewer context switches to access the real block device), and redirecting the output of dpkg-buildpackage to /tmp/out and /tmp/err reduced the build time by 32 seconds – it seems that the context switches for networking or console output really hurt performance. But for the default build it seems that it will take about 50% longer in a virtual machine than in a chroot. This is bearable for the things I do (of which building the SE Linux policy is the most time consuming), but if I was to start compiling KDE then I would be compelled to use a chroot.

I was also surprised to see how slow it was when compared to Xen, for the tests on the Opteron 1212 system I used a later version of KVM (qemu-kvm 0.11.0+dfsg-1~bpo50+1 from Debian/Unstable) but could only use 2.6.26 as the virtualised kernel (the Debian 2.6.32 kernels gave a kernel Oops on boot). I doubt that the lower kernel version is responsible for any significant portion of the extra minute of build time.

Storage

One way of managing storage for a virtual machine is to use files on a large filesystem for its block devices; this can work OK if you use a filesystem that is well designed for large files (such as XFS). I prefer to use LVM. One thing I have not yet discovered is how to make udev assign the kvm group to all devices that match /dev/V0/kvm-*.
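
I haven’t tested this, but one approach that might work is a local udev rule matching the device-mapper properties, assuming the LVM udev integration on the system exports the DM_VG_NAME and DM_LV_NAME variables (something I haven’t verified, so treat the rule below as a sketch):

cat > /etc/udev/rules.d/92-kvm-lvm.rules << 'EOF'
ENV{DM_VG_NAME}=="V0", ENV{DM_LV_NAME}=="kvm-*", GROUP="kvm", MODE="0660"
EOF
udevadm trigger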

Startup

KVM seems to be basically designed to run from a session, unlike Xen which can be started with “xm create” and then run in the background until you feel like running “xm console” to gain access to the console. One way of dealing with this is to use screen. The command “screen -S kvm-foo -d -m kvm WHATEVER” will start a screen session named kvm-foo that will be detached and will start by running kvm with “WHATEVER” as the command-line options. When screen is used for managing virtual machines you can use the command “screen -ls” to list the running sessions and then commands such as “screen -r kvm-unstable” to reattach to screen sessions. To detach from a running screen session you type ^A^D.

The problem with this is that screen will exit when the process ends and that loses the shutdown messages from the virtual machine. To solve this you can put “exec bash” or “sleep 200” at the end of the script that runs kvm.
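
So a wrapper script along the following lines (a simplified sketch of my /usr/local/sbin/kvm-unstable, with most of the kvm options from the full command-line shown below left out) keeps the screen session around after the virtual machine shuts down:

#!/bin/bash
# run the virtual machine; see below for the full set of options I use
kvm -hda /dev/V0/unstable -m 512 -smp 2 -curses
# keep the screen window open so the shutdown messages can be read
exec bash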

start-stop-daemon -S -c USERNAME --exec /usr/bin/screen -- -S kvm-unstable -d -m /usr/local/sbin/kvm-unstable

On a Debian system the above command in a system boot script (maybe /etc/rc.local) could be used to start a KVM virtual machine on boot. In this example USERNAME would be replaced by the name of the account used to run kvm, and /usr/local/sbin/kvm-unstable is a shell script to run kvm with the correct parameters. Then as user USERNAME you can attach to the session later with the command “screen -x kvm-unstable“. Thanks to Jason White for the tip on using screen.

I’ve filed Debian bug report #574069 [3] requesting that kvm change its argv[0] so that top(1) and similar programs can be used to distinguish different virtual machines. Currently when you have a few entries named kvm in top’s output it is annoying to match the CPU hogging process to the virtual machine it’s running.

It is possible to use KVM with X or VNC for a graphical display by the virtual machine. I don’t like these options; I believe that Xephyr provides better isolation, and I’ve previously documented how to use Xephyr [5].

kvm -kernel /boot/vmlinuz-2.6.32-2-amd64 -initrd /boot/initrd.img-2.6.32-2-amd64 -hda /dev/V0/unstable -hdb /dev/V0/unstable-swap -m 512 -mem-path /hugepages -append "selinux=1 audit=1 root=/dev/hda ro rootfstype=ext4" -smp 2 -curses -redir tcp:2022::22

The above is the current kvm command-line that I’m using for my Debian/Unstable test environment.

Networking

I’m using KVM options such as “-redir tcp:2022::22” to redirect unprivileged ports (in this case 2022) to the ssh port. This works for a basic test virtual machine but is not suitable for production use. I want to run virtual machines with minimal access to the environment, this means not starting them as root.
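
With the “-redir tcp:2022::22” option in place, logging in to the virtual machine from the host is just a matter of pointing ssh at the redirected port:

ssh -p 2022 localhost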

One thing I haven’t yet investigated is the vde2 networking system which allows a private virtual network over multiple physical hosts and which should allow kvm to be run without root privs. It seems that all the other networking options for kvm which have appealing feature sets require that the kvm process be started with root privs.
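
From a quick reading of the documentation, a vde2 setup would look roughly like the following: a switch process that owns a tap interface on the host and has its control socket owned by the kvm group, with the kvm process plugged into it. This is untested, it assumes the kvm binary was built with vde support, and the option names are taken from the vde2 documentation:

# the switch needs root to create the tap device, but kvm itself can then run as an ordinary user in the kvm group
vde_switch -tap tap0 -daemon -mod 660 -group kvm -sock /var/run/vde.ctl
kvm -net nic -net vde,sock=/var/run/vde.ctl -hda /dev/V0/unstable -m 512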

Is KVM worth using?

It seems that KVM is significantly slower than a chroot, so for a basic build environment a secure chroot environment would probably be a better option. I had hoped that KVM would be more reliable than Xen, which would offset the performance loss. However, as KVM and Debian kernel 2.6.32 don’t work together on my Opteron system, it seems that I will have some reliability issues with KVM comparable to the Xen issues. There are currently no Xen kernels in Debian/Testing so KVM is usable now with the latest bleeding edge stuff (on my Thinkpad at least) while Xen isn’t.

Qemu is really slow, so Xen is the only option for 32bit hardware. Therefore all my 32bit Xen servers need to keep running Xen.

I don’t plan to switch my 64bit production servers to KVM any time soon. When Debian/Squeeze is released I will consider whether to use KVM or Xen after upgrading my 64bit Debian server. I probably won’t upgrade my 64bit RHEL-5 server any time soon – maybe when RHEL-7 is released. My 64bit Debian test and development server will probably end up running KVM very soon, I need to upgrade the kernel for Ext4 support and that makes KVM more desirable.

So it seems that for me KVM is only going to be seriously used on my laptop for a while.

Generally I am disappointed with KVM. I had hoped that it would give almost the performance of Xen (admittedly it was only 14.5% slower). I had also hoped that it would be really reliable and work with the latest kernels (unlike Xen) but it is giving me problems with 2.6.32 on Opteron. Also it has some new issues such as deciding to quietly do something I don’t want when it’s unable to do what I want it to do.

Virtual Hosting Features

I’ve just been setting up new virtual servers at Linode [1] and Slicehost [2]. I have previously written a review of both those services [3], based on that review (and some other discussions) one of my clients now has a policy of setting up pairs of virtual servers for various projects, one server at Linode and one at Slicehost.

Now both virtual hosting providers work very well and I’m generally happy with both of them.

But Linode seems to be a better offering.

Linode has graphs of various types of usage: I can look at graphs of disk IO, CPU use, and network IO for the last 24 hours, the last 30 days, or for previous months. The three graphs have the same scale on the X axis so I can correlate them. The stats on Slicehost just allow you to get the current raw numbers, which doesn’t help if I want to know what happened last night when performance sucked.

When I build a Linode instance I can have multiple filesystems configured (Slicehost can’t do any of this). I can use less disk space than is available to reserve space for other filesystems. Separating filesystems makes it easier to track IO performance and also allows some bounds to be set on the amount of disk space used for various tasks. Nowadays the use of multiple partitions is not as popular as it once was, but it’s still a real benefit. Of course one of the benefits of this is that I can have two partitions on Linode that are suitable for running as the root filesystem. If an upgrade fails then it would be an option to boot with the other filesystem (I haven’t actually done this but it’s good to have the option).

I believe that this feature of Linode could do with some improvements. Firstly, when creating or resizing a filesystem it should be possible to specify the number of Inodes when using Ext3; the fsck time for a large Ext3 filesystem that has the default number of Inodes is quite unreasonable. It would also be good if other filesystems such as XFS were supported, as for some use cases XFS can significantly outperform Ext3 – and choice is always good. When BTRFS becomes stable I expect that every hosting provider will be compelled to support it (any provider that wants my continued business will do so).
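
When you create the filesystem yourself this is a single option to mke2fs; for example (with a hypothetical device name) the following creates an Ext3 filesystem with a fixed number of Inodes instead of the default ratio:

mke2fs -j -N 1000000 /dev/sda3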

Now Linode and Slicehost both allow sharing bandwidth allowances between virtual servers. So if you run one server that uses little bandwidth you can run a second server that needs a lot of bandwidth and reduce the potential for excess bandwidth use problems. The next logical extension to this is to allow sharing disk allocation between servers on the same physical system. So for example I might want to run a web server for the purpose of sending out large files, 360M of RAM as provided by the Linode 360 offering would be plenty. But the 128G of storage and 1600GB per month of bandwidth usage that is provided with the Linode 2880 plan would be really useful. At the same time a system that does computationally expensive tasks (such as a build and test server) might require a large amount of RAM such as 2880MB while requiring little disk space or bandwidth. Currently Linode allows sharing bandwidth arbitrarily between the various servers but not disk space or RAM. I don’t think that this would be a really difficult feature to implement.

Finally Linode has a “Pending Jobs Queue” that shows the last few requests to the management system and their status. It’s not really necessary, but it is handy to see what has been done and it gives the sysadmin a feeling of control over the process.

These management features provide enough value to me that if I was going to use a single virtual hosting provider then I would choose Linode. For certain reliability requirements it simply wouldn’t be a responsible decision to trust any single hosting company. In that case I’m happy to recommend both Linode and Slicehost as providers.

New Servers – a non-virtual Cloud

NewServers.com [1] provides an interesting service. They have a cloud computing system that is roughly comparable to Amazon EC2, but for which all servers are physical machines (blade servers with real disks). This means that you get the option of changing between servers and starting more servers at will, but they are all physical systems so you know that your system is not going to go slow because someone else is running a batch job.

New Servers also has a bandwidth limit of 3GB per hour with $0.10 per GB if you transfer more than that. Most people should find that 3GB/hour is enough for a single server. This compares to EC2 where you pay $0.10 per GB to receive data and $0.17 to transmit it. If you actually need to transmit 2100GB per month then the data transfer fees from EC2 would be greater than the costs of renting a server from New Servers.
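
To put rough numbers on that claim: 2100GB of outbound data at EC2’s $0.17 per GB is about $357 per month, while a New Servers “Small” system (see the table below) at $0.11 per hour comes to roughly $80 for a 30 day month with the first 3GB per hour of transfer included.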

When running Linux the EC2 hourly charges are as follows (where 1 ECU provides the equivalent CPU capacity of a 1.0-1.2GHz 2007 Opteron or Xeon processor):

Name                  Cost   Description
Small                 $0.10  1.7G RAM, 160G disk, 32bit, 1 ECU, 1 core
Large                 $0.20  7.5G RAM, 850G disk, 64bit, 4 ECU, 2 cores
Extra Large           $0.40  15G RAM, 1690G disk, 64bit, 8 ECU, 4 cores
High CPU Medium       $0.20  1.7G RAM, 350G disk, 32bit, 5 ECU, 2 cores
High CPU Extra Large  $0.80  7G RAM, 1690G disk, 64bit, 20 ECU, 5 cores

The New Servers charges are:

Name    Cost   Description
Small   $0.11  1G RAM, 36G disk, 32bit, Xeon 2.8GHz
Medium  $0.17  2G RAM, 2*73G disks, 32bit, 2*Xeon 3.2GHz
Large   $0.25  4G RAM, 250G disk, 64bit, E5405 Quad Core 2GHz
Jumbo   $0.38  8G RAM, 2*500G disks, 64bit, 2*E5405 Quad Core 2GHz
Fast    $0.53  4G RAM, 2*300G disks, 64bit, E5450 Quad Core 3GHz

The New Servers prices seem quite competitive with the Amazon prices. One down-side to New Servers is that you have to manage your own RAID: the cheaper servers have only a single disk (bad luck if it fails), while the better ones have two disks so you can set up your own RAID. Of course the upside of this is that if you want a fast server from New Servers and you don’t need redundancy then you have the option of RAID-0 for better performance.
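
Setting up the RAID yourself is not much work. A minimal sketch for one of the two-disk plans (partition names hypothetical) would be the following, with --level=0 instead if you prefer the RAID-0 performance option mentioned above:

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1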

Also I don’t think that there is anything stopping you from running Xen on a New Servers system. So you could have a bunch of Xen images and a varying pool of Dom0s to run them on. If you were to choose the “Jumbo” option with 8G of RAM and share it among some friends with everyone getting a 512M or 1G DomU then the cost per user would be a little better than Slicehost or Linode while giving better management options. One problem I sometimes have with virtual servers for my clients is that the disk IO performance is poorer than I expect. When running the server that hosts my blog (which is shared with some friends) I know the performance requirements of all DomUs and can diagnose problems quickly. I can deal with a limit on the hardware capacity, I can deal with trading off my needs with the needs of my friends. But having a server just go slow, not knowing why, and having the hosting company say “I can move you to a different physical server” (which may be better or worse) doesn’t make me happy.

I first heard about New Servers from Tom Fifield’s LUV talk about using EC2 as a cluster for high energy physics [2]. According to the detailed analysis Tom presented using EC2 systems on demand can compete well with the costs of buying Dell servers and managing them yourself, EC2 wins if you have to pay Japanese prices for electricity but if you get cheap electricity then Dell may win. Of course a major factor is the amount of time that the servers are used, a cluster that is used for short periods of time with long breaks in between will have a higher cost per used CPU hour and thus make EC2 a better option.

Dom0 Memory Allocation and X

I’ve previously written about memory squeeze problems in a Xen Dom0 when large amounts of memory were assigned to DomUs [1]. In summary the Dom0 would have problems if started with default options and the majority of the RAM was later assigned to DomUs, but if the memory of the Dom0 was limited by the dom0_mem parameter to the Xen kernel then things would work well.

Fatal server error:
xf86MapVidMem: Could not mmap framebuffer (0xfae80000,0x80000) (Invalid argument)

I have since found another exciting bug with Xen. I was in the process of upgrading an AMD64 workstation to use Xen so that I could test other versions of some software in the background. The first stage was to install the Xen kernel and the Xen enabled Linux kernel and boot the machine. Unfortunately I then received the above message when trying to start the X server. I discovered the solution to this in the comments section of the Different Colours blog post about Virtualisation on Lenny [2]. It seems that there is a problem gaining mmap access to MMIO memory regions in Xen and that restricting the memory of the Dom0 is a work-around.

My AMD64 workstation has 3G of RAM because the motherboard can’t support more than 3.25G and buying 4G of RAM to have 3.25G usable would be an expensive way of doing it. So I used dom0_mem=1500M and now X works again. I have yet to discover if anything strange and exciting happens when I create DomUs on the machine, as I don’t have any immediate plans for running DomUs on it. Its main uses at the moment are torcs (a slightly realistic 3D car racing game), supertuxkart (a cartoon 3D car racing game), and mplayer, so it doesn’t really need a lot of RAM.

I like to keep my options open and have all my machines capable of virtualisation apart from routers.

Red Hat, Microsoft, and Virtualisation Support

Red Hat has just announced a deal with MS for support of RHEL virtual machines on Windows Server and Windows virtual machines on RHEL [1]. It seems that this deal won’t deliver anything before “calendar H2 2009” so nothing will immediately happen – but the amount of testing to get these things working correctly is significant.

Red Hat has stated that “the agreements contain no patent or open source licensing components” and “the agreements contain no financial clauses, other than industry-standard certification/validation testing fees” so it seems that there is nothing controversial in this. Of course that hasn’t stopped some people from getting worked up about it.

I think that this deal is a good thing. I have some clients who run CentOS and RHEL servers (that I installed and manage) as well as some Windows servers. Some of these clients have made decisions about the Windows servers that concern me (such as not using ECC RAM, RAID, or backups). It seems to me that if I was to use slightly more powerful hardware for the Linux servers I could run Windows virtual machines for those clients and manage all the backups at the block device level (without bothering the Windows sysadmins). This also has the potential to save the client some costs in terms of purchasing hardware and managing it.

When this deal with MS produces some results (maybe in 6 months time) I will recommend that some of my clients convert CentOS machines to RHEL to take advantage of it. If my clients take my advice in this regard then it will result in a small increase in revenue and market share for RHEL. So Red Hat’s action in this regard seems to be a good business decision for them. If my clients take my advice and allow me to use virtualisation to better protect their critical data that is on Windows servers then it will be a significant benefit for the users.

Xen and Lenny

Debian GNU/Linux 5.0 AKA “Lenny” has just been released [1].

One of the features that is particularly noteworthy is that Xen has been updated and now works fully and correctly on the 2.6.26 kernel (see the Debian Wiki page about Xen for details [2]). This may not sound exciting, but I know that a lot of people put a lot of work into getting this going, and for a long time in Unstable it wasn’t working well. I’ve just upgraded three Xen servers from Etch to Lenny (actually one was Etch kernel with Lenny user-space), and they all worked!

Those three servers were all running the i386 architecture, the next thing to do is to try it out with the AMD64 architecture. One of my plans is to try the latest Debian kernel on the server I use in Germany, but I’ll try on a few other AMD64 machines first.