Linux, politics, and other interesting things
In a comment on my post about Slicehost, Linode, and scaling up servers  it was suggested that there is no real difference between a physical server and a set of slices of a virtual server that takes up all the resources of the machine.
The commentator notes that it’s easier to manage a virtual machine. When you have a physical machine running at an ISP server room there are many things that need to be monitored, this includes the temperature at various points inside the case and the operation of various parts (fans and hard disks being two obvious ones). When you run the physical server you have to keep such software running (you maintain the base OS). If the ISP owns the server (which is what you need if the server is in another country) then the ISP staff are the main people to review the output. Having to maintain software that provides data for other people is a standard part of a sys-admin’s job, but when that data determines whether the server will die it is easier if one person manages it all. If you have a Xen DomU that uses all the resources of the machine (well all but the small portion used by the Dom0 and the hypervisor) then a failing hard disk could simply be replaced by the ISP staff who would notify you of the expected duration of the RAID rebuild (which would degrade performance). For more serious failures the data could be migrated to another machine, in the case of predicted failures (such as unexpected temperature increases or the failure of a cooling fan) it is possible to migrate a running Xen DomU to another server. If the server migration is handled well then this can be a significant benefit of virtualisation for an ISP customer. Also Xen apparently supports having RAM for a DomU balloon out to a larger size than was used on boot, I haven’t tested this feature and don’t know how well it works. If it supports ballooning to something larger than the physical size in the original server then it would be possible to migrate a running instance to a machine with more RAM to upgrade it.
The question is whether it’s worth the cost. Applications which need exactly the resources of one physical server seem pretty rare to me. Applications which need resources that are considerably smaller than a single modern server are very common, and applications which have to be distributed among multiple servers are not that common (although many of us hope that our projects will become so successful ;). So the question of whether it’s worth the cost is often really whether the overhead of virtualisation will make a single large machine image take more resources than a single server can provide (moving from a single server to multiple servers costs a lot of developer time, and moving to a larger single server exponentially increases the price). There is also an issue of latency, all IO operations can be expected to take slightly longer so even if the CPU is at 10% load and there is a lot of free RAM some client operations will still take longer, but I hope that it wouldn’t be enough to compete with the latency of the Internet – even a hard drive seek is faster than the round trip times I expect for IP packets from most customer machines.
VMware has published an interesting benchmark of VMware vs Xen vs native hardware . It appears to have been written in February 2007 and while it’s intent is to show VMware as being better than Xen, in most cases it seems to show them both as being good enough. The tests involved virtualising 32bit Windows systems, this doesn’t seem an unreasonable test as many ISPs are offering 32bit virtual machines as 32bit code tends to use less RAM. One unfortunate thing is that they make no explanation of why “Intger Math” might run at just over 80% native performance on VMware and just under 60% native performance on Xen. The other test results seem to show that for a virtualised Windows OS either VMware or Xen will deliver enough performance (apart from the ones where VMware claims that Xen provides only a tiny fraction of native performance – that’s a misconfiguration that is best ignored). Here is an analysis of the VMware benchmark and the XenSource response (which has disappeared from the net) .
As it seems that in every case we can expect more than 90% native performance from a single DomU and as the case of needing more than 90% native performance is rare it seems that there is no real difference that we should care about when running servers and that the ease of management outweighs the small performance benefit from using native hardware.
Now it appears that Slicehost  caters to people who desire this type of management. Their virtual server plans have RAM going in all powers of two from 256M to 8G, and then they have 15.5G – which seems to imply that they are using physical servers with 16G of RAM and that 15.5G is all that is left after the Xen hypervisor and the Dom0 have taken some. One possible disadvantage of this is that if you want all the CPU power of a server but not so much RAM (or the other way around) then the Slicehost 15.5G plan might involve more hardware being assigned to you than you really need. But given the economies of scale involved in purchasing and managing the large number of servers that Slicehost is running it might cost them more to run a machine with 8G of RAM as a special order than to buy their standard 16G machine.
Other virtual hosting companies such as Gandi and Linode clearly describe that they don’t support a single instance taking all the resources of the machine (1/4 and 1/5 of a machine respectively are the maximums). I wonder if they are limiting the size of virtual machines to avoid the possibility of needing to shuffle virtual machines when migrating a running virtual machine.
One significant benefit of having a physical machine over renting a collection of DomUs is the ability to run virtual machines as you desire. I prefer to have a set of DomUs on the same physical server so that if one DomU is running slowly then I have the option to optimise other DomUs to free up some capacity. I can change the amounts of RAM and the number of virtual CPUs allocated to each DomU as needed. I am not aware of anyone giving me the option to rent all the capacity of a single server in the form of managed DomUs and then assign the amounts of RAM, disk, and CPU capacity to them as I wish. If Slicehost offered such a deal then one of my clients would probably rent a Slicehost server for this purpose as soon as their current contract runs out.
It seems that there is a lot of potential to provide significant new features for virtual hosting. I expect that someone will start offering these things in the near future. I will advise my clients to try and avoid signing any long-term contracts (where long means one year in the context of hosting) so that they keep their options open for future offers.