CPU Capacity for Virtualisation

Today a client asked me to advise him on how to dramatically reduce the number of servers for his business. He needs to go from 18 active servers to 4. Some of the machines in the network are redundant servers; by reducing some of that redundancy I can remove four servers, so the problem becomes consolidating 14 machines onto 4.

To determine the hardware requirements I analyzed the sar output from all the machines. The last 10 days of data were available, so I took the highest daily average figures from each machine for user and system CPU load and added them up; the result was 221%. So for average daily CPU use, three servers would have enough power to run the entire network. Then I looked at the highest 5 minute averages for user and system CPU load from each machine, which added up to 582%. So if all machines were to hit their peak usage simultaneously (which doesn’t happen) then the CPU power of six machines would be needed. I conclude that the CPU power requirements are somewhere between three and six machines, so four machines may do an OK job.
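As a sanity check on the arithmetic, the estimate can be sketched as follows. The per-machine numbers here are made up (the real figures came from sar); only the 221% and 582% totals match the ones above.

```python
import math

# Hypothetical per-machine figures standing in for the real sar output
# (14 machines; the totals are chosen to match the 221% and 582% above).
daily_avg_cpu = [10, 25, 5, 40, 15, 30, 8, 20, 18, 12, 9, 14, 7, 8]     # highest daily average %user+%sys
peak_5min_cpu = [30, 70, 20, 90, 45, 80, 25, 55, 50, 35, 28, 40, 7, 7]  # highest 5 minute average

# Lower bound: loads perfectly spread over time; upper bound: all peaks coincide.
lower = math.ceil(sum(daily_avg_cpu) / 100)
upper = math.ceil(sum(peak_5min_cpu) / 100)
print(lower, upper)  # 3 6
```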

The next issue is IO capacity. The current network has 2G of RAM in each machine and I plan to run it all on 4G Xen servers, a total of 16G of RAM instead of 36G. While some machines currently have unused memory, I expect that the end result of this decrease in total RAM will be more cache misses and more swapping, so total IO capacity use will increase slightly. Four of the servers (which will eventually become the Xen Dom0’s) have significant IO capacity (large RAIDs – they appear to have 10*72G disks in a RAID-5) and the rest have a smaller IO capacity (they appear to have 4*72G disks in a RAID-10). For the other 14 machines, the highest daily averages for iowait add up to 9% and the highest 5 minute averages add up to 105%. I hope that spreading that 105% of the IO capacity of a 4 disk RAID-10 across four sets of 10 disk RAID-5’s won’t give overly bad performance.

I am concerned that there may be some flaw in the methodology that I am using to estimate capacity. The main issue is that I’m very doubtful about the utility of measuring iowait: iowait is the amount of IDLE CPU time while processes are blocked on IO. So if for example you have 100% CPU time being used then iowait will be zero regardless of how much disk IO is in progress! One check that I performed was to add the maximum CPU time used, the maximum iowait, and the minimum IDLE time. Most machines gave totals that were very close to 100%. If those three figures add up to about 100% and the minimum idle time was not very low, then it seems unlikely that there was any significant overlap between disk IO and CPU use hiding iowait. One machine had a total of 147% for those fields in the 5 minute averages, which suggests that its IO load may be higher than the 66% iowait figure indicates. But if I put that in a DomU on the machine with the most unused IO capacity then it should be OK.
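The check can be expressed as a one-liner; a sketch, with the figures for the 147% machine taken from above and the well-behaved example invented:

```python
def iowait_may_be_hidden(max_cpu, max_iowait, min_idle, tolerance=5.0):
    """True if CPU use may be masking disk IO: when the three sar figures
    sum to well over 100%, CPU-busy periods overlapped with IO, so the
    iowait column alone understates the real IO load."""
    return (max_cpu + max_iowait + min_idle) > 100 + tolerance

# A well-behaved machine: the figures sum to roughly 100%.
print(iowait_may_be_hidden(60, 35, 5))    # False
# The problem machine: 147% total despite only 66% iowait.
print(iowait_may_be_hidden(75, 66, 6))    # True
```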

I will be interested to read any suggestions for how to proceed with this. But unfortunately it will probably be impossible to consider any suggestion which involves extra hardware or abandoning the plan due to excessive risk…

I will write about the results.

10 comments to CPU Capacity for Virtualisation

  • Thomas

I would also calculate/consider the following:
– loss of CPU performance with virtualization (I would say between 1 and 10%)
– the network traffic rate between the hosts, as you won’t get the same throughput with virtualized NICs

And finally:
– if you run many virtual hosts on one server and the physical host goes down, all the virtual hosts are gone too. Just something to remember. :)

  • Olaf van der Spek

    > and I plan to run it all on 4G Xen servers,

    Why only 4 gbyte? RAM is cheap and 2 gbyte modules are cost-effective, so 8 gbyte in a single-socket system with just four RAM slots shouldn’t be a problem.

    > One issue is that I’m very doubtful about the utility of measuring iowait,

    Why don’t you look at disk/storage utilization directly?

  • Gabor Gombas

    How do you define “I/O capacity”? Beware that there are I/O patterns that a 4-disk RAID10 may handle just fine but that may bring a 10-disk RAID5 to its knees.

For example, if the I/O pattern involves a lot of small (<< stripe size) random writes, then for every write the RAID5 must read existing chunks back – old data plus old parity, or in the worst case the chunks from all the other disks – to be able to re-calculate the parity. And that hurts a lot.
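A rough way to quantify that penalty (a sketch; which strategy the array uses for a given write depends on the RAID implementation):

```python
def raid5_small_write_ios(n_disks, reconstruct=False):
    """Disk IOs needed for one sub-stripe random write on RAID-5.
    Read-modify-write: read old data chunk + old parity, write both back.
    Reconstruct-write: read the other n-2 data chunks, then write data + parity."""
    return ((n_disks - 2) + 2) if reconstruct else (2 + 2)

def raid10_small_write_ios():
    """RAID-10 just writes the chunk to both mirrors."""
    return 2

print(raid5_small_write_ios(10))                    # 4 IOs per small write
print(raid5_small_write_ios(10, reconstruct=True))  # 10 IOs per small write
print(raid10_small_write_ios())                     # 2 IOs per small write
```

Either way a small random write costs the 10-disk RAID-5 at least twice as many disk operations as the 4-disk RAID-10.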

  • Rob Wilderspin

    Hi Russell,

    As Olaf hinted at, to get a better feel for the IO requirements you need to look at the output of ‘sar -b’ and ‘sar -d’, rather than the CPU iowait percentage.


  • Requirements first. Why does your client need to throw away 14 machines? I’d guess he’s trying to minimize one-time costs (new hardware) and recurring costs (rack space, machine+cooling power, licenses and support, and manpower).

Even 582% CPU usage wouldn’t require six machines. If you buy newer (faster and multi-core) CPUs, one machine might suffice, and two might give 100% redundancy. But first check VM overhead as Thomas suggested, and use oprofile to see how much “CPU time” is spent computing vs. waiting on main memory.

    Find out if your client’s disk IO bottleneck is seeks or throughput as Olaf and Rob suggested. Assuming seeks,

    * choose your array configuration wisely to reduce seeks required from each disk. Consider more, smaller arrays; larger stripe sizes; and ditching RAID-5. If that takes away too much storage capacity, consider replacing the disks with newer (larger and faster) ones.

    * eliminate disk IO with memory. Don’t ever swap, and ensure you have adequate cache either in main memory or on a battery-backed write cache on the array controllers.

    * eliminate seeks in software. Consider features like PostgreSQL’s asynchronous commit which batches writes at the expense of losing recent changes on power failure (and check your UPS to ensure power failures are unlikely).

  • AlphaG

I am not always sure “hardware reduction” does any more than just that. At present yes they have 18 physical machines; once you have completed the hardware reduction they will have actually increased the number of servers they are required to manage (ESX-based thinking: the new hosts plus the 18). I believe hardware consolidation is only part of the overall update you need to contemplate. What about database consolidation from a few to one, web servers from many to one, etc? I think you also need to consider a virtual platform, operating system and application monitoring tool that will provide significant proactive monitoring to plan for upgrades, performance history and failover capabilities between the virtual hosts.

    Not telling you how to suck eggs but I have seen so many of these consolidations do one specific thing but actually complicate life for the admin just a little more.

  • etbe

    Thomas: Yes there will be a loss. But currently the machines in question are each running management software that uses more than 30 minutes of CPU time per day. I suspect that the decrease in CPU time used by not running that software will make up for the overhead of Xen.

    Olaf: RAM is cheap for new desktop machines. For old server machines it’s not so cheap.

    Rob: Thanks for that. It’s a pity that sar doesn’t store the data for partitions. “sar -b” is very interesting, but unfortunately “sar -d” attributes all IO to dev104-0 on the systems in question.

    Scott: The machines need to go to save costs from the ISP that hosts them. Newer machines would make things easier, but it would require moving some significant amounts of data which are on the old machines – and also cost more money.

    Why would smaller arrays reduce the number of seeks?

    Larger and faster disks would of course require more money, as would significant new software developments to try and optimise the software.

    AlphaG: I agree. Part of the plan is to reduce redundant servers. So instead of 18 machines there will be 14 virtual machines on four physical hosts, and after that maybe more reductions in the number of virtual machines.

  • > Scott: The machines need to go to save costs from the ISP that hosts them. Newer machines would make things easier, but it would require moving some significant amounts of data which are on the old machines – and also cost more money.

    Sure. I don’t like to assume anything, though, and that includes the client’s stated requirements being the best way to accomplish his real requirements. So I wouldn’t rule out new hardware without running the numbers. (Can running two machines instead of four give you a bit of cash for new hardware?)

    > Why would smaller arrays reduce the number of seeks?

Smaller arrays bound the number of disks which have to seek to complete a single IO operation. If you have an IO operation larger than stripe*disks, every disk in the array will have to seek to complete that one operation. If your stripe size is 64 KiB (the default for Linux’s software RAID), almost every read or write will involve more than one disk, and it’s easy to imagine many of them involving all 10. At the other extreme (no RAID), every IO operation requires only one disk to seek. But of course, unless you have a non-RAID way of balancing the IO load and keeping redundant copies of data (e.g., GFS), that’s going too far.
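The stripe-size effect is simple enough to sketch (worst case, with IO aligned to stripe boundaries assumed):

```python
import math

def disks_touched(io_size_kib, stripe_kib, n_disks):
    """Worst-case number of member disks a single aligned IO must touch."""
    return min(math.ceil(io_size_kib / stripe_kib), n_disks)

print(disks_touched(256, 64, 10))    # 4 disks with the 64 KiB default stripe
print(disks_touched(256, 1024, 10))  # 1 disk with 1 MiB stripes
print(disks_touched(4096, 64, 10))   # 10 disks: the whole array seeks for one IO
```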

    If I were to guess performance specs for your drive, I’d say 40 MB/s sustained with 8 ms average latency. Putting them into a ten-drive RAID-5 with way-too-small stripes gives you 360 MB/s, but that basically only happens when you’re in single-user mode copying data to the disk for the first time. More realistically, you can’t quite do 125 tiny operations per second. That’s no better than a single drive, so arguably you’re wasting 90% of your IO capacity.
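Checking that arithmetic with the guessed drive specs (both figures are the guesses above, not measurements):

```python
sustained_mb_s = 40.0   # guessed per-drive sustained throughput
avg_latency_ms = 8.0    # guessed average access latency
n_disks = 10

# Streaming: RAID-5 gives up one disk's worth of bandwidth to parity.
streaming_mb_s = sustained_mb_s * (n_disks - 1)

# Tiny random ops: with too-small stripes every operation ties up the whole
# array, so the array does no better than a single drive's ops/second.
random_ops_s = 1000.0 / avg_latency_ms

print(streaming_mb_s, random_ops_s)  # 360.0 125.0
```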

    Really what you’d want is a histogram of the number of 64/128/256/1024/2048/4096-KiB stripes involved for each IO operation. If you could say with confidence “with my usage and 1 MiB stripes, 95% of operations would involve 1 disk; only .01% would involve 10 disks”, it’d be trivial to pick an optimal array configuration. Unfortunately I don’t know how to get that information. Unless maybe SystemTap could tell you…hmm…I might give that a shot, as I’m suspicious I screwed up the array on one of my own machines.
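Given a trace of IO sizes (from SystemTap or similar), building that histogram is straightforward; here is a sketch with an invented trace:

```python
import math
from collections import Counter

def stripe_histogram(io_sizes_kib, stripe_kib, n_disks):
    """Count how many member disks each IO would involve at a given stripe size."""
    return Counter(min(math.ceil(s / stripe_kib), n_disks) for s in io_sizes_kib)

# Invented IO sizes in KiB; real data would come from instrumenting the block layer.
trace = [4, 8, 16, 32, 64, 128, 1024, 4096]
print(stripe_histogram(trace, 1024, 10))  # Counter({1: 7, 4: 1})
```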

  • > Unless maybe SystemTap could tell you…hmm…I might give that a shot, as I’m suspicious I screwed up the array on one of my own machines.

    Hey, that actually works! Here’s a first pass at such a script:

I put a more specialized version of that script up on the SystemTap wiki. It seems to do a decent job.