I’ve just bought a new Thinkpad that has hardware virtualisation support and I’ve got KVM running.
HugePages
The Linux-KVM site has some information on using hugetlbfs to allow the use of 2MB pages for KVM [1]. I put “vm.nr_hugepages = 1024” in /etc/sysctl.conf to reserve 2G of RAM for KVM use. The web page notes that it may be impossible to allocate enough pages if you set this some time after boot (the kernel can allocate memory that can’t be paged out and RAM can become too fragmented to allow the allocation). As a test I reduced my allocation to 296 pages and then increased it again to 1024; I was surprised to find that my system ran extremely slowly while reserving the pages – it seems that allocating such pages is efficient when done at boot time but not when done later.
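For reference, this is roughly the procedure (sysctl -p applies /etc/sysctl.conf without a reboot, though as noted a late allocation may be slow or fail):

# in /etc/sysctl.conf: reserve 1024 * 2MB pages (2G) for hugetlbfs
vm.nr_hugepages = 1024
# apply the setting without rebooting
sysctl -p
# check how many pages were actually reserved
grep Huge /proc/meminfo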
hugetlbfs /hugepages hugetlbfs mode=1770,gid=121 0 0
I put the above line in /etc/fstab to mount the hugetlbfs filesystem. The mode of 1770 allows anyone in the group to create files but not unlink or rename each other’s files. The gid of 121 is for the kvm group.
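The mount point has to exist before the fstab entry can be used:

mkdir -p /hugepages
mount /hugepages
ls -ld /hugepages   # should show mode 1770 with group kvm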
I’m not sure exactly how hugepages are used, but it isn’t in the most obvious way. I expected that allocating 1024 huge pages would allow allocating 2G of RAM to the virtual machine, but that’s not the case as “-m 2048” caused kvm to fail. I also expected that the number of free HugePages according to /proc/meminfo would reliably drop by an amount that approximately matches the size of the virtual machine, which doesn’t seem to be the case.
I have no idea why KVM with Hugepages would be significantly slower for user and system CPU time but still slightly faster for the overall build time (see the performance section below). I’ve been unable to find any documents explaining in which situations huge pages help or hurt, or how they interact with KVM virtualisation – the virtual machine allocates memory in 4K pages, so how does that work with the 2M pages provided to it by the host?
But hugepages do provide a slight performance benefit, and if you have plenty of RAM (I have 5G and can afford to buy more if I need it) you might as well enable them from the start.
Permissions
open /dev/kvm: Permission denied
Could not initialize KVM, will disable KVM support
One thing that annoyed me about KVM is that the Debian/Lenny version will silently fall back to QEMU emulation if it can’t use KVM. I discovered this when a routine rebuild of the SE Linux Policy packages in a Debian/Unstable virtual machine took an unreasonable amount of time. When I halted the virtual machine I noticed that it had displayed the above message on stderr before switching into curses mode (I’m not sure of the correct term for this), so the message was obscured until the xterm returned to non-curses mode at program exit. I had to add the user in question to the kvm group. I’ve filed Debian bug report #574063 about this [2].
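Adding the user to the group is easy enough on Debian:

adduser USERNAME kvm
# the user has to log in again (or run "newgrp kvm") before the group change takes effect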
Performance
Below is a table showing the time taken for building the SE Linux reference policy on Debian/Unstable. It compares running QEMU emulation (using the kvm command but without permission to access /dev/kvm), KVM with and without hugepages, Xen, and a chroot. Xen is run on an Opteron 1212 Dell server system with 2*1TB SATA disks in a RAID-1 while the KVM/QEMU tests are on an Intel T7500 CPU in a Thinkpad T61 with a 100G SATA disk [4]. All virtual machines had 512M of RAM and 2 CPU cores. The Opteron 1212 system is running Debian/Lenny and the Thinkpad is running Debian/Lenny with a 2.6.32 kernel from Testing.
| Test | Elapsed | User | System |
|---|---|---|---|
| QEMU on Opteron 1212 with Xen installed | 126m54 | 39m36 | 8m1 |
| QEMU on T7500 | 95m42 | 42m57 | 8m29 |
| KVM on Opteron 1212 | 7m54 | 4m47 | 2m26 |
| Xen on Opteron 1212 | 6m54 | 3m5 | 1m5 |
| KVM on T7500 | 6m3 | 2m3 | 1m9 |
| KVM Hugepages on T7500 with NCurses console | 5m58 | 3m32 | 2m16 |
| KVM Hugepages on T7500 | 5m50 | 3m31 | 1m54 |
| KVM Hugepages on T7500 with 1800M of RAM | 5m39 | 3m30 | 1m48 |
| KVM Hugepages on T7500 with 1800M and file output | 5m7 | 3m28 | 1m38 |
| Chroot on T7500 | 3m43 | 3m11 | 0m29 |
I was surprised to see how inefficient it is when compared with a chroot on the same hardware. It seems that the system time is the issue. Most of the tests were done with 512M of RAM for the virtual machine; using 1800M improved performance slightly (less IO means fewer context switches to access the real block device) and redirecting the output of dpkg-buildpackage to /tmp/out and /tmp/err reduced the build time by 32 seconds – it seems that the context switches for networking or console output really hurt performance. But for the default build it seems that it will take about 50% longer in a virtual machine than in a chroot. This is bearable for the things I do (of which building the SE Linux policy is the most time consuming), but if I was to start compiling KDE then I would be compelled to use a chroot.
I was also surprised to see how slow it was when compared to Xen. For the tests on the Opteron 1212 system I used a later version of KVM (qemu-kvm 0.11.0+dfsg-1~bpo50+1 from Debian/Unstable) but could only use 2.6.26 as the virtualised kernel (the Debian 2.6.32 kernels gave a kernel Oops on boot). I doubt that the older kernel version is responsible for any significant portion of the extra minute of build time.
Storage
One way of managing storage for a virtual machine is to use files on a large filesystem for its block devices; this can work OK if you use a filesystem that is well designed for large files (such as XFS). I prefer to use LVM. One thing I have not yet discovered is how to make udev assign the kvm group to all devices that match /dev/V0/kvm-*.
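Something like the following rule (in a hypothetical file such as /etc/udev/rules.d/92-kvm-lvm.rules) might do it, assuming that the device-mapper udev rules export the DM_VG_NAME and DM_LV_NAME variables on your system; I haven’t verified this:

# untested sketch: give group kvm access to LVM volumes named kvm-* in VG V0
ACTION=="add|change", SUBSYSTEM=="block", ENV{DM_VG_NAME}=="V0", ENV{DM_LV_NAME}=="kvm-*", GROUP="kvm", MODE="0660"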
Startup
KVM seems to be basically designed to run from a session, unlike Xen which can be started with “xm create” and then run in the background until you feel like running “xm console” to gain access to the console. One way of dealing with this is to use screen. The command “screen -S kvm-foo -d -m kvm WHATEVER” will start a screen session named kvm-foo that will be detached and will start by running kvm with “WHATEVER” as the command-line options. When screen is used for managing virtual machines you can use the command “screen -ls” to list the running sessions and then commands such as “screen -r kvm-unstable” to reattach to screen sessions. To detach from a running screen session you type ^A^D.
The problem with this is that screen will exit when the process ends and that loses the shutdown messages from the virtual machine. To solve this you can put “exec bash” or “sleep 200” at the end of the script that runs kvm.
start-stop-daemon -S -c USERNAME --exec /usr/bin/screen -- -S kvm-unstable -d -m /usr/local/sbin/kvm-unstable
On a Debian system the above command in a system boot script (maybe /etc/rc.local) could be used to start a KVM virtual machine on boot. In this example USERNAME would be replaced by the name of the account used to run kvm, and /usr/local/sbin/kvm-unstable is a shell script to run kvm with the correct parameters. Then as user USERNAME you can attach to the session later with the command “screen -x kvm-unstable”. Thanks to Jason White for the tip on using screen.
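A minimal /usr/local/sbin/kvm-unstable along those lines might look like the following sketch (the kvm options here are just an example, my full command line is shown below):

#!/bin/sh
# hypothetical wrapper script for the start-stop-daemon/screen method above
kvm -hda /dev/V0/unstable -m 512 -smp 2 -curses -redir tcp:2022::22
# keep the screen session alive so the shutdown messages can still be read
exec bash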
I’ve filed Debian bug report #574069 [3] requesting that kvm change its argv[0] so that top(1) and similar programs can be used to distinguish different virtual machines. Currently when you have a few entries named kvm in top’s output it is annoying to try to match the CPU-hogging process to the virtual machine it’s running.
It is possible to use KVM with X or VNC for a graphical display by the virtual machine. I don’t like these options as I believe that Xephyr provides better isolation; I’ve previously documented how to use Xephyr [5].
kvm -kernel /boot/vmlinuz-2.6.32-2-amd64 -initrd /boot/initrd.img-2.6.32-2-amd64 -hda /dev/V0/unstable -hdb /dev/V0/unstable-swap -m 512 -mem-path /hugepages -append "selinux=1 audit=1 root=/dev/hda ro rootfstype=ext4" -smp 2 -curses -redir tcp:2022::22
The above is the current kvm command-line that I’m using for my Debian/Unstable test environment.
Networking
I’m using KVM options such as “-redir tcp:2022::22” to redirect unprivileged ports (in this case 2022) to the ssh port. This works for a basic test virtual machine but is not suitable for production use. I want to run virtual machines with minimal access to the environment, which means not starting them as root.
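For example, with that redirection an ssh connection to the virtual machine goes via port 2022 on the host:

ssh -p 2022 localhost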
One thing I haven’t yet investigated is the vde2 networking system which allows a private virtual network over multiple physical hosts and which should allow kvm to be run without root privs. It seems that all the other networking options for kvm which have appealing feature sets require that the kvm process be started with root privs.
Is KVM worth using?
It seems that KVM is significantly slower than a chroot, so for a basic build environment a secure chroot would probably be a better option. I had hoped that KVM would be more reliable than Xen, which would offset the performance loss – however as KVM and Debian kernel 2.6.32 don’t work together on my Opteron system it seems that I will have reliability issues with KVM comparable to the Xen ones. There are currently no Xen kernels in Debian/Testing, so KVM is usable now with the latest bleeding edge stuff (on my Thinkpad at least) while Xen isn’t.
QEMU emulation is really slow and KVM requires hardware virtualisation support (which 32bit hardware generally lacks), so Xen is the only option for 32bit hardware. Therefore all my 32bit Xen servers need to keep running Xen.
I don’t plan to switch my 64bit production servers to KVM any time soon. When Debian/Squeeze is released I will consider whether to use KVM or Xen after upgrading my 64bit Debian server. I probably won’t upgrade my 64bit RHEL-5 server any time soon – maybe when RHEL-7 is released. My 64bit Debian test and development server will probably end up running KVM very soon; I need to upgrade the kernel for Ext4 support and that makes KVM more desirable.
So it seems that for me KVM is only going to be seriously used on my laptop for a while.
Generally I am disappointed with KVM. I had hoped that it would give almost the performance of Xen (admittedly it was only 14.5% slower). I had also hoped that it would be really reliable and work with the latest kernels (unlike Xen), but it is giving me problems with 2.6.32 on the Opteron. Also it has some new issues of its own, such as quietly falling back to QEMU emulation when it can’t do what I ask.
- [1] http://www.linux-kvm.com/content/get-performance-boost-backing-your-kvm-guest-hugetlbfs
- [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=574063
- [3] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=574069
- [4] http://etbe.coker.com.au/2010/03/16/thinkpad-t61/
- [5] http://etbe.coker.com.au/2007/01/07/xephyr/
- [6] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=574073
libvirt and virt-manager have helped me a lot with KVM virtual machine management, monitoring and remote access to the console.
http://womble.decadent.org.uk/blog/kvm-reliability may be relevant to your situation.
Also, KSM (Kernel Shared Memory) would be worth trying if you’re running multiple KVM guests.
Regarding networking, when I last tried it was possible to set up tun networking without running KVM itself with root privileges. The tun device can be created (with tunctl) and control granted to the particular user who will run KVM, which then uses it to set up a virtual ethernet link to the host. It is even possible to set up the tun device from /etc/network/interfaces together with all the necessary routing, filtering, bridging or whatever, to have it set up at boot time. I believe it’s documented in the uml-utilities package, where that tool lives. Guessing from package lists, vde2 might have a similar tool, but I never tried that.
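For reference, the basic tunctl usage (from the uml-utilities package) is something like the following untested sketch:

tunctl -u USERNAME -t tap0   # create a persistent tap device owned by USERNAME
ifconfig tap0 up             # bring it up on the host
# kvm can then be run as USERNAME with options like -net tap,ifname=tap0,script=no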
The version in Debian lenny is very old and crusty.
Since then KVM has made significant strides in performance and reliability. KVM is now integrated upstream into Qemu and all that happy stuff.
Confusingly Debian has changed the name of the userland from kvm to qemu-kvm.
It’s to the point where I would not consider using KVM on Debian Lenny unless I backport the newer kernel and qemu-kvm to it. Just not worth it.
Oh. And I doubt that KVM will ever really be a match for Linux on Xen. At least with this era’s hardware.
Xen’s paravirtualization approach is just that special. But I still prefer KVM simply because it’s just there and it just works. I don’t have to give up any usability or power management features to run KVM.
If you want very fast virtualization you need to take a look at LXC, a Linux containers implementation based on a ‘lite’ version of the OpenVZ patches. It’s integrated into newer kernels by default, like KVM is. Extremely fast and extremely flexible.
You can use it to isolate entire userlands or just use it to securely sandbox a single application, like your browser. It’s all about isolating namespaces and it’s extremely fast. Much faster than Xen could ever hope to be. No hypervisor, you see. Multiple containers all sharing the same kernel. Similar in concept to chroot, but massively more useful security- and isolation-wise than chroot could ever hope to be.
Linux capabilities will blow your mind also. You can use them to allow a user account to securely execute and start up LXC instances. No setuid root binaries or sudo nonsense. Very very nice stuff being cooked up.
Just a few notes based on my usage of kvm: I use kvm in ‘production’, running a web and a mail server inside kvm virtual machines. Both are rock solid, but the traffic on both is low, so that doesn’t mean too much.
These are just settings which work well for me. I don’t claim they are optimal and I didn’t run benchmarks. But perhaps there is some useful option you missed.
For starting kvm in background I use the following options:
kvm -vnc none -monitor unix:/home/jan/VM/vm1/monitor,server,nowait -daemonize
With -vnc none, an internal vnc server is started, but not connected to any port. As I usually don’t access the console, this is fine. If I need console access for some reason, I can connect to kvm using something like ‘nc -U ~/VM/vm1/monitor’ and enter ‘change vnc 127.0.0.1:x’ to bind the vnc server to port x.
To shut down the VM I can use
echo "system_powerdown" | nc -q 30 -U /home/jan/VM/vm1/monitor
For networking, I prefer a tap device. This seems to be the most flexible option, as I can add it to a software bridge, use routing, firewalling etc. just like on a physical interface. As I don’t want to run kvm with root privileges, I add a tap interface in /etc/network/interfaces:
iface tap_jan0 inet manual
    tunctl_user jan
    up ifconfig $IFACE up
    up ip route add 192.168.1.2/32 dev $IFACE
tunctl_user makes this device accessible to an unprivileged user, so kvm can be run as a normal user. I use the following options:
-net nic,model=virtio,macaddr=02:00:00:00:10:02 -net tap,ifname=tap_jan0,script=no
model=virtio should be faster than the default and is fine when running a moderately recent Linux as the guest OS.
The same is probably true for disk devices, but I didn’t try that yet: instead of using -hda, one can use -drive file=…,index=0,media=disk,if=virtio
The only stability problem I had with this system (host CPU: AMD Athlon(tm) 64 X2 5600+) was that kvm didn’t like frequency scaling on the host, which I disabled with
devices/system/cpu/cpu0/cpufreq/scaling_governor = performance
in /etc/sysfs.conf.
I should probably find out whether this problem has been fixed by now so I can regain the power saving advantages of frequency scaling; I installed the system more than a year ago and haven’t touched these settings since then.
I hope some of these settings are useful to you.
http://womble.decadent.org.uk/blog/kvm-reliability
Ben points out that the problem with recent kernels crashing under KVM on AMD is due to a regression in a security fix and should be fixed soon.
I thought this was an introduction to KVM :p
http://womble.decadent.org.uk/blog/debian-linux-packages-the-big-bang-release
Ben Hutchings has announced that Xen kernel images are now being prepared for Squeeze.
http://etbe.coker.com.au/2010/03/18/maintaining-screen-output/
The above post has some interesting discussion of maintaining screen output, I’ve got one solution and the comments suggest a better option.
Jason White wrote the following in a comment on my maintaining-screen-output post, unfortunately WordPress doesn’t let me move a comment to a different post so I’ll just paste it in:
My earlier attempt to post failed (probably my fault), and I have further comments to add anyway.
1. http://womble.decadent.org.uk/blog/kvm-reliability
2. Be sure to use the Virtio drivers for the network and block devices:
http://www.linux-kvm.org/page/Virtio
3. Kernel Shared Memory (KSM) may be useful if you plan to run multiple KVM guests.
You should add OpenVZ to your tests. If you are looking for something that is a middle ground between Xen and chroot, it is fast, mature, and fully supported in Squeeze.