Here’s a guide to supporting Xen servers for people who are not Linux experts. If your job gives you root access to a Xen server that someone else installed, so that you can fix problems when they are not available, then this guide will help you solve some common problems.
Xen is a virtualization system that is primarily used for running Linux virtual machines under a Linux host. It is mostly used as a paravirtualization system, in that the virtual machine knows that it is running in a virtual environment – this allows some performance benefits.
The host environment is known as Dom0 and root in that domain has the ability to control the other domains (which are known as DomU domains). If you perform an orderly shutdown of the Dom0 (via the shutdown or reboot commands or notification from the UPS of an impending power failure) then when the machine is booted again the DomU’s will be automatically restarted (if the on_reboot setting has the value restart – a common configuration). If you run the command shutdown in a DomU then the domain will be destroyed, and the command reboot will restart the DomU with the same settings – if you want to change the settings for a DomU you need to shut it down and create a new instance.
The main sys-admin command related to Xen is xm. Here are the main xm options that are useful in support:
xm list provides a list of running domains. For each domain it gives the name of the domain, the ID number, the memory allocated to it, the number of virtual CPUs allocated to it, the state, and the amount of CPU time used in execution. The ID numbers are allocated sequentially; if you reboot a DomU by running the command “reboot” inside it then it will get a new ID number when it restarts. Many xm operations that take the name of a domain will also take a domain ID number. Generally you can ignore the ID number – the only relevant thing about an ID is whether it is 0 (which identifies Dom0).
Here is a sample of the output of xm list:
# xm list
Name ID Mem(MiB) VCPUs State Time(s)
Domain-0 0 1236 4 r----- 14116.3
wind 13 2999 3 -b---- 60114.1
wind-f7 52 519 1 -b---- 3329.9
You can see from this output that the domain named wind has 2999 MiB of RAM, 3 virtual CPUs (out of 4 physical CPUs in the machine) and has used 60,114 seconds of CPU time (that is about 1,002 minutes of CPU use – the equivalent of almost 17 hours for a single CPU). Here are the values you might see in the state field (from the man page xm(1)):
- r – running
The domain is currently running on a CPU – note that Dom0 will always appear to be running because you are running the xm utility!
- b – blocked
The domain is blocked, and not running or runnable. This can be caused because the domain is waiting on IO (a traditional wait state) or has gone to sleep because there was nothing else for it to do.
- p – paused
The domain has been paused, usually occurring through the administrator running xm pause. When in a paused state the domain will still consume allocated resources like memory, but will not be eligible for scheduling by the Xen hypervisor.
- c – crashed
The domain has crashed, which is always a violent ending. Usually this state can only occur if the domain has been configured not to restart on crash. See xmdomain.cfg for more info.
- d – dying
The domain is in process of dying, but hasn’t completely shutdown or crashed.
If you see domains that are running which normally aren’t busy then make a note of this. If you see domains that are paused, crashed, or dying then contact the sys-admin.
Also know which domains are expected to be running so that if a domain is missing then you will recognise it as a problem!
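The checks described above can be roughly automated. The following is only a sketch: the function name check_domains is made up for illustration, and it assumes that the output of xm list has the same column layout as the sample above (name in the first column, state in the fifth).

```shell
# check_domains: read `xm list` output on stdin and report likely problems.
# Arguments: the names of the domains you expect to be running
# (this list is something you have to supply yourself).
check_domains() {
    awk -v expected="$*" '
        NR == 1 { next }                 # skip the header line
        {
            seen[$1] = 1                 # remember every domain that is listed
            state = $5                   # e.g. "r-----" or "-b----"
            if (state ~ /p/) print $1 ": paused"
            if (state ~ /c/) print $1 ": crashed"
            if (state ~ /d/) print $1 ": dying"
        }
        END {
            # report expected domains that did not appear in the list
            n = split(expected, names, " ")
            for (i = 1; i <= n; i++)
                if (!(names[i] in seen)) print names[i] ": missing"
        }
    '
}
```

On a Dom0 you would run something like xm list | check_domains wind wind-f7 and pass anything it prints on to the sys-admin. No output means that none of these particular problems were found.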
xm top is similar to the top command in Unix but displays Xen data. By default it displays the same information as xm list but also includes the amount of data read and written from network devices and disks. If your terminal is less than about 145 columns wide the lines will wrap and it will be confusing – stretch the width of your xterm before running it.
If you have multiple network interfaces then you can see the transfer counts for each of them separately by pressing the N key. If you have multiple network interfaces in DomU’s then this can help diagnose some network problems (although you may find that tcpdump is more useful).
If you have multiple disk devices in a DomU then you can see their transfer counts separately by pressing the B key. One problem that can be partially diagnosed through this is excessively poor performance. If a DomU is running extremely slowly then it may be impossible to log in to diagnose and/or fix the problem (it could take tens of minutes to log in). In that case, seeing where the disk access is going from outside the DomU can shed some light on the problem.
VBD 768 [ 3: 0]
VBD 832 [ 3:40]
VBD 5632 [16: 0]
VBD 5696 [16:40]
Above is the identification of the virtual devices /dev/hda through /dev/hdd in a DomU, as shown by xm top. The first number is the device number in decimal and the numbers inside the brackets are the major and minor device node numbers in hexadecimal, so 16:40 means the device 22,64 as a pair of decimal numbers (22*256+64=5696).
# ls -l /dev/hd?
brw-r----- 1 root disk 3, 0 Jul 23 17:24 /dev/hda
brw-r----- 1 root disk 3, 64 Jul 23 17:24 /dev/hdb
brw-r----- 1 root disk 22, 0 Jul 23 17:24 /dev/hdc
brw-r----- 1 root disk 22, 64 Jul 23 17:24 /dev/hdd
Above is the result of a ls -l on the devices in question from inside the DomU.
When I set up a Xen DomU I generally use /dev/hda for the root filesystem and /dev/hdb for swap. So if the machine is performing poorly and /dev/hdb ([3:40]) is being accessed excessively then it indicates that the machine has some memory hungry programs running and is paging heavily.
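If you don’t want to do the device number arithmetic by hand, a small shell function can do it. This is just a sketch of the calculation described above; the name vbd_decode is made up, and it assumes the device number is major*256+minor as in the examples shown.

```shell
# vbd_decode: convert a VBD number from xm top into "major,minor" in
# decimal (as shown by ls -l) followed by [major:minor] in hexadecimal.
vbd_decode() {
    vbd=$1
    major=$((vbd / 256))    # the high part is the major device number
    minor=$((vbd % 256))    # the low 8 bits are the minor device number
    printf '%d,%d [%x:%x]\n' "$major" "$minor" "$major" "$minor"
}
```

For example vbd_decode 5696 prints 22,64 [16:40], which matches /dev/hdd in the ls -l output above. Unlike xm top, the function does not pad the hexadecimal numbers with spaces.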
xm list --long [domain] gives detailed information on all domains, or it can be run with the name of a domain such as xm list --long wind to give the detailed information on only one domain. Generally this is something that you will log to a disk file before restarting domains; in the short term there is little other use for it.
xm console <domain> gives you the console of a domain. If a domain is not working correctly and it is impossible to log in via ssh (either due to a network problem or a problem with ssh) then you can access the console (equivalent to a serial-port login on physical hardware) to diagnose the problem. Often the kernel will log messages to the console; such messages will be stored by the Xen system until they are read. If you suspect that there may be many such messages then use script(1) to log the output to disk. If you are unsure then use script anyway to make sure that you don’t miss any data. Even if you don’t understand the output, the sys-admin probably will!
If the system is half-working then you can login as root to investigate problems. You can escape from the console by pressing CTRL-].
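One way of combining script(1) with xm console is sketched below. The function names, the LOGDIR variable and the log file naming scheme are all made up for illustration, and it assumes that your version of script supports the -c option (the util-linux version does) – check the man page on your system.

```shell
# console_log_name DOMAIN: print a timestamped log file name.
# The directory (LOGDIR, default /var/tmp) and the naming scheme are
# only suggestions.
console_log_name() {
    printf '%s/xen-console-%s-%s.log\n' "${LOGDIR:-/var/tmp}" "$1" "$(date +%Y%m%d-%H%M%S)"
}

# logged_console DOMAIN: attach to the console of DOMAIN with everything
# that appears on the screen recorded to a file by script(1).
logged_console() {
    log=$(console_log_name "$1")
    echo "Recording console of $1 to $log (press CTRL-] to leave the console)"
    script -c "xm console $1" "$log"
}
```

Running logged_console wind in Dom0 would then give you the console of the domain named wind and leave a log file that you can send to the sys-admin.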
xm dmesg gives Xen logging data comparable to the dmesg command in Linux. If you ever have to reboot the machine (run reboot from Dom0) due to a problem with Xen then you MUST save the output of xm dmesg to a file for later review by the sys-admin.
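A minimal sketch of saving this data before a reboot is below. The function name save_xen_state and the file names are made up; it saves the output of xm list --long as well, and it assumes that the target directory (/root if you don’t give one) is on a disk that survives the reboot.

```shell
# save_xen_state [DIR]: write the output of `xm dmesg` and `xm list --long`
# into timestamped files in DIR (default /root) for later review.
save_xen_state() {
    dir=${1:-/root}
    stamp=$(date +%Y%m%d-%H%M%S)
    xm dmesg > "$dir/xm-dmesg-$stamp.txt" || return 1
    xm list --long > "$dir/xm-list-long-$stamp.txt" || return 1
    echo "Saved Xen state to $dir (timestamp $stamp)"
}
```

Run save_xen_state in Dom0 before running reboot, and check that the files are not empty.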
xm destroy <domain> will kill a specified domain. It’s a last resort for stopping a domain that is not working correctly – it is greatly preferable to log in to the domain via ssh or xm console and perform an orderly shutdown.
xm create [-c] <domain> creates a new domain. The configuration for the domain will be taken from a file of the same name in the current directory or in the /etc/xen directory – if /etc/xen is not the current directory when you run xm create then make sure that there is no file-name conflict. You can use this command after destroying a domain or to start a domain that was not previously run.
If you want to change a configuration option of a domain (such as the amount of RAM used) then the usual procedure is to edit the configuration file, run halt or shutdown from within the domain, and then create the domain again with xm create. Note that the -c option is used to attach to the console after starting the domain (you usually want to do this).
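After you run halt or shutdown inside the domain it takes some time for the domain to disappear from xm list. The following sketch waits for that to happen. The function names are made up, and it assumes that xm list prints the domain name in the first column as in the sample output earlier in this post.

```shell
# domain_running NAME: succeed if NAME appears in the first column of `xm list`.
domain_running() {
    xm list | awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# wait_until_gone NAME [TRIES]: check once a second (TRIES times, default 60)
# until NAME is no longer listed. Fails if the domain is still there at the end.
wait_until_gone() {
    tries=${2:-60}
    while [ "$tries" -gt 0 ]; do
        domain_running "$1" || return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

After editing the configuration file and shutting down the domain from inside, you could run wait_until_gone wind && xm create -c wind from Dom0. If the wait fails then the domain has not shut down and you should investigate with xm console before considering xm destroy.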
I will probably update this post when I get some feedback. I may write more posts of a similar nature if there are requests.