A Support Guide for Xen

Here’s a guide to supporting Xen servers for people who are not Linux experts. If your job gives you root access to a Xen server that someone else installed, so that you can fix problems when they are not available, then this will help you solve some common problems.

Xen is a virtualization system that is primarily used for running Linux virtual machines under a Linux host. It is mostly used as a Paravirtualization system in that the virtual machine knows that it is running in a virtual environment – this allows some performance benefits.

The host environment is known as Dom0 and root in that domain has the ability to control the other domains (which are known as DomU domains). If you perform an orderly shutdown of the Dom0 (via the shutdown or reboot commands or notification from the UPS of an impending power failure) then when the machine is booted again the DomU’s will be automatically restarted (if the on_reboot setting has the value restart – a common configuration). If you run the command shutdown in a DomU then the domain will be destroyed, and the command reboot will restart the DomU with the same settings – if you want to change the settings for a DomU you need to shut it down and create a new instance.

The main sys-admin command related to Xen is xm. Here are the main xm options that are useful in support:

xm list

xm list provides a list of running domains. For each domain it gives the name of the domain, the ID number, the memory allocated to it, the number of virtual CPUs allocated to it, the state, and the amount of CPU time used in execution. The ID numbers are allocated sequentially, so if you reboot a DomU by running the command “reboot” inside it then it will get a new ID number when it re-starts. Many xm operations that take the name of a domain will also take a Domain ID number. Generally you never use an ID number and can ignore it – the only relevant thing about an ID is whether it is 0.
Here is a sample of the output of xm list:
# xm list
Name        ID Mem(MiB) VCPUs State  Time(s)
Domain-0     0     1236     4 r-----  14116.3
wind        13     2999     3 -b----  60114.1
wind-f7     52      519     1 -b----   3329.9
You can see from this output that the domain named wind has 2999M of RAM, 3 virtual CPUs (out of 4 physical CPUs in the machine) and has used 60,114 seconds of CPU time (that is about 1,002 minutes of CPU use – the equivalent of almost 17 hours for a single CPU). Here are the values you might see in the state field (from the man page xm(1)):

  • r – running
    The domain is currently running on a CPU – note that Dom0 will always appear to be running because you are running the xm utility!
  • b – blocked
    The domain is blocked, and not running or runnable. This can be caused because the domain is waiting on IO (a traditional wait state) or has gone to sleep because there was nothing else for it to do.
  • p – paused
    The domain has been paused, usually occurring through the administrator running xm pause. When in a paused state the domain will still consume allocated resources like memory, but will not be eligible for scheduling by the Xen hypervisor.
  • c – crashed
    The domain has crashed, which is always a violent ending. Usually this state can only occur if the domain has been configured not to restart on crash. See xmdomain.cfg for more info.
  • d – dying
    The domain is in process of dying, but hasn’t completely shutdown or crashed.

If you see domains that are running which normally aren’t busy then make a note of this. If you see domains that are paused, crashed, or dying then contact the sys-admin.
Also know which domains are expected to be running so that if a domain is missing then you will recognise it as a problem!
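The checks described above can be sketched in a few lines of Python. This is only an illustration that parses the sample output shown earlier. The function names and the domain name "mail" are made up for the example, and it assumes that every row has exactly six whitespace-separated columns.

```python
SAMPLE = """\
Name        ID Mem(MiB) VCPUs State  Time(s)
Domain-0     0     1236     4 r-----  14116.3
wind        13     2999     3 -b----  60114.1
wind-f7     52      519     1 -b----   3329.9
"""

# State letters that should be reported to the sys-admin.
PROBLEM_FLAGS = {"p": "paused", "c": "crashed", "d": "dying"}


def parse_xm_list(text):
    """Return a list of dicts, one per domain, from `xm list` output."""
    domains = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 6:
            continue  # ignore blank or unexpected lines
        name, dom_id, mem, vcpus, state, cpu_time = fields
        domains.append({
            "name": name,
            "id": int(dom_id),
            "mem_mib": int(mem),
            "vcpus": int(vcpus),
            "state": state,
            "time_s": float(cpu_time),
        })
    return domains


def check_domains(domains, expected_names):
    """Return warnings for missing domains and for paused/crashed/dying ones."""
    warnings = []
    listed = {d["name"] for d in domains}
    for name in sorted(set(expected_names) - listed):
        warnings.append(f"{name}: expected but not listed")
    for d in domains:
        for flag, meaning in PROBLEM_FLAGS.items():
            if flag in d["state"]:
                warnings.append(f"{d['name']}: {meaning}")
    return warnings


# "mail" is a hypothetical domain that should be running but is not listed.
for warning in check_domains(parse_xm_list(SAMPLE), ["wind", "wind-f7", "mail"]):
    print(warning)
```

On a real server the text would come from running xm list in Dom0 rather than from a constant.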

xm top

xm top is similar to the top command in Unix but displays Xen data. By default it displays the same information as xm list but also includes the amount of data read and written from network devices and disks. If your terminal is less than about 145 columns wide the lines will wrap and it will be confusing – stretch the width of your xterm before running it.

If you have multiple network interfaces then you can see the transfer counts for each of them separately by pressing the N key. If you have multiple network interfaces in DomU’s then this can help diagnose some network problems (although you may find that tcpdump is more useful).

If you have multiple disk devices in a DomU then you can see their transfer counts separately by pressing the B key. One problem that can be partially diagnosed through this is excessively poor performance. If a DomU is running extremely slowly then it may be impossible to login to diagnose and/or fix the problem (it could take tens of minutes to login). In that case seeing where the disk access is going from outside the DomU can shed some light on the problem.

VBD  768 [ 3: 0]
VBD  832 [ 3:40]
VBD 5632 [16: 0]
VBD 5696 [16:40]

Above is the identification of the virtual devices /dev/hda, /dev/hdb, /dev/hdc, and /dev/hdd in a DomU. The number after VBD is the device number in decimal, and the numbers inside the brackets are the major and minor device node numbers in hexadecimal, so 16:40 means the device 22,64 as a pair of decimal numbers (22*256+64=5696).

# ls -l /dev/hd?
brw-r----- 1 root disk  3,  0 Jul 23 17:24 /dev/hda
brw-r----- 1 root disk  3, 64 Jul 23 17:24 /dev/hdb
brw-r----- 1 root disk 22,  0 Jul 23 17:24 /dev/hdc
brw-r----- 1 root disk 22, 64 Jul 23 17:24 /dev/hdd

Above is the result of a ls -l on the devices in question from inside the DomU.
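The conversion between the two notations is simple arithmetic, shown here as a small Python sketch. The function names are made up for illustration, and it assumes the traditional encoding of major*256+minor that the numbers above follow.

```python
def decode_vbd(vbd_number):
    """Split an xm top VBD number into (major, minor) device numbers."""
    major, minor = divmod(vbd_number, 256)
    return major, minor


def format_like_xm_top(vbd_number):
    """Render the bracketed hexadecimal pair, e.g. '[16:40]'."""
    major, minor = decode_vbd(vbd_number)
    return f"[{major:2x}:{minor:2x}]"


for vbd in (768, 832, 5632, 5696):
    major, minor = decode_vbd(vbd)
    print(vbd, format_like_xm_top(vbd), f"-> device {major},{minor}")
```

The decimal pair that it prints is what you compare with the major and minor columns of ls -l inside the DomU.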

When I set up a Xen DomU I generally use /dev/hda for the root filesystem and /dev/hdb for swap. So if the machine is performing poorly and /dev/hdb ([3:40]) is being accessed excessively then it indicates that the machine has some memory hungry programs running and is paging heavily.

xm list --long

xm list --long [domain] gives detailed information on all domains, or it can be run with the name of a domain such as xm list --long wind to give the detailed information on only one domain. Generally this is something that you will log to a disk file before restarting domains; in the short-term there is little use for it.

xm console

xm console <domain> gives you the console of a domain. If a domain is not working correctly and it is impossible to login via ssh (either due to a network problem or a problem with ssh) then you can access the console (equivalent to a serial-port login on physical hardware) to diagnose the problem. Often the kernel will log messages to the console; such messages will be stored by the Xen system until they are read. If you suspect that there may be many such messages then use script(1) to log the output to disk. If you are unsure then use script anyway to make sure that you don’t miss any data. Even if you don’t understand the output, the sys-admin probably will!

If the system is half-working then you can login as root to investigate problems. You can escape from the console by pressing CTRL-].

xm dmesg

xm dmesg gives Xen logging data comparable to the dmesg command in Linux. If you ever have to reboot the machine (run reboot from Dom0) due to a problem with Xen then you MUST save the output of xm dmesg to a file for later review by the sys-admin.
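One way to make sure that the output is always saved is a small helper that writes it to a timestamped file. The following Python sketch is only a suggestion. The function name and the file naming scheme are made up for the example, and it would need to be run as root in Dom0.

```python
import subprocess
import time
from pathlib import Path


def save_command_output(command, directory="."):
    """Run `command` (a list of strings) and save its output to a timestamped file.

    Returns the path of the file that was written.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Build a file name from the first two words, e.g. xm-dmesg-<stamp>.log
    label = "-".join(Path(part).name for part in command[:2])
    path = Path(directory) / f"{label}-{stamp}.log"
    result = subprocess.run(command, capture_output=True, text=True)
    # Keep error output too, it may be the most interesting part.
    path.write_text(result.stdout + result.stderr)
    return path
```

Before a reboot you would call save_command_output(["xm", "dmesg"]), and probably also save_command_output(["xm", "list", "--long"]), then check that the files are not empty.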

xm destroy

xm destroy <domain> will kill a specified domain. It’s a last resort for stopping a domain that is not working correctly – it is greatly preferable to login to the domain via ssh or xm console and give an orderly shutdown.

xm create

xm create [-c] <domain> creates a new domain. The configuration for the domain will be taken from a file of the same name in the current directory or in the /etc/xen directory – if /etc/xen is not the current directory when you run xm create then make sure that there is no file-name conflict. You can use this command after destroying a domain or to start a domain that was not previously run.

If you want to change a configuration option of a domain (such as the amount of RAM used) then the usual procedure is to edit the configuration file, run halt or shutdown from within the domain, and then create the domain again with xm create. Note that the -c option is used to attach to the console after starting the domain (you usually want to do this).

I will probably update this post when I get some feedback. I may write more posts of a similar nature if there are requests.

Train Routing

Last year I spent several months living on one side of Melbourne and working on the other and travelling by train to work. Every day I had to catch two trains each way with an average wait of 5 to 10 minutes for each train to arrive giving a total of at least half an hour a day spent waiting for trains.

The obvious solution to this problem is to have trains not go back and forth on one line but instead go from one side of the city to the other. This map shows that there are 14 train lines out of the city with about 5 major lines. The smart thing to do would be to have every major line have one train every 5 minutes during peak hours and to have every train go back out on a different line. Then if you are on one of the major lines and want to travel out on one of the other major lines then every 20 minutes there would be a train that would take you straight there without changing trains!

Currently we don’t even have such frequent trains. During peak hour the trains on most lines run no more often than 5 per hour; the Sydenham line has trains 4 times per hour during peak hours and the trains are crammed full before they get half-way to the city.

If the trains ran more frequently and were routed through the city then commuters who travel through the city would save 20+ minutes per day without going to any effort and 30+ minutes a day if they chose to start their journey at a time to avoid changing trains. This would be a significant incentive for catching the train instead of driving!

For commuters who travel to work via a single journey, having trains run every 5 minutes at peak times would mean that an average of 2.5 minutes was spent waiting for a train each way (an average of 5 minutes per day) instead of the current situation of 15 minutes per day or more. This would mean triple the number of trains on the Sydenham line, which may sound excessive. However the trains are currently so crowded that there could be twice as many trains and all seats would still be full. If there were three times as many trains then I expect that more people would catch the train (surely some people are currently convinced to drive to work by the idea of spending 20 minutes with barely room to stand). It’s not inconceivable that there could be three times as many trains and all seats could still be full!
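The waiting-time figures come from a simple calculation, sketched here in Python. It assumes that trains are evenly spaced and that passengers arrive at the station at random times, so the average wait is half the gap between trains. The function name is made up for the example.

```python
def average_daily_wait(trains_per_hour, trips_per_day=2):
    """Expected minutes per day spent waiting for a passenger who arrives
    at a random time and catches the next train."""
    headway = 60 / trains_per_hour      # minutes between trains
    return trips_per_day * headway / 2  # average wait is half the headway


print(average_daily_wait(4))   # Sydenham line at 4 trains per hour
print(average_daily_wait(12))  # one train every 5 minutes
```

With 4 trains per hour this gives 15 minutes per day, and with 12 trains per hour it gives 5 minutes per day.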

The next issue I have been considering is the time taken for a tram ride to/from the central city areas in peak hours. Peak hour trams stop at every stop because there are always people getting on and off. If a tram could stop less frequently then it could make a slightly higher average speed. One way of achieving this would be for the peak hour trams to stop at every second stop outside the center of the city. On the way in half the trams would accept passengers at each stop (each tram would be designated as either odd or even and labelled as such – the tram stops are already numbered). But if you have twice as many trams then the average wait would be the same while the duration of the trip would be reduced. On the way out of the city the tram driver would announce that after stop 10 (to pick a random number that might work) the tram would only allow passengers to get on or off at even/odd stops. If you knew that your stop was on an even number and the tram was an odd-numbered tram then you would change trams to an even tram. The small delay in changing trams would be made up by a faster trip overall.

Politicians are always talking about ways to alleviate the water shortage caused by climate change and to improve the economy. Having people spend an extra 10 minutes a day working because of saved time on the trains would help the economy. Encouraging people to catch the trains via more frequent and efficient service as well as less overcrowding would help reduce climate change – which is the best way of improving our water supply and the only way of helping the farmers long-term!

Blogging Frequency

You may have noticed that my frequency of posting has increased significantly recently, and that my posts are generally at 7AM and 7PM. I am using the scheduled posting feature of WordPress and writing my posts in advance (currently I have 8 posts in the queue including this one). Generally readers of my blog (particularly those who read Planet syndication pages) don’t want to read 5+ posts in one go. In the past I would write one post at a time and sometimes a couple of days would pass between feeling inspired to write. I had been writing text files and uploading them, but having a text file that’s 95% done is not the same as having a complete post scheduled to be released while I’m doing something else.

This technique works for me. I encourage other people who have things to say but don’t seem to get around to blogging on some days to give it a go.

Incidentally I chose the times 7AM and 7PM because the traffic to my site seems to peak between 7PM and 11PM local time (I haven’t analysed the logs in enough detail to determine whether this is Australians reading blogs after work or people in other countries doing it at work). To release two posts per day it seems most appropriate to space them 12 hours apart (and means that people in the same time zone as me can read a post before going to work).

If You Don’t Know How to Fix It, Please Stop Breaking It

In 1992 Severn Cullis-Suzuki (David Suzuki’s daughter) who was 12 years old gave a talk to the UN’s Earth Summit in Rio on behalf of . She gave a really good talk, see the below Youtube video. The best quote is “If you don’t know how to fix it, please stop breaking it!”. Unfortunately they haven’t stopped breaking things yet.

http://www.youtube.com/watch?v=uZsDliXzyAY

Here is some background information on Severn Cullis-Suzuki.

An Ideal Linux Install Process for Xen

I believe that an ideal installation process for Linux would have the option of performing a Xen install.

The basic functionality of installing the Xen versions of the required packages (the kernel and libc), the Xen hypervisor, and the Xen tools is already done well in Fedora and it’s an option to install them in Debian. But more than that is required.

Xen has two options for networking, bridging and routing. The bridging option can be confusing to set up and changing a system from routed to bridged networking once it’s running is a risky process. I have documented the basic requirements for running bridging in a previous post, but it would be better if there was an option to have xenbr0 as the primary device from the initial install – and there are non-Xen reasons for doing this so it would be a more generally useful feature.

Another common requirement for a Xen server is to have a DVD image on the local hard drive for creating new DomU’s. If we are going to need a copy of the DVD on the local hard drive for Xen installation and we need data from the DVD for the Dom0 installation then it makes sense to have one of the early installation tasks (immediately after running mkfs) be to copy the contents of the DVD to the hard drive. Hard drives are significantly faster than DVDs – especially for random access. It would also avoid the not uncommon annoyance of getting part way through an install only to encounter a DVD or CD read error…

Here are some reasons for running Xen (or an equivalent technology) when not running more than one DomU:

  1. Avoid problems booting. Everyone who has spent any significant amount of time running servers has had problems where machines don’t boot. Even with a capable out of band management option such as the HP ILO it can be unreasonably inconvenient to fix such problems. Separating the base hardware management tasks of the OS from the user process management tasks makes recovery much easier. If a DomU stops booting then it’s easy to mount it on the Dom0 and chroot into it to discover the problem.
  2. Easier upgrades. Often you have users demand that you install software that only works with a newer version of the OS. You can install the new version under a different DomU, test it, and then replace the old DomU when you think it’ll work – this gives a matter of minutes of down-time instead of hours for the upgrade. If the upgrade doesn’t work then you destroy the DomU and create one for the old version. Running two versions of the OS at the same time with NFS shares for the data files is also possible.
  3. Security. If a DomU gets cracked the Dom0 will not necessarily be compromised, which puts you in a good position to track down what the attackers have done. You can get a dump of the DomU’s memory to enable experts to examine what the attackers were doing. Reinstalling a DomU to replace data potentially corrupted by an attacker is much easier than reinstalling an entire machine.

Even in situations when reason #2 was the motivation for installing Xen I believe that most systems will want to have a Xen DomU running the same version as the Dom0 for the initial install. Therefore integrating the installation process would make things easier. Among other benefits if you have a server with multiple CPUs (the minimum number seems to be two CPUs on all recent machines) and hardware RAID then doing two installations at the same time is likely to give better performance overall. Also I believe that it will often be the case that the Dom0 will exist purely to support DomU’s, therefore if you only install the Dom0 then you have done less than half the installation!

For a manual installation there are some reasons for not doing this all at the same time. Having the sys-admin enter configuration data for some DomU’s at the same time as the Dom0 can get confusing. However for an automated install this would be desirable. I would like to boot from a CD and have the installation process take all configuration from the network (either via NFS or HTTP) and then perform the complete installation of the Dom0 and the DomU’s automatically.

Let me know what you think of these ideas; they are just at the conceptual stage at the moment.

Xen and Bridging

In a default configuration of Xen there will be a virtual Ethernet device created for each interface which will be associated with a bridge. A previous post documented how to configure a bridge named xenbr0.

The basic configuration of Xen that most people use is to have a single virtual Ethernet port for each Xen instance and have them all connected to the one bridge, and then the Dom0 will have an IP address on the bridge interface that is used for routing packets to the outside world. This works really well if you have a subnet that you are using for all Xen DomU IP addresses, if you are using NAT for communication, or if the DomU needs no communication outside the Dom0 and other DomU’s on the same machine (a common case for testing).

But if you have a collection of servers that you want to consolidate on a single piece of hardware then you end up using a single sub-net that spans some physical machines, some Xen Dom0’s, and some DomU’s. The solution to this is to use bridged networking.

Unfortunately most documentation of bridged networking is really confusing, and none of my Google searches turned up the most relevant fact:

When setting up a bridge on the local Ethernet you must make your physical ethernet device (eth0 or whatever) be strictly a slave to the bridge and then assign the IP address used for the physical network to the bridge.

ifconfig eth0 up
brctl addif xenbr0 eth0

For example if you have 10.0.0.42 being the IP address used by the Dom0 on the local Ethernet via device eth0 and you want to use bridging for DomU’s then you simply make eth0 owned by xenbr0 (the typical name for the Xen bridge) with the above commands in your script to configure the xenbr0 device. Then treat xenbr0 in the same way that you treated eth0 before enabling bridging.
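To make the order of operations explicit, here is a small Python sketch that only builds the list of commands and does not run anything. The function name is made up, the netmask is an assumed value that is not given in this post, and clearing the address on eth0 with 0.0.0.0 is my reading of “strictly a slave”. Re-adding any default route via the bridge is not covered.

```python
def bridge_commands(bridge="xenbr0", nic="eth0",
                    address="10.0.0.42", netmask="255.255.255.0"):
    """Return the shell commands, in order, that move an address from a
    physical NIC to a bridge. Nothing is executed."""
    return [
        # Create the bridge device.
        f"brctl addbr {bridge}",
        # Bring the NIC up with no address of its own (assumed step).
        f"ifconfig {nic} 0.0.0.0 up",
        # Make the NIC a port of the bridge.
        f"brctl addif {bridge} {nic}",
        # The bridge takes the address that the NIC used to have.
        f"ifconfig {bridge} {address} netmask {netmask} up",
    ]


for command in bridge_commands():
    print(command)
```

Printing the commands first lets you review them before pasting them into the script that configures xenbr0. Run them from the console rather than over ssh, as the network connection will drop while the address moves.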

Also there’s nothing stopping you from having one bridge for DomU’s that can talk directly to the physical Ethernet and another for DomU’s that are only to use routed networking, see my previous post about using multiple ethernet devices in Xen for more background information.

The Australian Government is a Terrorist Organisation

This article in The Age about Mohamed Haneef shows the terrorist threat that we face.

The chance that I will be injured by Al Qaeda in any way is quite remote. The chance of being attacked by ASIO is a lot greater.

The main benefit of being in a democracy is having a legal system where the defendant is presumed innocent until proven guilty and where they have the right to legal representation. The war in Iraq has not brought the US or Australian system of government to Iraq, instead it is bringing Saddam Hussein’s system of government to Australia and the US.

Traditionally under the Australian and US legal systems innocent people are not punished, unlike under Saddam Hussein. Now ASIO has the authority to detain innocent civilians indefinitely if they believe that it helps them in some way – and there is no method of policing ASIO to ensure that even such excuses are met.

Traditionally under the Australian and US legal systems everyone who is accused of a crime is entitled to a trial, unlike under Saddam Hussein. Now ASIO and the CIA have been given the authority to punish anyone without a trial. ASIO can also extend the punishment to anyone who might receive evidence of such actions and publish it (I guess that the CIA can do the same).

Saddam lost the battle but his legacy is winning the war.

For the best definition of Terrorism see Noam Chomsky’s paper. The actions taken by the Australian government against the people of Iraq, foreign citizens in Australia, and almost certainly Australian citizens (it’s not credible to believe that ASIO has such powers and doesn’t use them occasionally on Australians) fit the definition of Terrorism.

Elections are coming soon, both in the US and in Australia. Whatever you do, don’t vote for Neo-Cons (Republicans in the US or Liberals in Australia).

PS Before anyone suggests that I should worry about ASIO kidnapping me in retaliation for this, I’m sure that they know of the Streisand Effect. I’ll try and avoid any unplanned down-time for my blog after this post goes out to avoid false-alarms… ;)

Update: I incorrectly wrote “guilty until proven innocent” above, that is the current Australian government policy not the way it should be.

Exit Strategies for Iraq

Here’s an interesting piece in the Washington Post about what might happen when the US withdraws from Iraq.

I regret not blogging before the war started. It would have been good if I could have pointed to a blog post predicting the same thing before the invasion took place. I’ve always thought that the two possibilities for Iraq were for the country to be partitioned (which would be likely to weaken Turkey and strengthen Iran and thus be avoided by the US if possible) or run by an absolute despot.

Modules and NFS for Xen

I’m just in the process of converting a multi-user system to a Xen DomU. It was running on a stand-alone Fedora Core 5 i386 system and I want to run it on a Fedora 7 DomU under a CentOS 5 Dom0 on an Opteron system.

The first stage of the conversion was to copy an image of the Fedora Core 5 system and make it a DomU under CentOS. I had some problems getting a Fedora Core 5 Xen kernel to boot so I installed a 64bit CentOS 5 kernel with the Fedora Core 5 user-space and surprisingly everything worked. I had expected to have problems with kernel modules, but everything just worked! I had expected that the 32bit modutils would be unable to load 64bit modules, but things just worked.

The first stage was to have the old server NFS export /home and have it mounted by the Xen DomU, this worked well for about a week. The next step was to move the data on to the new server. My first attempt was to have the Dom0 running the filesystems and NFS exporting them to the DomU but this caused an OpenOffice error “Error saving the document Name: General Error. General input/output error.”.

So having 32bit Fedora Core 5 with a 64bit CentOS 5 kernel NFS mounting from a 32bit Fedora Core 5 system works well, while mounting from a 64bit CentOS 5 system fails. If anything I would have expected better results from having the same version of the kernel on NFS client and server.

The next issue is whether a 64bit Fedora 7 system in a DomU can NFS mount the data from the CentOS 5 kernel with Fedora Core 5 user-space. If not it’ll make testing the Fedora 7 upgrade significantly more painful than it might otherwise be.

If only we had a network filesystem for Unix that supported POSIX semantics.

Troll Zapping

Don Marti writes about the idea of setting a Troll-bit on forum posts such that every reply would also be flagged.

I’ve been thinking about how to solve such issues for mailing lists. I think that the way to do this is to create a new list for every contentious topic and automatically subscribe everyone who posted to the thread after the messages that were flagged as being too far off-topic. After that time anyone who tries to post to the main list with a matching subject or a header indicating that the message is a reply to an off-topic message would have their message redirected to the new list – and they would be automatically subscribed.
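The redirection rule could look something like the following Python sketch. It is not based on any existing list server. The class, the method names, and the dictionary keys used for messages are all made up for illustration.

```python
def normalise_subject(subject):
    """Lower-case a subject and strip any leading 'Re:' prefixes."""
    s = subject.strip().lower()
    while s.startswith("re:"):
        s = s[3:].strip()
    return s


class OffTopicRouter:
    def __init__(self):
        self.flagged_ids = set()       # Message-IDs in the off-topic thread
        self.flagged_subjects = set()  # normalised subjects of that thread
        self.subscribers = set()       # members of the spawned list

    def flag(self, message):
        """Mark a message as too far off-topic (a moderator action)."""
        self.flagged_ids.add(message["id"])
        self.flagged_subjects.add(normalise_subject(message["subject"]))

    def route(self, message):
        """Return 'offtopic' or 'main' for an incoming message."""
        is_reply = message.get("in_reply_to") in self.flagged_ids
        same_subject = normalise_subject(message["subject"]) in self.flagged_subjects
        if is_reply or same_subject:
            # Redirected posts join the thread and the poster is subscribed.
            self.flagged_ids.add(message["id"])
            self.subscribers.add(message["sender"])
            return "offtopic"
        return "main"
```

Because every redirected message is added to the flagged set, a reply to a reply is also redirected even if the poster changes the subject line.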

This would keep the off-topic messages away from the main list and also serve as a minor dis-incentive for people to post to threads that start going off-topic as most people won’t want to be subscribed to the new list for off-topic messages.

However such messages would still be archived (in a different section) so if the moderator mis-classified a thread it could still be reviewed by other people.

Also when moderating such threads it would be interesting to experiment with consensus moderation of posts. If N subscribers of the list who post regularly use a web form to indicate that a certain post was too far off-topic and should spawn a new list then that could happen.

I agree with Joey Hess’ rant about forums, so solving a problem for forums is not of interest to me, but hacking on a list server is something that I would do if I had enough spare time.