A Support Guide for Xen

Here’s a guide to supporting Xen servers for people who are not Linux experts. If your job gives you root access to a Xen server that someone else installed, so that you can fix problems when they are not available, then this will help you solve some common problems.

Xen is a virtualization system that is primarily used for running Linux virtual machines under a Linux host. It is mostly used for paravirtualization, which means that the virtual machine knows that it is running in a virtual environment; this allows some performance benefits.

The host environment is known as Dom0, and root in that domain can control the other domains (which are known as DomU domains). If you perform an orderly shutdown of the Dom0 (via the shutdown or reboot commands, or a notification from the UPS of an impending power failure) then the DomU’s will be automatically restarted when the machine boots again (if the on_reboot setting has the value restart, a common configuration). If you run the command shutdown in a DomU then the domain will be destroyed, and the command reboot will restart the DomU with the same settings. If you want to change the settings for a DomU you need to shut it down and create a new instance.

The main sys-admin command related to Xen is xm. Here are the main xm options that are useful in support:

xm list

xm list provides a list of running domains. For each domain it gives the name of the domain, the ID number, the memory allocated to it, the number of virtual CPUs allocated to it, the state, and the amount of CPU time used in execution. The ID numbers are allocated sequentially: if you reboot a DomU by running the command “reboot” inside it then it will get a new ID number when it restarts. Many xm operations that take the name of a domain will also accept a domain ID number, but generally you can ignore ID numbers; the only relevant thing about an ID is whether it is 0 (Dom0).
Here is a sample of the output of xm list:
# xm list
Name        ID Mem(MiB) VCPUs State  Time(s)
Domain-0     0     1236     4 r-----  14116.3
wind        13     2999     3 -b----  60114.1
wind-f7     52      519     1 -b----   3329.9
You can see from this output that the domain named wind has 2999M of RAM and 3 virtual CPUs (out of 4 physical CPUs in the machine), and that it has used 60,114 seconds of CPU time (about 1,002 minutes, the equivalent of almost 17 hours on a single CPU). Here are the values you might see in the state field (from the man page xm(1)):

  • r – running
    The domain is currently running on a CPU – note that Dom0 will always appear to be running because you are running the xm utility!
  • b – blocked
    The domain is blocked, and not running or runnable. This can happen because the domain is waiting on IO (a traditional wait state) or because it has gone to sleep as there was nothing else for it to do.
  • p – paused
    The domain has been paused, usually occurring through the administrator running xm pause. When in a paused state the domain will still consume allocated resources like memory, but will not be eligible for scheduling by the Xen hypervisor.
  • c – crashed
    The domain has crashed, which is always a violent ending. Usually this state can only occur if the domain has been configured not to restart on crash. See xmdomain.cfg for more info.
  • d – dying
    The domain is in the process of dying, but hasn’t completely shut down or crashed.

If you see domains that are running which normally aren’t busy then make a note of this. If you see domains that are paused, crashed, or dying then contact the sys-admin.
Also know which domains are expected to be running so that if a domain is missing then you will recognise it as a problem!
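If you check xm list regularly, a small script can do the comparison for you. The following is a minimal Python sketch, assuming the six-column output format shown in the sample above; the helper names (parse_xm_list, problems, cpu_hours) and the set of expected domain names are hypothetical and would need to be adapted to the server you are supporting. It flags expected domains that are missing and any domain whose state field contains p, c, or d, and it converts the Time(s) column to hours.

```python
# Sketch: triage "xm list" output. Assumes the six columns shown above:
# Name ID Mem(MiB) VCPUs State Time(s)
from dataclasses import dataclass

# State letters from xm(1) that mean "contact the sys-admin".
ALARM_STATES = {"p": "paused", "c": "crashed", "d": "dying"}


@dataclass
class Domain:
    name: str
    dom_id: int
    mem_mib: int
    vcpus: int
    state: str
    cpu_seconds: float

    @property
    def cpu_hours(self) -> float:
        # Time(s) is total CPU time, so this is hours on a single CPU.
        return self.cpu_seconds / 3600


def parse_xm_list(text: str) -> list[Domain]:
    """Parse xm list output, skipping the header and lines without six fields."""
    domains = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or fields[0] == "Name":
            continue
        name, dom_id, mem, vcpus, state, cpu = fields
        domains.append(Domain(name, int(dom_id), int(mem), int(vcpus), state, float(cpu)))
    return domains


def problems(domains: list[Domain], expected: set[str]) -> list[str]:
    """List expected domains that are missing, then domains in an alarm state."""
    running = {d.name for d in domains}
    report = [f"{name}: missing" for name in sorted(expected - running)]
    for d in domains:
        # The state field has one position per state, e.g. "-b----".
        for letter, meaning in ALARM_STATES.items():
            if letter in d.state:
                report.append(f"{d.name}: {meaning}")
    return report
```

Feed it the text printed by xm list. For the sample output above, wind's cpu_hours comes out at about 16.7, and an expected domain that is not listed shows up as a “missing” entry.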

xm top

xm top is similar to the top command in Unix but displays Xen data. By default it displays the same information as xm list, plus the amount of data read and written on network devices and disks. If your terminal is less than about 145 columns wide the lines will wrap and the display will be confusing, so stretch the width of your xterm before running it.

If you have multiple network interfaces then you can see the transfer counts for each of them separately by pressing the N key. If you have multiple network interfaces in DomU’s then this can help diagnose some network problems (although you may find that tcpdump is more useful).

If you have multiple disk devices in a DomU then you can see their transfer counts separately by pressing the B key. One problem that can be partially diagnosed this way is extremely poor performance. If a DomU is running extremely slowly then it may be impossible to log in to diagnose or fix the problem (it could take tens of minutes to log in); in that case, seeing where the disk access is going from outside the DomU can shed some light on the problem.

VBD  768 [ 3: 0]
VBD  832 [ 3:40]
VBD 5632 [16: 0]
VBD 5696 [16:40]

Above is the identification of the virtual devices /dev/hda to /dev/hdd in a DomU. The numbers inside the brackets are the device node numbers in hexadecimal, so 16:40 means the device 22,64 as a pair of decimal numbers (22*256+64=5696).

# ls -l /dev/hd?
brw-r----- 1 root disk  3,  0 Jul 23 17:24 /dev/hda
brw-r----- 1 root disk  3, 64 Jul 23 17:24 /dev/hdb
brw-r----- 1 root disk 22,  0 Jul 23 17:24 /dev/hdc
brw-r----- 1 root disk 22, 64 Jul 23 17:24 /dev/hdd

Above is the result of an ls -l on the devices in question from inside the DomU.

When I set up a Xen DomU I generally use /dev/hda for the root filesystem and /dev/hdb for swap. So if the machine is performing poorly and /dev/hdb ([3:40]) is being accessed excessively then it indicates that the machine has some memory-hungry programs running and is paging heavily.
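The hexadecimal arithmetic is easy to get wrong under pressure, so here is a small Python sketch that converts the VBD numbers from xm top into the major,minor pairs shown by ls -l and the matching /dev/hd? name. It only knows about the two IDE majors (3 and 22) that appear in the listing above, each of which holds two drives 64 minors apart; any other device type would need its own table, and the function names are hypothetical.

```python
# Sketch: decode the VBD numbers shown by "xm top" (after pressing B).
# Only the two IDE majors from the ls -l listing are known here.

# Each IDE major holds two drives, 64 minors apart: 3 = hda/hdb, 22 = hdc/hdd.
IDE_MAJORS = {3: "ab", 22: "cd"}


def decode_vbd(vbd: int) -> tuple[int, int]:
    """Split a VBD number into (major, minor), since vbd = major * 256 + minor."""
    return divmod(vbd, 256)


def ide_name(major: int, minor: int) -> str:
    """Map an IDE major/minor pair to /dev/hdX, plus a partition number if any."""
    drive = IDE_MAJORS[major][minor // 64]
    partition = minor % 64
    return f"/dev/hd{drive}" + (str(partition) if partition else "")


for vbd in (768, 832, 5632, 5696):
    major, minor = decode_vbd(vbd)
    # xm top shows the pair in hexadecimal, e.g. [16:40] for 22,64.
    print(f"VBD {vbd:4} [{major:2x}:{minor:2x}] = {major},{minor} {ide_name(major, minor)}")
```

Running it prints one line per device, for example VBD 5696 [16:40] = 22,64 /dev/hdd, which matches both the xm top and the ls -l listings above.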

xm list --long

xm list --long [domain] gives detailed information on all domains, or it can be run with the name of a domain (such as xm list --long wind) to give detailed information on only that domain. Generally this is something that you will log to a disk file before restarting domains; in the short term there is little use for it.

xm console

xm console <domain> gives you the console of a domain. If a domain is not working correctly and it is impossible to log in via ssh (either due to a network problem or a problem with ssh) then you can access the console (equivalent to a serial-port login on physical hardware) to diagnose the problem. Often the kernel will log messages to the console, and such messages will be stored by the Xen system until they are read. If you suspect that there may be many such messages then use script(1) to log the output to disk; if you are unsure, use script anyway to make sure that you don’t miss any data. Even if you don’t understand it, the sys-admin probably will!

If the system is half-working then you can log in as root to investigate problems. You can escape from the console by pressing CTRL-].

xm dmesg

xm dmesg gives Xen logging data comparable to the dmesg command in Linux. If you ever have to reboot the machine (run reboot from Dom0) due to a problem with Xen then you MUST save the output of xm dmesg to a file before rebooting, for later review by the sys-admin.
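Because the hypervisor’s log does not survive a reboot, it helps to have a one-step way of saving it. Below is a minimal Python sketch that writes the output of xm dmesg and xm list --long (the detailed listing that is worth logging before restarting domains) to timestamped files. The output directory and the file-name scheme are my assumptions, not a Xen convention, and the run parameter exists only so that the function can be tried on a machine without Xen.

```python
# Sketch: save Xen diagnostics before rebooting the Dom0.
# The directory and file-name scheme are assumptions, not a Xen convention.
import subprocess
import time
from pathlib import Path

# Label used in the file name -> command whose output is saved.
COMMANDS = {
    "xm-dmesg": ["xm", "dmesg"],
    "xm-list-long": ["xm", "list", "--long"],
}


def run_command(cmd: list[str]) -> str:
    """Run a command and return its standard output (raises if it fails)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def save_diagnostics(outdir: str, run=run_command) -> list[Path]:
    """Write each command's output to <outdir>/<label>-<timestamp>.txt."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(outdir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for label, cmd in COMMANDS.items():
        path = target / f"{label}-{stamp}.txt"
        path.write_text(run(cmd))
        saved.append(path)
    return saved
```

On the Dom0 you would call something like save_diagnostics("/root/xen-logs") (a hypothetical directory) and only then run reboot.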

xm destroy

xm destroy <domain> will kill a specified domain. It’s a last resort for stopping a domain that is not working correctly; it is greatly preferable to log in to the domain via ssh or xm console and perform an orderly shutdown.

xm create

xm create [-c] <domain> creates a new domain. The configuration for the domain will be taken from a file of the same name in the current directory or in the /etc/xen directory. If /etc/xen is not the current directory when you run xm create then make sure that there is no file-name conflict. You can use this command after destroying a domain or to start a domain that was not previously running.

If you want to change a configuration option of a domain (such as the amount of RAM used) then the usual procedure is to edit the configuration file, run halt or shutdown from within the domain, and then create the domain again with xm create. Note that the -c option is used to attach to the console after starting the domain (you usually want to do this).
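To make those settings concrete, here is what a domain configuration file might look like. xm configuration files use Python syntax. The option names below (name, memory, vcpus, kernel, root, disk, vif, on_poweroff, on_reboot, on_crash) are standard xm options, but every path, size and volume name is a hypothetical example rather than a copy of a real server’s file. The disk layout follows the convention described earlier (hda for the root filesystem, hdb for swap).

```python
# Hypothetical /etc/xen/wind. xm configuration files use Python syntax.
# Option names are standard xm options; all values are illustrative only.
name = "wind"
memory = 3000            # MiB: edit, shut the DomU down, then "xm create -c wind"
vcpus = 3
kernel = "/boot/vmlinuz-2.6-xen"   # hypothetical path to a Xen kernel on the Dom0
root = "/dev/hda ro"
disk = [
    "phy:/dev/vg0/wind-root,hda,w",  # root filesystem (hypothetical LVM volume)
    "phy:/dev/vg0/wind-swap,hdb,w",  # swap
]
vif = ["bridge=xenbr0"]
on_poweroff = "destroy"  # "shutdown" inside the DomU destroys the domain
on_reboot = "restart"    # "reboot" inside the DomU restarts it with the same settings
on_crash = "preserve"    # keep a crashed DomU around (state "c") for the sys-admin
```

With on_crash set to preserve, a crashed domain stays visible in xm list instead of being restarted, which is why the crashed state is the one to report to the sys-admin.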

I will probably update this post when I get some feedback. I may write more posts of a similar nature if there are requests.

An Ideal Linux Install Process for Xen

I believe that an ideal installation process for Linux would have the option of performing a Xen install.

The basic functionality of installing the Xen versions of the required packages (the kernel and libc), the Xen hypervisor, and the Xen tools is already done well in Fedora and it’s an option to install them in Debian. But more than that is required.

Xen has two options for networking, bridging and routing. The bridging option can be confusing to set up, and changing a system from routed to bridged networking once it’s running is a risky process. I have documented the basic requirements for running bridging in a previous post, but it would be better if there was an option to have xenbr0 as the primary device from the initial install. There are also non-Xen reasons for doing this, so it would be a more generally useful feature.

Another common requirement for a Xen server is to have a DVD image on the local hard drive for creating new DomU’s. If we are going to need a copy of the DVD on the local hard drive for Xen installation, and we need data from the DVD for the Dom0 installation, then it makes sense for one of the early installation tasks (immediately after running mkfs) to be copying the contents of the DVD to the hard drive. Hard drives are significantly faster than DVDs, especially for random access. It would also avoid the not uncommon annoyance of getting part way through an install only to encounter a DVD or CD read error…

Here are some reasons for running Xen (or an equivalent technology) when not running more than one DomU:

  1. Avoid problems booting. Everyone who has spent any significant amount of time running servers has had problems where machines don’t boot. Even with a capable out-of-band management option such as the HP ILO it can be unreasonably inconvenient to fix such problems. Separating the base hardware management tasks of the OS from the user process management tasks makes recovery much easier. If a DomU stops booting then it’s easy to mount its filesystem on the Dom0 and chroot into it to discover the problem.
  2. Easier upgrades. Users often demand that you install software that only works with a newer version of the OS. You can install the new version in a different DomU, test it, and then replace the old DomU when you think it’ll work; this gives a matter of minutes of down-time instead of hours for the upgrade. If the upgrade doesn’t work then you destroy the new DomU and create one for the old version. Running two versions of the OS at the same time with NFS shares for the data files is also possible.
  3. Security. If a DomU gets cracked the Dom0 will not necessarily be compromised, which puts you in a good position to track down what the attackers have done. You can get a dump of the DomU’s memory to enable experts to examine what the attackers were doing. Reinstalling a DomU to replace data potentially corrupted by an attacker is much easier than reinstalling an entire machine.

Even in situations where reason #2 was the motivation for installing Xen, I believe that most systems will want to have a Xen DomU running the same version as the Dom0 for the initial install. Therefore integrating the installation process would make things easier. Among other benefits, if you have a server with multiple CPUs (the minimum number seems to be two CPUs on all recent machines) and hardware RAID then doing two installations at the same time is likely to give better performance overall. Also I believe that it will often be the case that the Dom0 will exist purely to support DomU’s; therefore if you only install the Dom0 then you have done less than half the installation!

For a manual installation there are some reasons for not doing this all at the same time. Having the sys-admin enter configuration data for some DomU’s at the same time as the Dom0 can get confusing. However for an automated install this would be desirable. I would like to boot from a CD and have the installation process take all configuration from the network (either via NFS or HTTP) and then perform the complete installation of the Dom0 and the DomU’s automatically.

Let me know what you think of these ideas; they are just at the conceptual stage at the moment.

Xen and Bridging

In a default configuration of Xen there will be a virtual Ethernet device created for each interface which will be associated with a bridge. A previous post documented how to configure a bridge named xenbr0.

The basic configuration of Xen that most people use is to have a single virtual Ethernet port for each Xen instance and have them all connected to the one bridge, and then the Dom0 will have an IP address on the bridge interface that is used for routing packets to the outside world. This works really well if you have a subnet that you are using for all Xen DomU IP addresses, if you are using NAT for communication, or if the DomU needs no communication outside the Dom0 and other DomU’s on the same machine (a common case for testing).

But if you have a collection of servers that you want to consolidate on a single piece of hardware then you end up using a single sub-net that spans some physical machines, some Xen Dom0’s, and some DomU’s. The solution to this is to use bridged networking.

Unfortunately most documentation of bridged networking is really confusing, and none of my Google searches turned up the most relevant fact:

When setting up a bridge on the local Ethernet you must make your physical Ethernet device (eth0 or whatever) strictly a slave to the bridge, and then assign the IP address used for the physical network to the bridge.

ifconfig eth0 up
brctl addif xenbr0 eth0

For example if you have 10.0.0.42 being the IP address used by the Dom0 on the local Ethernet via device eth0 and you want to use bridging for DomU’s then you simply make eth0 owned by xenbr0 (the typical name for the Xen bridge) with the above commands in your script to configure the xenbr0 device. Then treat xenbr0 in the same way that you treated eth0 before enabling bridging.
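The whole change can be scripted. This is a sketch in Python, assuming that the brctl, ifconfig and route tools are installed and that it is run from a local console rather than over the network link being reconfigured (removing the address from eth0 also drops the default route through it, hence the last command). The netmask and gateway are hypothetical values; 10.0.0.42 is the example address from above. The function names are my own, and by default the script only prints the commands.

```python
# Sketch: make eth0 a pure slave of xenbr0 and move the IP address to the bridge.
# Assumes brctl, ifconfig and route are installed; netmask and gateway are
# hypothetical values. Run it from a local console, not over the link being changed.
import subprocess


def bridge_commands(phys: str, bridge: str, addr: str, netmask: str,
                    gateway: str, create_bridge: bool = True) -> list[list[str]]:
    """Return the commands, in order, for bridging `phys` under `bridge`."""
    cmds = [["brctl", "addbr", bridge]] if create_bridge else []
    cmds += [
        ["ifconfig", phys, "0.0.0.0", "up"],   # physical device up, but with no IP
        ["brctl", "addif", bridge, phys],       # physical device becomes a bridge port
        ["ifconfig", bridge, addr, "netmask", netmask, "up"],  # IP now lives on the bridge
        ["route", "add", "default", "gw", gateway],
    ]
    return cmds


def apply_commands(cmds: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; only execute them when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


apply_commands(bridge_commands("eth0", "xenbr0", "10.0.0.42",
                               "255.255.255.0", "10.0.0.1"))
```

If xenbr0 already exists (for example because another script created it), pass create_bridge=False so that brctl addbr is skipped.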

Also there’s nothing stopping you from having one bridge for DomU’s that can talk directly to the physical Ethernet and another for DomU’s that are only to use routed networking, see my previous post about using multiple ethernet devices in Xen for more background information.

The Australian Government is a Terrorist Organisation

This article in The Age about Mohamed Haneef shows the terrorist threat that we face.

The chance that I will be injured by Al Qaeda in any way is quite remote. The chance of being attacked by ASIO is a lot greater.

The main benefit of being in a democracy is having a legal system where the defendant is presumed innocent until proven guilty and where they have the right to legal representation. The war in Iraq has not brought the US or Australian system of government to Iraq, instead it is bringing Saddam Hussein’s system of government to Australia and the US.

Traditionally under the Australian and US legal systems innocent people are not punished, unlike under Saddam Hussein. Now ASIO has the authority to detain innocent civilians indefinitely if they believe that it helps them in some way – and there is no method of policing ASIO to ensure that even such excuses are met.

Traditionally under the Australian and US legal systems everyone who is accused of a crime is entitled to a trial, unlike under Saddam Hussein. Now ASIO and the CIA have been given the authority to punish anyone without a trial. ASIO can also extend the punishment to anyone who might receive evidence of such actions and publish it (I guess that the CIA can do the same).

Saddam lost the battle but his legacy is winning the war.

For the best definition of Terrorism see Noam Chomsky’s paper. The actions taken by the Australian government against the people of Iraq, foreign citizens in Australia, and almost certainly Australian citizens (it’s not credible to believe that ASIO has such powers and doesn’t use them occasionally on Australians) fit the definition of Terrorism.

Elections are coming soon, both in the US and in Australia. Whatever you do, don’t vote for Neo-Cons (Republicans in the US or Liberals in Australia).

PS Before anyone suggests that I should worry about ASIO kidnapping me in retaliation for this, I’m sure that they know of the Streisand Effect. I’ll try and avoid any unplanned down-time for my blog after this post goes out to avoid false-alarms… ;)

Update: I incorrectly wrote “guilty until proven innocent” above, that is the current Australian government policy not the way it should be.

Column Width in Blogs

I have just been reading the LinuxWorld Community blog which seems to be mostly Don Marti’s personal blog (currently there seems to be no-one else blogging on that site).

One thing that disappointed me is that the theme designer made it look good at a width of 1000 pixels and no other size. At a smaller width the adverts on the right are cut off (more of a problem for the site owner than for the readers) and at a larger width you have a thin column of text in the middle of the screen. A quick test revealed that while my own blog looks good in wide windows it doesn’t work too well in 800 pixel width and gets very bad at lower widths – my blog would be essentially unusable at 640×480 resolution as the text column in the middle (the most important column) is the one that reduces in size. The LinuxWorld blog has a minimum size of 1000 pixels for the scaling so it allows horizontal scrolling in 640 pixel width and remains quite readable.

The top entry in a Google search for web size stats is Browser News, which claims that 12% of web browsers are on 800 pixel wide screens. The next link I found claims that as of January 2007, 26% of web users have screens with a higher resolution than 1024×768 and 14% have 800×600.

Apart from the first couple of months of blogging my blog has always looked good in screens greater than 1000 pixels wide, but not having it work at 800×600 is a problem. The first thing I did was alter the style.css file for the Blueline theme for WordPress to use 100% of the display width (not 86%). Wasting 14% of the screen width is not a good thing when using a width-intensive three-column theme. This change made my blog work well in 800 pixel width and be bearable in 640 pixel width.

The other change was to use min-width: 700px; in the style.css sections blogtitle, container, and navigation. This means that at 640 pixel width the text column will take more than 1/3 of the screen and should be quite readable (unless the reader has an unusually large font setting). The down-side to this is that if your window width is less than 700 pixels then you will have some horizontal scrolling, but I think that this is an acceptable trade-off.

I was forced to confront this issue when talking to a prospective client about the potential for blogs to be used in his business. He loaded up my blog on an ancient Windows machine and it didn’t look very good at all; coincidentally, this happened a few hours after I had been reading the LinuxWorld blog on a big screen.

Correspondent Inference Theory and the US

Bruce Schneier writes about Correspondent Inference Theory, which deals with situations where the motives of an individual or group are inferred from the results of their actions. Both his article and the MIT article on which it is based only consider the results of terrorist actions against the US and allied countries.

I believe that this is a serious mistake by Bruce, the MIT people, and most people who write about terrorism. The most sensible writing about Terrorism is by Noam Chomsky. Noam considers both the propaganda definition and the literal definition of Terrorism. By the literal definition of terrorism the US government is responsible for more than its fair share of terrorist acts performed around the world.

There is no reason to believe that people in the Middle-East are any less intelligent than people in the US and Europe. It seems obvious that some of the people whose countries are destroyed by violence sponsored by the US government will believe that the US is entirely inhabited by blood-thirsty monsters. The number of US citizens who realise what their government does and approve is very low, as is the number of Muslims who know what Al Qaeda does and approve of it.

The US government claims that it wants democracy in the Middle-East, and Osama bin Laden claims to want the US military out of the Middle-East. If the US forces were withdrawn from Saudi Arabia then it would probably lead to a significant increase in democracy in the region (it couldn’t get any less democratic) – both sides could get what they claim to want.

The discussion of the MIT paper seems to be largely based on the fact that Correspondent Inference causes the US government (and other governments) to decrease the probability of doing anything that might meet the terrorist goals. But no-one has mentioned the possibility that the same may apply to the probability of non-state organisations doing anything that might meet the goals of the US government. The wars in Iraq and Afghanistan have significantly decreased the capabilities of the US military: they can’t recruit enough new soldiers, and the current soldiers have reduced effectiveness due to long tours of duty with short breaks. The US economy is stagnating partly due to the direct effects of financing the wars, partly due to the way the airline security theatre has hurt trade and tourism, and partly because everyone has been concentrating on other things instead of fixing the economy.

When two states have a war there is always the possibility of it being ended by a peace treaty or one side surrendering. With modern communications fighting can end in a matter of hours after a cease-fire has been arranged between states. But when non-state forces are involved things become much more difficult to manage. A state can make a deal with one non-state group only to discover that another non-state group (or a dissident faction within the original group) doesn’t like the treaty and continues fighting. With non-state terrorist acts connected to Al Qaeda in the US, the UK, Spain, and Indonesia (and more acts apparently planned in other countries) it’s obvious that we aren’t going to get a clean or quick solution to this problem.

It seems to me that the only way the US and allied countries can escape from Correspondent Inference is to withdraw from the Middle-East entirely. If the people of Iran or Palestine want to elect a government that you don’t like then let it go (that’s what democracy is about anyway). If a dictator seizes control of Iraq then either leave him in control or provide air-support to any province that wants to rebel and establish a democratic government. Either make a stand on the principle of support for freedom and democracy or do nothing on the principle of letting people in other countries sort out their own problems. An invasion for the wrong reasons might fool people on the other side of the world but is unlikely to fool many people who live in the target country.

Testing STONITH

One problem that I have had in configuring Heartbeat clusters is in performing a STONITH that originates outside the Heartbeat system.

STONITH was designed for the Heartbeat system to know when a node is not operating correctly (this can either be determined by the node itself or by other nodes in the network) and then force a hardware reset so that the non-functional node will not interfere with another node that is designated to take over the service.

However sometimes code that is called by Heartbeat will have more information about the state of the system than Heartbeat can access. For example if I have a service that accesses a filesystem on an external RAID then it’s common for the RAID to track who is accessing it. In some situations the RAID hardware has the ability to “fence” the access (so that when machine B mounts the filesystem machine A can no longer access it). In other situations the RAID may only be capable of informing the system that another machine is registered as the owner of the device. To solve this problem a machine that is to mount such a device must either prohibit the previous owner from accessing the device (which may be impossible or unreasonably difficult) or reset the previous owner.

Until recently I had been doing this by writing some code to extract the STONITH configuration from the CIB and call the stonith utility. The problem with this is that there is no requirement that every node be capable of performing a STONITH on every other node, and even if every node is designed to be capable of rebooting every other node, a partial failure condition may restrict the set of nodes that are capable of performing a STONITH on the target.

Currently the recommended way of doing this is via the test program. Below is an example of the command used to reset the node node-1 with a timeout of 20000ms and the result of it being successfully completed. I have suggested that the Heartbeat developers make an official interface for doing this (rather than a test of the API) and I believe that this is being considered. In the meantime the following is the only way of doing it:

# /usr/lib/heartbeat/stonithdtest/apitest 1 node-1 20000 0
optype=1, node_name=node-1, result=0, node_list=node-0
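If resource scripts need to trigger this, the call and the check of its result can be wrapped. This Python sketch copies the argument order of the example above exactly; I have not assumed anything about what the first and last arguments mean. It also assumes that the summary line is printed on standard output and that result=0 means success, based only on the successful example. The function names are hypothetical.

```python
# Sketch: call the Heartbeat stonithd test program from a script and check the
# result. Argument order copies the example above; treating result=0 as
# success is based only on that example's output.
import re
import subprocess

APITEST = "/usr/lib/heartbeat/stonithdtest/apitest"


def parse_apitest(output: str) -> dict[str, str]:
    """Collect key=value pairs from the line that starts with 'optype='."""
    for line in output.splitlines():
        if line.startswith("optype="):
            return dict(re.findall(r"(\w+)=([^,\s]*)", line))
    return {}


def reset_node(node: str, timeout_ms: int = 20000) -> bool:
    """Ask stonithd (via apitest) to reset `node`; True if it reported result=0."""
    proc = subprocess.run([APITEST, "1", node, str(timeout_ms), "0"],
                          capture_output=True, text=True, check=False)
    return parse_apitest(proc.stdout).get("result") == "0"
```

Parsing the output rather than relying on the exit status is a deliberate choice here, since the example only shows the printed result and says nothing about the exit code.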

Documentaries about Gifted Children

On several occasions I have watched part of a TV documentary on gifted children, but I have never been able to watch one completely because every one that I have seen has been offensively wrong.

One thing that they always seem to do is say that gifted children have special needs and often claim that they have problems socialising. This sounds quite reasonable, but if that’s the case then why would you make such children perform tricks on TV? Putting the children on TV is a very poor example of journalism and often of parenting – the parents’ desire to boast about their children’s performance (and implicitly their own parenting skills) is apparently more important than protecting the children. I think that the only situation in which gifted children should have their talents demonstrated on TV is if their skill is related to the performing arts. Not that TV coverage is necessarily good for children who have abilities related to performing (a casual scan of the news regarding adults in Hollywood shows the problems that people have dealing with fame), but it’s something that they will be driven to anyway.

Also if they are going to demonstrate the intelligence of a child on TV then they really should make sure that they don’t demonstrate a lack of intelligence (on the part of the child as well as the producers). Asking a child to provide a definition of a word is often used as an example of intelligence (when it’s really an example of vocabulary). In one documentary a child defined a philanthropist as “someone who has a lot of money” (according to dict on my system it is “Love to mankind; benevolence toward the whole human family; universal good will; desire and readiness to do good to all men”). In another, a child defined a genius as “someone who knows a lot”; while the definition of genius is not clear and there is some disagreement about what it is, most people agree that it’s about ability, not knowledge. Being able to recognise when you don’t know something and admit it would surely be correlated with intelligence…

Telling a child that they are a genius or telling them their IQ seems like a bad idea at the best of times, and doing so in front of a documentary camera crew isn’t the best of times. Children are able to determine how their skills compare to others, there are more than enough attitude problems in schools related to skill comparisons without encouragement from adults. When I was in high school a friend who studied a martial art refused to tell me which belt he wore – he had been taught that such things shouldn’t be discussed outside the dojo to avoid creating a hierarchy based on belt rating in the community. I think that the same thing could be applied to IQ ratings.

The term genius is grossly overused generally; I believe that showing greater than average ability in one area is not enough to qualify. I think that the minimum criterion would be to produce dramatic new developments or inventions in one area of research (EG Albert Einstein or Stephen Hawking) or to advance multiple fields of science or art (EG Leonardo da Vinci). I had people call me a genius when I was young because I won prizes in maths competitions, was good at programming computers, playing chess, etc. The main requirement for achieving such things is to avoid wasting time on sport; in a school run by idiots with a focus on sport this was more a tribute to being stubborn than being smart!

The TV documentaries mention many things that gifted children supposedly require, but strangely I don’t recall any of them mentioning the need to meet adults who are significantly smarter than average! You might think that this is the most obvious thing. The Big Brother and Big Sister programs of mentoring children who are at risk of crime and drugs apparently provide significant benefits, something similar for gifted children might do some good, probably more based on meetings of a few intelligent adults with a small group of intelligent children instead of the BB-BS model of one-on-one meetings. Washingtonienne seems like a good example of the need for this.

There are some useful print articles however, this article in The Age makes some good points, it’s particularly interesting to note that some schools lie about special programs to support gifted children to attract students (someone should name the schools that do this).

But then you get awful ones like this, naming the child in such a situation is irresponsible journalism. The girl in question may be forced to change her name to escape google when she’s older. It’s strange that it’s illegal to name a child who is involved in a court case but it’s not illegal to name them in such an article (let’s hope that the journalist used pseudonyms). Also you might expect an organisation such as Mensa to have someone smart enough to realise that bringing such attention on a 2yo is not in the child’s best interests.

Does anyone know of a good TV documentary about gifted children? I guess that it would be extremely difficult to make one without showing the children on TV, so that would restrict the film maker to children whose abilities are related to the performing arts.

Desktop Machines and ECC RAM

In a comment on my post about memory errors Chris Samuel referred me to an interesting post on the Beowulf mailing list about memory errors. In that list posting Joe Landman says “it is pretty easy to deduce which chip is problematic (assuming it is ram) based upon the address” and then describes how to use Machine Check Exception (MCE) data from an error detected/corrected by the ECC system.

Damn the vendors of motherboards for switching to 8-bit RAM just when it was about to be useful to have 9-bit RAM!

286-class machines had 9 bits of RAM per byte with one bit used for parity. Parity errors were extremely rare, largely due to the fact that memory errors could affect more than one bit at a time and therefore would often give a correct parity: if multiple bit errors were totally random then parity might be expected to pass 50% of the time! The Pentium was the first commonly used CPU to operate with a 64-bit memory bus. If it had 9 bits per byte it would have had a 72-bit memory bus, and a Hamming code could use this to detect and correct single-bit errors, detect all double-bit errors, and detect some errors involving more bits. This would mean that some errors would be recoverable and would display the location of the memory problem instead of being fatal and giving no information.
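To show what the extra bit per byte buys, here is a minimal Python sketch of a Hamming-based SECDED (single error correction, double error detection) code over 72 bits: 64 data bits, 7 Hamming check bits at the power-of-two positions, and one overall parity bit at position 0. The bit layout is the textbook one, chosen for clarity; real memory controllers use their own layouts, and the function names are hypothetical.

```python
# Sketch of a (72,64) SECDED code: 64 data bits, 7 Hamming check bits at the
# power-of-two positions 1..64, and an overall parity bit at position 0.
# Textbook layout for illustration, not any memory controller's actual layout.
from typing import Optional

N = 72


def _is_check_position(pos: int) -> bool:
    return (pos & (pos - 1)) == 0  # powers of two (pos >= 1)


def _syndrome(bits: list[int]) -> int:
    # XOR of the positions (1..71) of every set bit.
    s = 0
    for pos in range(1, N):
        if bits[pos]:
            s ^= pos
    return s


def encode(data: int) -> list[int]:
    """Spread 64 data bits over the non-power-of-two positions, then add check bits."""
    if not 0 <= data < 1 << 64:
        raise ValueError("data must fit in 64 bits")
    bits = [0] * N
    d = 0
    for pos in range(1, N):
        if not _is_check_position(pos):
            bits[pos] = (data >> d) & 1
            d += 1
    s = _syndrome(bits)
    for i in range(7):
        bits[1 << i] = (s >> i) & 1  # makes the syndrome of the codeword zero
    bits[0] = sum(bits) % 2          # overall parity: total number of 1s is even
    return bits


def decode(bits: list[int]) -> tuple[Optional[int], str]:
    """Return (data, status); data is None when the error is uncorrectable."""
    s = _syndrome(bits)
    odd = sum(bits) % 2  # 1 means an odd number of bits flipped
    if s == 0 and not odd:
        status = "ok"
    elif odd and s < N:
        bits = bits.copy()
        bits[s] ^= 1     # single-bit error at position s (0 = the parity bit)
        status = f"corrected bit {s}"
    else:
        return None, "uncorrectable"  # even number of flips, e.g. a double-bit error
    data, d = 0, 0
    for pos in range(1, N):
        if not _is_check_position(pos):
            data |= bits[pos] << d
            d += 1
    return data, status
```

Flipping any one of the 72 bits of an encoded word gives back the original data along with the position of the bad bit, which is the kind of information that lets you identify a failing chip. Flipping any two bits is reported as uncorrectable rather than silently accepted, unlike plain parity.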

ECC memory has now become a standard feature in servers (at significantly greater cost) while most desktop machines lack ECC support. I wonder whether this is aimed at price-gouging people who need reliable servers, since they can’t use the cheap RAM sold for desktop machines.

Unfortunately, due to electricity use, noise, and price, I have to run all the servers that matter most to me on desktop PC hardware. Is anyone selling desktop systems with ECC RAM? I am particularly interested in machines that are a couple of years old so I can get them cheap at auction…


Questions During Lectures

An issue that causes some discussion and debate is the number and type of questions that may be asked during a lecture. In a previous post giving advice for speakers, I suggested that questions can be used to get a talk back on track if a nervous speaker starts presenting the material too quickly (a common mistake). This mechanism can be used by the speaker if they realise that things aren’t going to plan, or by audience members who are experienced speakers and recognise a problem. For this reason, a blanket ban on questions during a talk will only work with experienced speakers who have planned their talk well.

Different speakers favour different styles of presentation. Some styles are determined by the nature of the topic (one example I have seen cited is a very contentious topic, which would lead to a debate if questions were permitted during the talk), but for computer science I think that questions during the talk can always work well. To a certain extent, the fact that code either works or doesn’t limits the scope for debate.

Probably the major factor that determines the utility of questions is the size of the audience. With an audience of fewer than 50 people a conversational approach can work, and with fewer than 200 people a reasonable number of questions can be accepted. But as the audience size increases above 300, the utility of questions approaches zero: if most of the people who might want to ask questions are unable to do so for lack of time, then the value of allowing any questions diminishes. For the largest audiences there probably isn’t any point in having question time.

Another major factor determining which style works best is where the speaker gained their speaking experience. Most of my speaking experience is with less formal meetings (such as local LUGs) and in countries with an informal attitude towards such things (Australia, the US, and the Netherlands). A speaker who has primarily spoken at universities such as Cambridge or Oxford (which seem to have a very formal style, with questions strictly reserved for the end), or who comes from a country such as Japan(*) where the audience is reportedly obliged to show respect for the speaker by being quiet, will probably expect questions only at the end and may flatly reject questions during the talk. A speaker with a background of less formal audiences will expect a certain number of questions during the talk and may plan its timing with this in mind. When I plan a talk for a one hour slot I plan at most 30 minutes of scheduled talking (i.e. covering my notes), expecting 15 minutes of questions along the way and another 15 minutes of questions at the end. With such plans my talks often run over time. Of course this means that people whose experience is mostly with smaller and less formal audiences will find it exceedingly difficult to give talks to larger audiences. This may be a reason to have more formal arrangements for LUG talks, to develop the skills of speakers.

The next issue is how contentious a question may acceptably be. I believe that if you disagree with points that the speaker is making (and have some experience in the field in question) and the audience is not particularly large, then one hostile question is acceptable; as a speaker it’s reasonable to refuse any further questions from an audience member who has asked one hostile question. Another category is the challenging question (not to be confused with a hostile question), for example describing your business requirements in one sentence and asking how the topic being discussed applies to them. One of the most useful questions I have been asked during an SE Linux talk concerned backups of file security contexts. It was presented in a challenging way, and the answer I gave was not nearly as good as the one I could give now (the code base has improved in this regard over the last few years), but I think that everyone learned something, which validated the question.

In smaller groups there may be some heckling when the speaker is a well-known member of the group. I don’t think this is a problem either, as long as it consumes only a tiny fraction of the time (maybe 20-30 seconds at the start). For larger audiences, or for speakers who don’t know the audience well, heckling is generally a bad idea.

(*) When speaking in Japan I had a lot of audience interaction. I’m not sure whether this indicates that Japanese culture is changing in this regard, that translation problems forced some interaction, or that the audience was showing respect for Australian culture by asking questions.