
How SE Linux Prevents Local Root Exploits

In a comment on my previous post about SE Linux and worms/trojans [1] a user enquired about which methods of gaining local root are prevented by SE Linux.

A local exploit is one that cannot be run remotely. An attack via TCP or UDP is generally considered a remote exploit – even though in some cases you might configure a daemon to only bind to localhost (in which case the TCP or UDP attack would only work locally). When compromising a machine it’s not uncommon for a local exploit to be used after a remote exploit or social engineering has been used to gain non-root privileges.

The two most common types of local root exploit seem to be those which attack the kernel and those which attack a SETUID process. For a non-SE Linux system it usually doesn’t matter much how the local exploit is run. But a SE Linux system in a default configuration will be running the Targeted policy that has almost no restrictions on a user shell session. So an attacker who wants to escalate their privileges from local user to local root on a typical SE Linux system has a significant benefit in starting from a user account instead of starting from a web server or other system process.

In the SE Linux model access is granted to a domain, and the domain which is used for a process is determined by the policy based on the domain of the parent process and the labelling of the executable. Some domains are not permitted to transition to any other domains, such as the domain dhcpd_t (used for a DHCP server). Other domains are only permitted to transition to a small set of domains, for example the domain httpd_t (used for a web server) can only transition to a small number of domains none of which has any significant privileges.
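The transition rules described above can be sketched in SELinux Type Enforcement syntax. The following is a hypothetical fragment (the real domain and type names in the reference policy differ in detail):

```
# When the init scripts (initrc_t) execute a file labelled dhcpd_exec_t,
# the new process runs in the dhcpd_t domain:
type_transition initrc_t dhcpd_exec_t : process dhcpd_t;
allow initrc_t dhcpd_exec_t : file { read execute };
allow initrc_t dhcpd_t : process transition;
allow dhcpd_t dhcpd_exec_t : file entrypoint;

# Note the absence of any "allow dhcpd_t ... : process transition" rule:
# a compromised DHCP server has no other domain it is permitted to enter.
```

The interesting part is what is missing: because no rule grants dhcpd_t the right to transition anywhere, the policy compiler never gives a compromised daemon a path to a more privileged domain.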

On a machine running without SE Linux a compromise of a DHCP server is game-over as the server runs as root. A compromise of a daemon such as Apache on a machine without SE Linux gives unrestricted access to run applications on the system – if a SETUID-root program has a security flaw then you lose. The same bug in a SETUID program on a machine running SE Linux is not fatal because SE Linux will prevent the program from doing anything that its parent could not do – even if an attacker made Apache run a buggy SETUID program, the broken program in question could do nothing other than what Apache is normally permitted to do.

A security flaw in a SETUID-root program on a SE Linux system can still be exploited by a local user (someone who has logged in) when running the Targeted policy. When running the Strict or MLS policies many such vulnerabilities will not be exploitable by local users (for example exploiting PING would only permit network access).

As a rule of thumb you should consider that a kernel security flaw will make it possible to bypass all other security features. However there are some situations where SE Linux can prevent local exploits. One example is a bug discovered in July 2006 which allowed the creation of SETUID files under /proc [2]; the Targeted policy of SE Linux prevented this from working. Another example is CVE-2003-0127 [3], a kernel security flaw that was exploited by triggering a module load and then exploiting a race condition in the kernel module load process; the commonly used exploit for this did not work on a SE Linux system because the user process was not permitted the socket access used to trigger the module load (it is believed that an attacker could have written a version of the exploit to work on a SE Linux system – but AFAIK no-one ever did so).


A Long Laptop Lifetime

Paul Russell writes about his 3-yearly laptop replacement at IBM [1]. It probably makes some sense to replace laptops periodically for a large company, but if you are buying for personal use then it makes sense to try and get a longer life out of an expensive machine. I think that aiming for 6 years is quite reasonable with today’s hardware – you should be able to buy a new machine now and have it last 6 years or buy a 3yo second-hand machine and hope to have it last 3 years (most second-hand laptops on sale in every place other than Ebay were trophies for managers and never had any serious use).

If you are going to buy a second-hand laptop the first thing to consider is PAE support. If you get a laptop without PAE support (I think that means all Pentium-M CPUs) then you will not have proper Xen support (it seems that all distributions have abandoned Xen support for non-PAE machines for the moment). This may not be a big deal if you don’t want Xen, but if you are a programmer then you probably do want Xen (even if you don’t realise it yet). The next issue is support for the AMD64 instruction set. 32bit laptops are going cheap at the moment but if you buy one you will be significantly limited as to what software you can run at some future time (my 32bit laptop is doing well at the moment apart from the lack of PAE support).

If you are buying a new laptop then the first thing to consider when planning a long life is the warranty. In my experience most computer gear does not need a long warranty, if it survives 3 months then it’ll probably last until it’s well obsolete. Laptops however periodically wear out if used seriously: I average one warranty replacement of a Thinkpad keyboard every two years and I have had a few motherboard replacements (the lighter Thinkpads flex and they eventually break inside if you use them on enough trains, trams, buses, planes, etc). On one of the Lenovo T series Thinkpads that I saw advertised (one that I would consider if I wanted a new laptop now) there was an offer to spend an extra $350AU to extend the warranty from 1 year to 5 years (according to my understanding of the confusing text on the web site) on a laptop that cost $3050. An increase in the purchase price of 12% for the extra warranty is a bargain (I know that for my use they would lose money on the deal). Repair of a laptop is generally very expensive; any serious damage to a laptop that is more than 18 months old will generally mean that the replacement cost is less than the repair cost.

The next thing to consider is the screen resolution. After purchasing a laptop you can upgrade the RAM and the hard drive, the CPU power of all modern machines is great enough that for most typical use it’s difficult to imagine any need to upgrade. But screen resolution is something that can never be good enough and can never be improved after purchase. Lenovo is offering T series Thinkpads with 1920×1200 resolution for $4000AU and 1680×1050 resolution for $3050AU. That’s 31% more pixels for 31% more money and seems like a good deal to me. I believe that a larger display can significantly increase productivity [2] so it seems that the extra expense would be a good investment if you plan to earn money from work you do on your laptop. As a point of reference a desktop monitor from Dell (who seems to be the cheapest supplier for such gear) with resolution of 1920×1200 will cost at least $1000AU.

The hard drive capacity should not be an issue, it seems that 100G is about the minimum size. The 60G drive in my current Thinkpad is adequate for my development work (including several Xen instances and some ISO files for a couple of distributions) so unless you plan to collect MPEG4 files of TV series and store them on your laptop I can’t imagine 100G being much of a limit. Also external storage is getting quite cheap, 2GB USB flash devices are now in the bargain bin of my local electronics store and USB attached hard drives with capacities of 40G or more are getting cheap. Also with a Thinkpad replacing a hard drive is really easy and does not risk damage to the drive or the rest of the laptop (I don’t know how well other brands rate in this regard).

For RAM you can buy a model with a large memory module in socket 0 (or attached to the motherboard). Adding new RAM later is easy to do. Just try and avoid purchasing a memory capacity that involves having all sockets filled with modules that are not of the maximum size – it’s annoying if you have to try and sell modules on Ebay after you buy a memory upgrade.

Finally, one mistake I made in the past was to not get all the options for the motherboard. Make sure that every option for Ethernet ports and 802.11 type protocols is selected. It might sound like a good idea to save $100 or so by not getting one of those options, but if you end up repeatedly plugging in a CardBus or USB device for many years you will regret it. Also external devices tend to break or get lost.

Rusty documents his laptop replacement as a time for spring-cleaning. I use LVM for the root filesystem on my Thinkpad so that I can easily install a new distribution (or a new version of a distribution) at any time. I’ve been through that spring-cleaning a couple of times on my current Thinkpad without needing new hardware.

From a quick view of the Lenovo site it seems that an ideal new Thinkpad that would last me 6 years would cost about $4500 while one that would last me 2 years would cost $1600 (and have a significantly lower screen resolution). A Thinkpad that would last 6 years and not be so great (but still better than the cheap option) would cost about $3500.

Update: One significant issue is the life expectancy of laptop batteries. If you use a laptop for mobile use (as opposed to just moving between desks occasionally) then you are probably familiar with the problem of laptop batteries that discharge after 10 minutes. Last time I checked the warranty on Thinkpad batteries was 1 year or 300 charges (whichever comes first). My experience is that after 300 full cycles a Thinkpad battery will only last for a small fraction of the original charge time. When buying a laptop I suggest getting a spare battery at the time of purchase. The spare battery may last longer than the battery that is shipped with the laptop and two batteries means that you have twice the number of charge cycles before they are both useless. Batteries apparently don’t last long if completely discharged, so charge them up before storing them and periodically charge them if they have been left unused for any length of time (maybe every second or third month). With a Thinkpad it seems quite safe to change the battery while the machine is plugged in to mains power and running (I expect that Lenovo doesn’t recommend this though). You should probably plan to have a battery die every three years of use (or sooner if you do a lot of travelling). So one spare battery may last you 6 years of use but you will need two spare batteries if you travel a lot.


Can SE Linux Stop a Linux Storm

Bruce Schneier has just written about the Storm Worm [1] which has apparently been quietly 0wning some Windows machines for most of this year (see the Wikipedia page for more information [2]).

I have just been asked whether SE Linux would stop such a worm from the Linux environment. SE Linux does prevent many possible methods of getting local root. If a user who does not have the root password (or is not going to enter it from a user session) has their account taken over by a hostile party then the attacker is not going to get local root (unless there is a kernel vulnerability). Without local root access the activities of the attacker can be seen by a user logged in on another account – under the SE Linux targeted policy processes are visible to all user sessions, and files and processes can always be seen by the sys-admin.

If while a user account is 0wned the user runs “su -” (or an equivalent command) then in theory at least the attacker can sniff this and gain local root access (whether enough users do this to make attackers feel that it’s worth their effort to write the code in question is something I couldn’t even guess about). If the user is clueless then the attacker could immediately display a dialog with some message that sounds urgent and demand the root password – some users would give it. If the user is even moderately smart the attacker could fake the GUI dialogues for installing updated packages (which have been in Red Hat distributions for ages and have appeared in Debian more recently) and tell the user that they need to enter the root password to install an important security update (oh the irony).

In conclusion, if a user is ill-educated enough to want to run a program that was sent to them in email by a random person then I expect that the program would have a good chance of coercing them into giving it local root access – assuming the user in question had the capability of doing so.

Even if a Linux trojan did not have local root access then it could still do a lot of damage. Any server operations that don’t require ports <1024 (which means most things other than running a web, DNS, or mail server) can still be performed and client access will always work (including sending email). The trojan would have access to all of the user’s data (which for a corporate desktop machine usually means a huge network share of secret documents).

If a trojan only attempts to perform actions that SE Linux permits (running programs from the user’s home directory, accessing servers for DNS, HTTP, IRC, SMTP, and other protocols – a reasonable set of options for a trojan) then the default configuration of SE Linux (targeted policy) won’t stop it or even log anything. This is not a problem with SE Linux, just a direct result of the fact that in every situation a trojan can perform all operations that the user can perform – and if the trojan only wants to receive commands via web and IRC servers and send spam via the user’s regular mail server then it will be a small sub-set of the permitted operations for the user!

If however the trojan tries more aggressive methods then SE Linux will log some AVC messages about access being denied. If the sys-admin has good procedures for analysing log files they will notice such things, understand what they mean, and be able to contain the damage. Also there have been at least two cases where SE Linux prevented local root exploits.

Finally, in answer to the original question: SE Linux will stop some of the more aggressive methods that trojans might use. But there are still plenty of things that a trojan could do to cause harm which won’t be stopped or audited by SE Linux policy. When Linux gets more market share among users with a small amount of skill and no competent person to do sys-admin work for them we will see some Linux trojans and more Linux worms. It will be interesting to see what methods the trojan authors decide to use.


Executable Stack and Shared Objects

When running SE Linux you will notice that most applications are not permitted to run with an executable stack. One example of this is libsmpeg0 which is used by the game Freeciv [1]. When you attempt to run the Freeciv client program on a Debian/Etch system with a default SE Linux configuration (as described in my post on how to install SE Linux on Debian in 5 minutes [2]) then you will find that it doesn’t work.

When this happens the following will be logged to the kernel log and is available through dmesg and usually in /var/log/kern.log (Debian/Etch doesn’t have auditd included, the same problem on a Fedora, RHEL, or CentOS system in a typical configuration would be logged to /var/log/audit/audit.log):
audit(1191741164.671:974): avc: denied { execstack } for pid=30823 comm="civclient" scontext=rjc:system_r:unconfined_t:s0 tcontext=rjc:system_r:unconfined_t:s0 tclass=process

The relevant parts are the denied operation (execstack) and the command name (comm="civclient"). The problem with this message in the message log is that you don’t know which shared object caused the problem. As civclient is normally run from the GUI you are given no other information.

So the thing to do is to run it at the command-line (the avc message tells you that civclient is the name of the failing command) and you get the following result:
$ civclient
civclient: error while loading shared libraries: libsmpeg-0.4.so.0: cannot enable executable stack as shared object requires: Permission denied

This makes it clear which shared object is at fault. The next thing to do is to test the object by using execstack to set it to not need an executable stack. The command execstack -q /usr/lib/libsmpeg-0.4.so.0.1.4 will give an “X” as the first character of the output to indicate that the shared object requests an executable stack. The command execstack -c /usr/lib/libsmpeg-0.4.so.0.1.4 will change the shared object to not request an executable stack. After making such a change to a shared object the next thing to do is to test the application and see if it works correctly. In every case that I’ve seen the shared object has not needed such access and the application has worked correctly.
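What execstack queries and clears is the PT_GNU_STACK entry in the ELF program header table. As a sketch of how that check works, the following Python code hand-packs a 64-bit program header table rather than reading a real library (so it runs anywhere); the layout follows the standard Elf64_Phdr structure:

```python
import struct

PT_GNU_STACK = 0x6474E551  # p_type value for the GNU stack header
PF_X = 0x1                 # execute permission bit in p_flags
# Elf64_Phdr: p_type and p_flags are 32-bit, followed by six 64-bit fields
PHDR64 = struct.Struct("<IIQQQQQQ")

def stack_is_executable(phdr_table: bytes) -> bool:
    """Return True if the PT_GNU_STACK entry requests an executable stack.

    If the entry is missing, the loader historically assumed an executable
    stack, so we default to True in that case.
    """
    for off in range(0, len(phdr_table), PHDR64.size):
        p_type, p_flags = PHDR64.unpack_from(phdr_table, off)[:2]
        if p_type == PT_GNU_STACK:
            return bool(p_flags & PF_X)
    return True

# Build two fake one-entry tables: flags RW (0x6) and flags RWE (0x7).
rw = PHDR64.pack(PT_GNU_STACK, 0x6, 0, 0, 0, 0, 0, 0x10)
rwe = PHDR64.pack(PT_GNU_STACK, 0x7, 0, 0, 0, 0, 0, 0x10)
print(stack_is_executable(rw))   # False: stack need not be executable
print(stack_is_executable(rwe))  # True: this is what triggers the AVC denial
```

Running execstack -c simply clears that PF_X bit in place, which is why the change survives without recompiling anything.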

As an aside, there is a bug in execstack in that it will break sym-links. Make sure that the file argument you give it is the shared object itself, not the sym-link to it which was created by ldconfig. See Debian bug 445594 [3] and CentOS bug 2377 [4].

The correct thing to do is to fix the bug in the source (not just modify the resulting binary). On page 8 of Ulrich Drepper’s document about non-SE Linux security [5] there is a description of both possible solutions to this problem. One is to add a line containing “.section .note.GNU-stack,"",@progbits” to the start of the assembler file in question (which is what I suggested in Debian bug report 445595 [6]). The other is to pass “--noexecstack” to the GNU assembler (“-Wa,--noexecstack” when invoking it via gcc) – of course this doesn’t work if you use a different assembler.

In the near future I will establish an apt repository for Debian/Etch i386 packages related to SE Linux. One of the packages will be a libsmpeg0 package compiled to not need an executable stack. But it would be good if bug fixes such as this one could be included in future updates to Etch.


Ideas for a Home University

There seems to be a recent trend towards home-schooling. The failures of the default school system in most countries are quite apparent and the violence alone is enough of a reason to keep children away from high-schools, even without the education (or lack thereof).

I have previously written about University degrees and whether they are needed [1].

The university I attended (which I won’t name in this context) did an OK job of teaching students. The main thing that struck me was that you would learn as much as you wished at university. It was possible to get really good marks without learning much (I have seen that demonstrated many times) or learn lots of interesting things while getting marks that are OK (which is what I did). So I have been considering whether it’s possible to learn as much as you would learn at university without attending one, and if so how to go about it.

Here are the ways I learned useful things at university:

  1. I spent a lot of time reading man pages and playing with the various Unix systems in the computer labs. It turned out that sys-admin work was one of my areas of interest (not really surprising given my history of running Fidonet BBS systems). It was unfortunate that my university (like almost all other universities) had no course on system-administration and therefore I was not able to get a sys-admin job until several years after graduating.
  2. I read lots of good text books (university libraries are well stocked).
  3. There were some good lectures that covered interesting material that I would not have otherwise learned (there were also some awful lectures that I could have missed – like the one which briefly covered computer security and mentioned NOTHING other than covert channels – probably the least useful thing that they could cover).
  4. I used to hang out with the staff who were both intelligent and friendly (of which there were unfortunately a small number). If I noticed some students hanging out in the office of one of the staff in question I would join them. Then we would have group discussions about many topics (most of which were related to computers and some of which were related to the subjects that we were taking), this would continue until the staff member decided that he had some work to do and kicked us out. Hanging out with smart students was also good.
  5. I did part-time work teaching at university. Teaching a class forces you to learn more about the subject than is needed to merely complete an assignment. This isn’t something that most people can do.

I expect that children who don’t attend high-school will have more difficulty in getting admitted to a university (the entrance process is designed for the results of high-school). Also if you are going to avoid the public education system then it seems useful to try and avoid it for all education instead of just the worst part. Even for people who weren’t home-schooled I think that there are still potential benefits in some sort of home-university system.

Now a home-university system would not be anything like an Open University. One example of an Open University is Open Universities Australia [2], another is the UK Open University [3]. These are both merely correspondence systems for a regular university degree. So it gives a university degree without the benefit of hanging out with smart people. While they do give some good opportunities for people who can only study part-time, in general I don’t think that they are a good thing (although I have to note that there are some really good documentaries on BBC that came from Open University).

Now I am wondering how people could gain the same benefits without attending university. Here are my ideas on how the five main benefits that I believe are derived from university can be achieved without one (for a Computer Science degree anyway):

  1. Computers are cheap, every OS that you would ever want to use (Linux, BSD, HURD, OpenSolaris, Minix, etc) is free. It is quite easy to install a selection of OSs with full source code and manuals and learn as much about them as you desire.
  2. University libraries tend not to require student ID to enter the building. While you can’t borrow books unless you are a student or staff member it is quite easy to walk in and read a book. It may be possible to arrange an inter-library loan of a book that interests you via your local library. Also if a friend is a university student then they can borrow books from the university library and lend them to you.
  3. There are videos of many great lectures available on the net. A recent resource that has been added is Youtube lectures from the University of California Berkeley [4] (I haven’t viewed any of the lectures yet but I expect them to be of better than average quality). Some other sources for video lectures are Talks At Google [5] and TED – Ideas Worth Spreading [6].
  4. To provide the benefits of hanging out with smart people you would have to form your own group. Maybe a group of people from a LUG could meet regularly (e.g. twice a week or more) to discuss computers etc. Of course it would require that the members of such a group have a lot more drive and ambition than is typical of university students. Such a group could invite experts to give lectures for their members. I would be very interested in giving a talk about SE Linux (or anything else that I work on) to such a group of people who are in a convenient location.
  5. The benefits of teaching others can be obtained by giving presentations at LUG meetings and other forums. Also if a group was formed as suggested in my previous point then at every meeting one or more members could give a presentation on something interesting that they had recently learned.

The end result of such a process should be learning more than you would typically learn at university while having more flexible hours (whatever you can convince a group of like-minded people to agree to for the meetings) that will interfere less with full-time employment (if you want to work while studying). In Australia university degrees don’t seem to be highly regarded so convincing a potential employer that your home-university learning is better than a degree should not be that difficult.

If you do this and it works out then please write a blog post about it and link to this post.

Update:
StraighterLine offers as much tuition as you can handle over the Internet for $99 per month [7]. That sounds really good, but it does miss the benefits of meeting other people to discuss the work. Maybe if a group of friends signed up to StraighterLine [8] at the same time it would give the best result.


Swap Space

There is a widespread myth that swap space should be twice the size of RAM. This might have provided some benefit when 16M of RAM was a lot and disks had average access times of 20ms. Now disks can have average access times less than 10ms but RAM has increased to 1G for small machines and 8G or more for large machines. Multiplying the seek performance of disks by a factor of two to five while increasing the amount of data stored by a factor of close to 1000 is obviously not going to work well for performance.

A Linux machine with 16M of RAM and 32M of swap MIGHT work acceptably for some applications (although when I was running Linux machines with 16M of RAM I found that if swap use exceeded about 16M then the machine became so slow that a reboot was often needed). But a Linux machine with 8G of RAM and 16G of swap is almost certain to be unusable long before the swap space is exhausted. Therefore giving the machine less swap space and having processes be killed (or malloc() calls fail – depending on the configuration and some other factors) is probably going to be a better situation.

There are factors that can alleviate the problems such as RAID controllers that implement write-back caching in hardware, but this only has a small impact on the performance requirements of paging. The 512M of cache RAM that you might find on a RAID controller won’t make that much impact on the IO requirements of 8G or 16G of swap.

I often make the swap space on a Linux machine equal the size of RAM (when RAM is less than 1G) and be half the size of RAM for RAM sizes from 2G to 4G. For machines with more than 4G of RAM I will probably stick to a maximum of 2G of swap. I am not convinced that any mass storage system that I have used can handle the load from more than 2G of swap space in active use.
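The rule of thumb above can be written out explicitly. This is a sketch of my own sizing habits, not any official recommendation (sizes in gigabytes):

```python
def swap_size(ram_gb: float) -> float:
    """Swap size I typically configure for a given amount of RAM."""
    if ram_gb <= 1:
        return ram_gb      # small machines: swap equal to RAM
    if ram_gb <= 4:
        return ram_gb / 2  # 2G to 4G of RAM: half the size of RAM
    return 2.0             # larger machines: cap swap at 2G

for ram in (0.5, 1, 2, 4, 8, 16):
    print(f"{ram}G RAM -> {swap_size(ram)}G swap")
```

Note that the swap size goes down as a proportion of RAM as the machine gets bigger, which is the opposite of what the “twice RAM” rule gives you.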

The myths about swap space size come from some old versions of Unix that allocated a page of disk space for every page of virtual memory. On such systems having swap space less than or equal to the size of RAM was impossible and having swap space less than twice the size of RAM was probably a waste of effort (see this reference [1]). However Linux has never worked this way; in Linux the virtual memory size is the size of RAM plus the size of the swap space. So while the “double the size of RAM” rule of thumb gave virtual memory twice the size of physical RAM on some older versions of Unix, it gives three times the size of RAM on Linux! Also swap spaces smaller than RAM have always worked well on Linux (I once ran a Linux machine with 8M of RAM and used a floppy disk as a swap device).
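The arithmetic behind the myth is worth spelling out. On the old Unix design every virtual page needed a backing page of swap, so usable virtual memory equalled the swap size; on Linux it is RAM plus swap. A quick illustration of that reasoning:

```python
def virtual_memory_old_unix(ram: int, swap: int) -> int:
    # Every page of virtual memory needed a swap page behind it,
    # so usable virtual memory was simply the swap size.
    return swap

def virtual_memory_linux(ram: int, swap: int) -> int:
    # Linux backs virtual memory with RAM and swap combined.
    return ram + swap

ram = 16  # in whatever units you like
print(virtual_memory_old_unix(ram, 2 * ram))  # 32: twice RAM
print(virtual_memory_linux(ram, 2 * ram))     # 48: three times RAM
```

So following the “twice RAM” rule on Linux buys you more virtual memory than the rule was ever meant to provide, most of it far too slow to use.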

As far as I recall some time ago (I can’t remember how long) the Linux kernel would by default permit overcommitting of memory. For example if a program tried to malloc() 1G of memory on a machine that had 64M of RAM and 128M of swap then the system call would succeed. However if the program actually tried to use that memory then it would end up getting killed.

The current policy is that /proc/sys/vm/overcommit_memory determines what happens when memory is overcommitted, the default value 0 means that the kernel will estimate how much RAM and swap is available and reject memory allocation requests that exceed that value. A value of 1 means that all memory allocation requests will succeed (you could have dozens of processes each malloc 2G of RAM on a machine with 128M of RAM and 128M of swap). A value of 2 means that a different policy will be followed, incidentally my test results don’t match the documentation for value 2.
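For the value 2 case the kernel computes a hard commit limit from the overcommit_ratio sysctl. As far as I know the formula is swap plus a percentage of RAM; the following is a sketch of that calculation (check the kernel’s overcommit accounting documentation for the authoritative description, which also subtracts huge pages):

```python
def commit_limit(ram_bytes: int, swap_bytes: int,
                 overcommit_ratio: int = 50) -> int:
    """Approximate CommitLimit for overcommit_memory=2.

    The default overcommit_ratio is 50, i.e. half of RAM counts
    towards the limit in addition to all of swap.
    """
    return swap_bytes + ram_bytes * overcommit_ratio // 100

GB = 1024 ** 3
# With 8G of RAM, 2G of swap, and the default ratio, total allocations
# across the system are capped at about 6G:
print(commit_limit(8 * GB, 2 * GB) / GB)  # 6.0
```

This is why mode 2 can refuse allocations long before RAM is exhausted, which may be part of why test results don’t always match naive expectations.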

Now if you run a machine with /proc/sys/vm/overcommit_memory set to 0 then you have an incentive to use a moderately large amount of swap, safe in the knowledge that many applications will allocate memory that they don’t use, so the fact that the machine would deliver unacceptably low performance if all the swap was used might not be a problem. In this case the ideal size for swap might be the amount that is usable (based on the storage speed) plus a percentage of the RAM size to cater for programs that allocate memory and never use it. By “moderately large” I mean something significantly less than twice the size of RAM for all machines less than 7 years old.

If you run a machine with /proc/sys/vm/overcommit_memory set to 1 then the requirements for swap space should decrease, but the potential for the kernel to run out of memory and kill some processes is increased (not that it’s impossible to have this happen when /proc/sys/vm/overcommit_memory is set to 0).

The debian-administration.org site has an article about a package to create a swap file at boot [2] with the aim of making it always be twice the size of RAM. I believe that this is a bad idea; the amount of swap which can be used with decent performance is a small fraction of the storage size on modern systems and often less than the size of RAM. Increasing the amount of RAM will not increase the swap performance, so increasing the swap space is not going to do any good.


Gear Acquisition Syndrome

I have just read an interesting post about Gear Acquisition Syndrome [1] as applied to the guitar industry. Apparently it’s common for people to spend a lot of time and money buying guitar equipment instead of actually playing a guitar. I think that this problem extends way beyond guitars and to most aspects of human endeavour, and that actively trying to avoid the problem is a key to getting things done. I believe that the author however makes a strategic error by then going on to advise people on how to buy gear that won’t become obsolete. Sure it’s good to have gear that will suit your future needs and not require replacement, but if you are repeatedly buying new gear then your problem usually is not that the gear doesn’t do the job but that you want to buy more.

I used to suffer from this problem to a degree with my computer work, and still have problems controlling myself when I see tasty kit going cheap on auction.

Here is a quick list of things to do to avoid GAS:

  • Recognise the problems with getting new gear. It costs money (thus requiring you to do more paid work or skip something else that you enjoy). It needs to have the OS installed and configured which takes time away from other things (unless your job is related to installing software on new machines). Finally it might be flawed. Every time you buy a new computer you risk having a failure, if it’s a failure that happens some time after deploying the machine then it can cause data loss and down-time which is really annoying.
  • Keep in mind what you do. I primarily do software work (programming and sys-admin). While some knowledge of hardware design is required for sys-admin work and the ability to make my own hardware work is required for my own software development I don’t need to be an expert on this. I don’t need to have the latest hardware with new features, the old stuff worked well when I bought it and still works well now. My main machine (which I am using to write this post) is a Thinkpad T41p, it’s a few years old and a little slow by today’s standards but for everything that really matters to me it performs flawlessly. If your job really requires you to have experience with all the latest hardware then you probably work in a computer store and get access to it for free!
  • When you have a problem think about whether new gear is the correct solution. There are a couple of areas in which performance on my Thinkpad is lower than I desire, but they are due to flaws in software that I am using. As I am primarily a programmer and the software in question is free it’s better for me (and the world) if I spend my time fixing the software rather than buying new hardware.
  • Buy decent (not hugely expensive) gear so that you don’t need to continually buy new stuff. E.g. if a machine is going to store a moderate amount of data then make sure it has space for multiple hard drives so you can easily add new ones.
  • Don’t buy the biggest and baddest machine out there. New hardware is developed so quickly that the fastest gear available now will be slow by next-year’s standards. Buy the second-fastest machine and it’ll be a lot cheaper and often more reliable.
  • Determine your REAL requirements – the ones that match what you do. As I do software work it makes sense for me to have the most reliable hardware possible, so I can avoid stuffing around with things that don’t interest me so much (and which I’m not so good at). So I need reliable machines: I will continue buying Thinkpads (I plan to keep my current one until it’s five years old and then buy another), as I believe that the Thinkpad is the Rolls-Royce of laptops (see the Lenovo Blogs site for some interesting technical information [2]) and that continuing to use such hardware will keep me spending my time on software development rather than fooling with hardware. For desktop machines I have recently wasted unreasonable amounts of time due to memory errors, which inspired me to write a post about what a company like Dell could do to address what I consider the real needs of myself and other small business owners [3] (note that Dell is actually producing more suitable hardware in this regard than most companies – they just don’t market it as such).
  • Keep in mind the fact that most things you want to do don’t require special hardware. In fact for most tasks related to computers people were doing similar things 10 years ago with much less hardware. If you believe that it’s just the lack of hardware that prevents you from doing great work then your problem is self-confidence not hardware availability.

It’s interesting that a sports-shoe company has the slogan “Just Do It” while trying to convince people that special shoes are required for sporting success. Most professional athletes started training with minimal equipment. Get some basic gear and Just Do It!

References:

  1. http://www.harmony-central.com/Guitar/Articles/Avoiding_GAS/
  2. http://www.lenovoblogs.com/insidethebox – feed: http://feeds.feedburner.com/lenovoblogs/insidethebox
  3. http://etbe.coker.com.au/2007/08/25/designing-computers-for-small-business/

Duplicating a Xen DomU

A fairly common request is to be able to duplicate a Xen instance. For example you might have a DomU for the purpose of running WordPress and want another DomU to run MediaWiki. The difference in configuration between two DomUs running web-based services that are written in PHP and talk to a MySQL back-end is quite small, so copying the configuration is easier than a clean install.

It is a commonly held opinion that a clean install should be done every time and that Kickstart on Red Hat, FAI on Debian and comparable technologies on other distributions can be used for a quick automatic install. I have not yet got FAI working correctly or got Kickstart working on Xen (it’s on my todo list – I’ll blog about it when it’s done).

Regardless of whether it’s a good idea to copy a Xen DomU, there are often situations where clients demand it or where it’s impractically difficult to do a fresh install.

I believe that the most sensible way to store block devices with Xen is to use LVM. It is a requirement for a Xen system that you can easily create new block devices while the machine is running and that the size of block devices can be changed with minimal effort. This rules out using Linux partitions and makes it unreasonably difficult to use LUNs on a Fibre Channel SAN or partitions on a hardware RAID. LVM allows creating new block devices and changing the size of block devices with minimal effort. Another option would be to use files on a regular filesystem to store the filesystem data for Xen DomUs; if choosing this option I recommend using the XFS [1] filesystem (which delivers good performance with large filesystems and large files).

If you use files on XFS to store the block devices for the DomU that you want to copy then you will need to halt the DomU for the duration of the copy, as there is no other way of getting an atomic copy of the filesystem while it’s in use. One way of doing this is to run the command “xm console foo ; cp /mnt/whatever/foo-root /mnt/whatever/bar-root ; xm create -c foo” where “foo” is the name of the DomU and “/mnt/whatever/foo-root” is the file that is used to store the root device for the DomU (note that multiple cp commands would be needed if there are multiple block devices). The reason for having the two xm commands on the one line is that you first log in to the DomU from the console and type halt; the xm console command then terminates when the DomU is destroyed, so there is no delay from the time the domain is destroyed to the time that the copy starts.

If you use LVM to store the block device then things are a little easier (and you get no down-time). You simply run the command “lvcreate -s -L 300m -n foo-snap /dev/V0/foo-root” to create a snapshot with the device name /dev/V0/foo-snap which contains a snapshot of the LV (Logical Volume) /dev/V0/foo-root. The “-L 300m” option means to use 300Meg of storage space for the snapshot – if the writes to /dev/V0/foo-root exceed 300Meg of data then the snapshot breaks. There is no harm in setting the allocated space for the snapshot to be the same as the size of the volume that you are going to copy – it merely means that more disk space is reserved and unavailable for other LVM operations. Note that V0 needs to be replaced by the name of your LVM VG (Volume Group).

Once you have created the snapshot you can create a new LV with the command “lvcreate -n new-root -L X /dev/V0” where X is the size of the new device (which must be at least as big as the device you are copying), and then copy the data across with a command similar to “dd if=/dev/V0/foo-snap of=/dev/V0/new-root bs=1024k”. After the copy is finished you must remove the snapshot with the command “lvremove /dev/V0/foo-snap” (please be very careful when running this command – you really don’t want to remove an LV that has important data). Note that in normal operation lvremove will always give the prompt “Do you really want to remove active logical volume”. If you made the new device bigger then you must perform the operations that are appropriate for your filesystem to extend its size to use the new space.
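
The snapshot-based copy described above can be collected into a short script. This is a minimal sketch, not a tested tool: the VG name “V0” and the LV names are the example names from the text, the size is a placeholder, and DRY_RUN defaults to 1 so that the script only prints the commands it would run – unset it only after checking the output.

```shell
#!/bin/sh
# Sketch of cloning an LVM-backed DomU root device via a snapshot.
# Names ("V0", "foo-root", "new-root") are illustrative examples.
set -e
VG=V0
SRC=foo-root
SNAP=foo-snap
DST=new-root
SIZE=10g               # must be at least the size of the source LV
DRY_RUN=${DRY_RUN:-1}  # default: only print the commands

run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

clone_lv() {
    # Snapshot the source so the copy is atomic (300M of copy-on-write space).
    run lvcreate -s -L 300m -n "$SNAP" "/dev/$VG/$SRC"
    # Create the destination LV and copy the data with a large block size.
    run lvcreate -n "$DST" -L "$SIZE" "$VG"
    run dd "if=/dev/$VG/$SNAP" "of=/dev/$VG/$DST" bs=1024k
    # Remove the snapshot as soon as the copy is finished.
    run lvremove -f "/dev/$VG/$SNAP"
}

clone_lv
```

Running it with DRY_RUN left at its default prints the four commands so you can review them before doing anything destructive.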

There is no need to copy a swap device, it’s easier to just create a new device and run mkswap on it.

After copying the data you will need to create the new Xen config (by copying /etc/xen/foo to the new name). Make sure that you edit the Xen config file to use the correct block devices, and if you are specifying the MAC address [2] via a “vif” line in the config file make sure that you change it to an address that is unique on your LAN segment (reference [2] has information on how to select addresses).

Now you must mount the filesystem temporarily to change the IP address (you really don’t want two DomUs with the same IP address). If your Dom0 has untrusted users or services that are accessed by untrusted users (i.e. any Internet-facing service) then you want to mount the filesystem in question with the options nosuid and nodev, so that if the DomU has been cracked it can’t be used to crack the Dom0. After changing the configuration files to change the IP address(es) of the DomU you can then umount the filesystem and start it with the xm create command.
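
The post-copy steps can be sketched in the same dry-run style. The names “foo”, “bar”, /dev/V0/new-root, and /mnt/clone are illustrative assumptions, and the two hand-editing steps are left as comments because they depend on the distribution inside the DomU.

```shell
#!/bin/sh
# Sketch of the post-copy steps: copy the Xen config, then mount the
# clone's root device with nosuid,nodev to change its IP address.
set -e
DRY_RUN=${DRY_RUN:-1}  # default: only print the commands
run() { echo "+ $*"; [ "$DRY_RUN" = 1 ] || "$@"; }

finish_clone() {
    run cp /etc/xen/foo /etc/xen/bar
    # Edit /etc/xen/bar by hand: point the disk lines at the new block
    # devices and give any vif line a MAC address unique on the LAN.
    run mount -o nosuid,nodev /dev/V0/new-root /mnt/clone
    # Edit the network configuration under /mnt/clone by hand to set a
    # new IP address (e.g. /mnt/clone/etc/network/interfaces on Debian).
    run umount /mnt/clone
    run xm create -c bar
}

finish_clone
```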

If instead of creating the clone DomU on the same Dom0 you want to put it on a different system, you can copy the block devices to files on a regular filesystem on removable media (e.g. an IDE disk with a USB attachment). When copying the block devices you also need to copy the Xen configuration and edit it to reflect the new paths to the block devices once they are copied to the new server, but you won’t necessarily need to change the MAC address if you are copying it to a different LAN segment.

References:

  1. http://en.wikipedia.org/wiki/XFS
  2. http://en.wikipedia.org/wiki/MAC_address

Citing References in Blog Posts

A significant problem with the old-fashioned media is that as a general rule they don’t cite references for anything. Some of the better TV documentaries and non-fiction books cite references, but this is the exception not the norm. Often documentaries only cite references in DVD extras, which are good for the people who like the documentary enough to buy it but not for people who want to rebut it (few people will pay for a resource if they doubt the truth and accuracy of its claims).

I can understand newspapers not wanting to publish much in the way of background information in the paper version, as every extra line of text in an article is a line of advertising that they can’t sell. So they have financial pressure to produce less content, and the number of people like me who want to check the facts and figures used in articles is probably a small portion of the readership. Another issue with newspapers is that they are often considered primary authoritative sources (by themselves and by the readers). It is often the case that journalists will interview people who have first-hand knowledge of an issue and the resulting article will be authoritative and a primary source, in which case all they need to do is to note that they interviewed the subject. However the majority of articles published will be sourced from elsewhere (news agencies [ http://en.wikipedia.org/wiki/News_agency ] such as Reuters are commonly used). Also articles will often be written based on press releases – it is very interesting to read press releases and see how little work is done by some media outlets to convert them to articles; through a well-written press release a corporation or interest group can almost write its own articles for publication in the old media.

One way of partially addressing the problem of citing references in old media would be to create a web site of references, then every article could have a URL that is a permanent link to the references and calculations to support the claims and numbers used. Such a URL could be produced by any blogging software, and a blog would be an ideal way of doing this.

For bloggers however it’s much easier to cite references and readers have much higher expectations of links to other sites to support claims and of mathematical calculations shown to indicate how numbers are determined. But there is still room for improvement. Here are some of the most common mistakes that I see in posts by people who are trying to do the right thing:

  1. Indirect links. When you refer to a site you want to refer to it directly. In email (which is generally considered a transient medium) a service such as TinyURL [ www.TinyURL.com ] can be used to create short URLs to refer to pages that have long URLs. This is really good for email as there are occasions when people will want to write the address down and type it in to another computer. For blogging you should assume that your reader has access to browse the web (which is the case most of the time). Another possibility is to have the textual description of a link include a reference to the TinyURL service but to have the HREF refer to the real address. Any service on the net may potentially go away at some future time. Any service on the net may have transient outages, and any reader of your blog may have routing problems that make parts of the net unavailable to them. If accessing a reference requires using TinyURL (or a similar service) as well as the target site then there are two potential things that might break and prevent your readers from accessing it.
    One situation where indirect links are acceptable is for the printed version. So you could have a link in the HTML code for readers to click on to get to the reference page directly and a TinyURL link for people who have a printed version and need to type it in.
    Also when linking to a blog it’s worth considering the fact that a track-back won’t work via TinyURL and track-backs may help you get more readers…
  2. Links that expire. For example never say “there’s a good article on the front page of X” (where X is a blog or news site). Instead say “here’s a link to a good article which happens to be on the front page now” so that someone who reads your post in a couple of years time can see the article that you reference.
    Another problem is links to transient data. For example if you want to comment on the features of a 2007 model car you should try to avoid linking to the car manufacturer page, next year they will release a new car and delete the old data from their site.
    A potential problem related to this is the Google cache pages, which translate PDF to HTML and highlight relevant terms and can make it much easier to extract certain information from web pages. It can provide value to readers to use such links but AFAIK there is no guarantee that they will remain forever. I suggest that if you use them you should also provide the authoritative link so that if the Google link breaks at some future time then the reader will still be able to access the data.
  3. Not giving the URLs of links in human readable form. Print-outs of blog pages will lose links and blog reading by email will also generally lose links (although it would be possible to preserve them). This accounts for a small part of your readership but there’s no reason not to support their needs by also including links as text (either in the body or at the end of the post). I suggest including the URL in brackets; the most important thing is that no non-URL text touches the ends of the URL (don’t have it in quotes, and have the brackets spaced away from it). Email clients can generally launch a web browser if the URL is clear. Note that prior to writing this post I have done badly in this regard; while thinking about the best advice for others I realised that my own blogging needed some improvement.
    I am not certain that the practice I am testing in this post of citing URLs inline will work. Let me know what you think via comments, I may change to numbering the citations and providing a list of links in the footer.
  4. Non-specific links. For example saying “Russell Coker wrote a good post about SE Linux” and referring to my main blog URL is not very helpful to your readers as I have written many posts on that topic and plan to write many more (and there is a chance that some of my future posts on that topic may not meet your criteria of being “good”). Saying “here is a link to a good post by Russell Coker, his main blog URL is here” is more useful: it gives both the specific link (indicating which post you were referring to) and the general information (for people who aren’t able to find it themselves, for the case of deleted/renamed posts, and for Google). The ideal form would be “<a href=”http://etbe.coker.com.au/whatever”>here is a link to a good post by Russell Coker [ http://etbe.coker.com.au/whatever ]</A>, his main blog URL is <a href=”http://etbe.coker.com.au/”> [ http://etbe.coker.com.au ]</A>” (note that this is an example of HTML code as a guide for people who are writing their own HTML, people who use so-called WYSIWYG editors will need to do something different).
  5. Links that are likely to expire. As a rule of thumb if a link is not human readable then the chance of it remaining long-term is low. Companies with content management systems are notorious for breaking links.
  6. Referencing data that you can’t find. If you use data sourced from a web site and the site owner takes it down then you may be left with no evidence to support your assertions. If data is likely to be removed then you should keep a private copy off-line (online might be an infringement of copyright) for future reference. It won’t let you publish the original data but will at least let you discuss it with readers.
  7. Referencing non-public data. The Open Access movement [ http://en.wikipedia.org/wiki/Open_access ] aims to make scholarly material free for unrestricted access. If you cite papers that are not open access then you deny your readers the ability to verify your claims and also encourage the companies that deny access to research papers.
    An insidious problem is with web sites such as the New York Times [ www.nytimes.com ] which need a login and store cookies. As I have logged in to their site at some time in the past I get immediate access to all their articles. But if I reference them in a blog post many readers will be forced to register (some readers will object to this). With the NYT this isn’t such a problem as it’s free to register so anyone who is really interested can do so (with a fake name if they wish). But I still have to keep thinking about the readers for such sites.
    I should probably preview my blog posts from a different account without such cookies.
  8. Failing to provide calculations. My current procedure is to include the maths in my post, for example if you have a 32bit data type used to store a number of milliseconds then it can store 2^32/1000 seconds which is 2^32/1000/60/60/24 = 49.7 days, in this example you can determine with little guessing what each of the numbers represent. For more complex calculations an appendix could be used. A common feature of blogs is the ability to have a partial post sent to the RSS feed and the user has the ability to determine where the post gets cut. So you could cut the post before the calculations, the people who want to see them will find it’s only one click away, and the people who are happy to trust you will have a shorter post.
  9. Linking with little reason. Having a random word appear highlighted with an underline in a blog post is often not very helpful for a reader. It sometimes works for Wikipedia links where you expect that most readers will know what the word means but you want to link to a reference for the few who don’t (my link for the word Wikipedia is an example). In the case where most readers are expected to know what you are referring to then citing the link fully (with a description of the link and a human-readable form for an email client) is overkill and reduces the readability of the text.
    The blogging style of “see here and here for examples” does not work via email and does not explain why a reader should visit the sites. If you want to include random links in a post then having a section at the footer of related links would probably be best.
  10. Linking to a URL as received. Many bloggers paste URLs from Google, email, and RSS feeds into their blog posts. This is a bad idea because it might miss redirection to a different site. If a Google search or an email gives you a URL that is about to go away then it might redirect to a different site. In that case citing the new URL instead of the old one is a service to your readers and will decrease the number of dead-links in your blog over the long-term. Also using services such as www.feedburner.com may cause redirects that you want to avoid when citing a blog post, see my previous post about Feedburner [ http://etbe.coker.com.au/2007/08/20/feedburner-item-link-clicks/ ].
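
The millisecond-wraparound calculation in point 8 above can be checked directly in the shell; awk is used here for the floating-point division:

```shell
# A 32-bit data type counting milliseconds holds 2^32 values;
# converting that to days shows where the 49.7 figure comes from:
awk 'BEGIN { printf "%.1f days\n", 2^32 / 1000 / 60 / 60 / 24 }'   # prints "49.7 days"
```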

Here are some less common problems in citing posts:

  1. Inappropriately citing yourself. Obviously if there is a topic that you frequently blog about then there will be benefit to linking to old posts instead of covering all the background material, and as long as you don’t go overboard there should not be any problems (links to your own blog are assumed to have the same author so there is no need for a disclaimer). If you write authoritative content on a topic that is published elsewhere then you will probably want to blog about it (and your readers will be interested). But you must mention your involvement to avoid giving the impression that you are trying to mislead anyone. This is particularly important if you are part of a group that prepares a document, your name may not end up on the list of authors but you have a duty to your readers to declare this.
    Any document that you helped prepare cannot be used by itself to support claims that you make in a blog post. You can certainly say “I have previously demonstrated how to solve this problem, see the following reference”. But links with comments such as “here is an example of why X is true” are generally interpreted as partly demonstrating popular support for an idea.
  2. Citing secret data. The argument “if you knew what I know then you would agree with me” usually won’t be accepted well. There are of course various levels of secrecy that are appropriate. For example offering career advice without providing details of how much money you have earned (evidence of one aspect of career success) is acceptable as the readers understand the desire for some degree of financial secrecy (and of course in any game a coach doesn’t need to be a good player). Arguing the case for a war based on secret data (as many bloggers did) is not acceptable (IMHO), neither is arguing the case for the use of a technology without explaining the science or maths behind it.
  3. Not reading the context of a source. For example I was reading the blog of a well regarded expert in an area of computer science, and he linked to another blog to support one of his claims. I read the blog in question (more than just the post he cited) and found some content that could be considered to be racially offensive and much of the material that I read contained claims that were not adequately supported by facts or logic. I find it difficult to believe that the expert in question (for whom I have a great deal of respect) even casually inspected the site in question. In future I will pay less attention to his posts because of this. I expect a blogger to pay more attention to the quality of their links than I do as a reader of their blog.

While writing this post I realised that my own blogging can be improved in this regard. Many of my older posts don’t adequately cite references. If you believe that any of my future posts fail in this regard then please let me know.


Carbon Geo-Sequestration

My post about Why Hydrogen Powered Cars Will Never Work has received a record number of comments. Some of them suggested that carbon geo-sequestration (storing carbon-dioxide at high pressure under-ground) is the solution to the climate change problem. The idea is that you can mix natural gas or coal gas with steam at high temperature to give carbon-dioxide and hydrogen. Then the carbon dioxide gets stored under-ground while the hydrogen is used for relatively clean fuel.

Beyond Zero Emissions has produced a media release about the fallacies expressed in the FutureGen document promoting so-called “clean-coal”; the best content is in their PDF document titled FutureGen Conceptual Design Retort. Note that I did some research to support the preparation of the retort; I am not referencing them to support my arguments but as background information.

One overwhelming problem with geo-sequestration for coal based power plants is that it is significantly more expensive than the current coal-fired power plant design. Currently the price difference between coal power and wind power is quite small, and there are several technologies that are almost ready for production which will decrease the cost of wind power. It is expected that before so-called “clean coal” becomes viable (the first production plants are planned to go live in 2022) the cost of renewable energy will be lower than the current cost of coal power. There is no reasonable possibility of “clean coal” being cheaper than renewable energy.

The underground reservoirs that could be used for storing CO2 currently contain brine, which can contain toxic metals and radioactive substances (according to the Bureau of Land and Water Quality in the US). If toxic and radioactive substances need to be pumped out to make room for CO2 then it’s hardly a clean process!

The US Geological Survey has an interesting page about volcanic gas. Apparently it’s not uncommon for small animals to be killed when CO2 forms pools in low lying areas. If (when?) CO2 escapes from geo-sequestration the same might happen with humans. They also have a page about CO2 killing trees at Mammoth Mountain! Before I read this I never realised that plants could be killed by excessive CO2. Apparently tree roots need oxygen and CO2 in the ground will kill them. The release of 300 tons of CO2 per day killed 100 acres of trees. The FutureGen trial power plant is designed to support sequestration of over 1,000,000 tons of CO2 per year (that is over 2,700 tons per day). If it leaked at 1/9 that rate then damage comparable to Mammoth Mountain would be the result. Note that the FutureGen trial plant will be a fraction of the size of a real coal power station so an escape of significantly less than 1/9 of the CO2 from a real sequestration plant would have such a bad result. It’s interesting to note that tents and basements are documented as CO2 risks, so I guess we have to avoid camping in areas near power plants!
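
The comparison above is simple enough to check in the shell:

```shell
# FutureGen is designed to sequester over 1,000,000 tons of CO2 per year:
awk 'BEGIN { printf "%.0f tons/day\n", 1000000 / 365 }'    # about 2740 tons per day
# The Mammoth Mountain tree kill was caused by about 300 tons per day,
# which is roughly 1/9 of the FutureGen rate:
awk 'BEGIN { printf "%.2f\n", 300 / (1000000 / 365) }'     # about 0.11
```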

What would happen if a large geo-sequestration project had a sudden failure, i.e. if the reservoir broke and all the CO2 erupted suddenly? We already have an answer to this question because such things have happened in the past. In 1986 in Cameroon 1.2 cubic kilometers of CO2 gas was released from a volcanic lake – that is 2,400,000 tons, or just over two years of output from the proposed FutureGen plant. It killed over 2000 people. What might happen if 10 years of output from a commercial scale coal power plant was suddenly released into the atmosphere?

As far as I know there has been no research on de-sequestration of CO2. If a reservoir is discovered to be unstable after 20,000,000 tons of CO2 have been stored in it, what will we do?

Geo-sequestration of CO2 makes nuclear power plants seem safe by comparison.