Timing Processes

One thing that happens periodically is that I start a process from an interactive shell, discover that it’s taking longer than expected, and then want to know how long it took. Essentially it’s a retrospective need to have run “time whatever” that I only discover once the process has been running for long enough that I don’t want to restart it. My current procedure in such situations is to run ps from another session to discover when the process started and then type date to display when it ends.
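
In the meantime, ps can report both pieces of information in one command. A minimal sketch (the PID is a placeholder; find it with something like pgrep):

# Show the start time and the elapsed wall-clock time of a running process.
# 12345 is a placeholder PID.
ps -o lstart=,etime= -p 12345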

A quick test with strace showed that bash uses the wait4() system call to determine when a process ends, but passes NULL as the last parameter. If it passed a pointer to a struct rusage then it would receive the necessary data.

I think it would be a really good feature for a shell to allow you to type something like “echo $TIME_LAST_CMD” to see how long the last command took. For the common case where you aren’t interested in that data it would only involve an extra parameter to the wait4() system call, a small amount of memory allocated for the result, and storing yet another variable in the shell’s list.
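
As an aside, bash already tracks one related piece of data: the times builtin prints the accumulated CPU time of the shell and of all its child processes. It’s cumulative CPU time rather than per-command elapsed time, so it’s only a rough retrospective indicator, but it needs no setup:

# Print accumulated user and system CPU times, first for the shell
# itself and then for all of its child processes.
times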

A quick Google search didn’t show any way of filing wishlist bugs against Bash, and as I don’t think that this is a real bug as such I haven’t filed a bug report. If anyone reads my blog and has some contact with the Bash people then please pass this idea along if you think it’s worthy of being included.

Flash, Apple, and Linux

Steve Jobs has published an interesting article about Flash [1]. He criticises Flash for being proprietary, which seems a little hypocritical coming from Apple (Microsoft’s only serious rival for the title of most proprietary computer company) but is in fact correct. Steve advocates HTML5, which is a better technical solution to a lot of the things that Flash does. He claims that Apple users aren’t missing out on much, but I think that sites such as Physics Games [2] demonstrate the benefits of Flash.

I think that Apple’s attack on Flash is generally a good thing. HTML5 web sites will work everywhere, which will be a good incentive for web designers to fix their sites. I also think that we want to deprecate Flash, but as it’s unfortunately popular it’s useful to have tools such as Gnash for using Flash-based web sites with free software. Microsoft has belatedly tried to compete with Flash, but its Silverlight system and the free but patent-encumbered Linux equivalent Moonlight have very little content to play and will probably disappear soon. As an aside, the relentless determination of GNOME people to force the Mono project (including Moonlight) on its users convinced me to remove GNOME from all systems that I run.

OS News has a good analysis of the MPEG-LA patents [3], which are designed to prevent anyone making any commercial use of H.264 – and that includes putting such videos on sites that contain Google advertising! The patent terms are so horrible that they claim control over any video stream that was ever encoded with the patented techniques, so you can’t even transcode a H.264 stream to an open format without potentially having the scum at MPEG-LA going after you. This is worth noting when examining Apple’s actions: they support the MPEG patents and therefore seem happy to do anything that reduces the freedom of their customers. Apple’s 1984 commercial has been proven to be a lie; it’s Apple that wants to control our freedom.

Charles Stross makes some good points about the issues related to Apple and Flash [4]. He believes that it’s all part of an Apple push to cloud computing, and that Apple wants to own all our data at the back-end while providing a relatively reliable front-end (IE without all the anti-virus nonsense that is needed on the MS-Windows platform). Cloud computing is a good thing and I can’t wait for the Linux support for it to improve. I support a number of relatives who run Linux, and it would be a lot easier for me if the primary storage for everything was in the cloud, so that I could do central backups of user data and they could use their own data while visiting each other. I think that a network filesystem that is similar in concept to offline-IMAP would be a really good thing. I know that there are some filesystems such as AFS and CODA that are designed for wide area network use with client-side caching, but as far as I am aware they aren’t designed for the type of operation that offline/caching IMAP supports.

Matt Brubeck has given a good status report on the work porting Firefox to Android [5]. He notes that the next version of Fennec (mobile Firefox) will have Electrolysis – the Firefox one-process-per-tab feature that was first implemented in Google Chrome [6]. I think that the development of Fennec and the one-process-per-tab feature are both great developments. Matt also says “One of my personal goals is to make Firefox compatible with more mobile sites, and to give web developers the tools and information they need to make their sites work great in mobile Firefox. I’ll write much more about this in future articles”. That sounds great; I look forward to the results of his coding and to reading his blog posts about it!

Marshmallow Challenge for Linux Programmers

Tom Wujec gave an interesting TED talk about teaching team-work and engineering by having people build the tallest possible structure from 20 pieces of spaghetti, 1 yard of string, and 1 yard of sticky-tape, with a time limit of 18 minutes [1]. The project is completed by groups of four people – which is probably about the maximum number of hands that you could have on such a small structure at one time. They have a web site, MarshmallowChallenge.com, which gives some suggestions for conducting a challenge.

One interesting point made in the talk is that kindergarten students tend to do better than most adults.

I think it would be good to have such challenges at events such as Linux conferences. The type of people who attend such conferences tend to enjoy such challenges, and it may lead to some lessons in team-work that can help the software development process. Also we can discover whether Linux programmers are better than the typical kindergarten students. ;)

Discovering OS Bugs and Using Snapshots

I’m running Debian/Unstable on an EeePC 701. I’ve got an SD card for /home etc, but the root filesystem is on the internal 4G flash storage, which doesn’t have much spare space (I’ve got a full software development environment with GCC, debuggers, etc, as well as running KDE4). On some of my systems I’ve started the practice of having two root filesystem installs. Modern disks are big enough that it’s usually difficult to use all the space, and even if you do use most of it a second root filesystem only takes a fraction of a percent of the available space.

Today I discovered a problem with my EeePC. I had upgraded to the latest Unstable packages a few days ago, and now when I run X programs the screen flickers really badly every time it’s updated. Pressing a key in a terminal window makes the screen shake, and watching a video with mplayer makes it shake constantly to such a degree that it’s not usable. If this happened on a system with a second root filesystem I could upgrade the other install a few packages at a time to discover the root cause. But without the space for a second root filesystem this isn’t an option.

I hope that Btrfs [1] becomes ready for serious use soon; it seems that the btrfs snapshot facility might make it possible for me to preserve the old version in a bootable form before upgrading my EeePC (although even then disk space would be tight).
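
For reference, here is a minimal sketch of what such a snapshot-before-upgrade might look like with the btrfs tools, assuming the root filesystem is a btrfs subvolume (the snapshot name is a placeholder):

# Take a writable snapshot of the root subvolume before upgrading.
btrfs subvolume snapshot / /pre-upgrade
# If the upgrade breaks things, boot the old snapshot by adding
# rootflags=subvol=pre-upgrade to the kernel command line.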

So I guess I now need to test different versions of the X related packages in a chroot environment to track this bug down. Sigh.

Ext4 and Debian/Lenny

I want to use the Ext4 filesystem on Xen DomUs. The reason for this is that the problem of Ext3 fsck times (as described in my previous post about Ext4 [1]) is compounded if you have multiple DomUs running fsck at the same time.

One issue that makes this difficult is the fact that it is very important to be able to mount a DomU filesystem in the Dom0, and it is extremely useful to be able to fsck a DomU filesystem from the Dom0 (for example when you want to resize the root filesystem of the DomU).
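
As an illustration, resizing a DomU root filesystem from the Dom0 goes roughly like this (a sketch assuming LVM-backed block devices; the domain and volume names are placeholders):

# Shut the DomU down so its filesystem is not in use.
xm shutdown -w demo-domu
# Grow the block device, fsck it (resize2fs requires a clean check
# first), then resize the filesystem to fill the device and restart.
lvextend -L +4G /dev/vg0/demo-root
e2fsck -f /dev/vg0/demo-root
resize2fs /dev/vg0/demo-root
xm create demo-domu.cfg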

I have Dom0 systems running CentOS5, RHEL5, and Debian/Lenny, and I have DomU systems running CentOS5, RHEL4, Debian/Lenny, and Debian/Unstable. So to get Ext4 support on all my Xen servers I need it for Debian/Lenny and RHEL4 (Debian/Unstable has full support for Ext4, and RHEL5 and CentOS5 have been updated to support it [2]).

The Debian kernel team apparently don’t plan to add kernel support for Ext4 in Lenny (they generally don’t do such things), and even backports.debian.org doesn’t have a version of e2fsprogs that supports Ext4. So getting Lenny going with Ext4 requires a non-default kernel and a back-port of the utilities. In the past I’ve used CentOS and RHEL kernels to run Debian systems and that has worked reasonably well. I wouldn’t recommend doing so for a Dom0 or a non-virtual install, but for a DomU it works well enough and it’s not too difficult to recover from problems. So I have decided to upgrade most of my Lenny virtual machines to a CentOS 5 kernel.

When installing a CentOS 5 kernel to replace a Debian/Lenny kernel you have to use “console=tty0” as a kernel parameter instead of “xencons=tty”, you have to use /dev/xvc0 as the name of the terminal for running a getty (IE xvc0 is a parameter to getty), and you have to edit /etc/rc.local (or some other init script) to run “killall -9 nash-hotplug” because a nash process from the Red Hat initrd goes into an infinite loop. A sketch of these changes is below. Of course upgrading a CentOS kernel on a Debian system is a little more inconvenient (I upgrade a CentOS DomU and then copy the kernel modules to the Debian DomUs and the vmlinuz and initrd to the Dom0).
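
Here is a rough sketch of those three changes (assuming a standard Lenny inittab; the exact getty line may vary):

# In the DomU config file in the Dom0: pass console=tty0 to the kernel.
extra = "console=tty0"
# In /etc/inittab in the DomU: run a getty on the Red Hat console device.
co:2345:respawn:/sbin/getty 38400 xvc0
# In /etc/rc.local in the DomU: kill the looping nash process.
killall -9 nash-hotplug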

The inconvenience of this can be an issue in an environment where multiple people are involved in running the systems, if a sysadmin who lacks skills or confidence takes over they may be afraid to upgrade the kernel to solve security issues. Also “apt-get dist-upgrade” won’t show that a CentOS kernel can be updated, so a little more management effort is required in tracking which machines need to be upgraded.

deb http://www.coker.com.au lenny misc

To backport the e2fsprogs package I first needed to backport util-linux, debhelper, libtool, xz-utils, base-files, and dpkg. This is the most significant and invasive back-port I’ve done. The above apt repository has all the packages for AMD64 and i386 architectures.

For a Debian system, after the right kernel is installed and e2fsprogs (and its dependencies) are upgraded, the command “tune2fs -O flex_bg,uninit_bg /dev/xvda” can be used to enable the Ext4 features on the filesystem. At the next reboot the system will prompt for the root password and allow you to manually run “e2fsck -y /dev/xvda” to do the real work of transitioning the filesystem (unlike Red Hat based distributions, which do this automatically).
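
Putting the Debian DomU conversion together, the sequence looks roughly like this (a sketch; /dev/xvda is the example device from above, so adjust device names to suit):

# Enable the Ext4 features on the existing Ext3 filesystem.
tune2fs -O flex_bg,uninit_bg /dev/xvda
# Change ext3 to ext4 in /etc/fstab for the converted filesystem, then:
reboot
# At the single-user prompt after the reboot, finish the conversion:
e2fsck -y /dev/xvda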

So the state of my Debian systems running this is that the DomUs run the CentOS kernel and my backported utilities, while the Dom0 just runs the backported utilities with the Lenny kernel. Thus the Debian Dom0 can’t mount filesystems from the DomUs – which makes things very difficult when there is a problem that needs to be fixed in a DomU; I have to either mount the filesystem from another DomU or boot with “init=/bin/bash”.

Ext4 and RHEL5/CentOS5

I have just noticed that Red Hat added Ext4 support to RHEL-5 in kernel 2.6.18-110.el5. They also added a new package named e4fsprogs (a break from the e2fsprogs name that has been used for so long). Hopefully they will use a single package of utilities for Ext2/3/4 filesystems in RHEL-6 and not continue this package split. Using commands such as e4fsck and tune4fs is a minor inconvenience.

Converting a RHEL 5 or CentOS 5 system to Ext4 merely requires running “tune4fs -O flex_bg,uninit_bg /dev/WHATEVER” to enable the Ext4 features on each device, editing /etc/fstab to change the filesystem type to ext4, running a command such as “mkinitrd -f /boot/initrd-2.6.18-164.9.1.el5xen.img 2.6.18-164.9.1.el5xen” to generate a new initrd with Ext4 support (which must be done after editing /etc/fstab), and then rebooting.
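
As a concrete sketch of that sequence (using the kernel version from the example above; the device name is a placeholder):

# Enable the Ext4 features on each filesystem to be converted.
tune4fs -O flex_bg,uninit_bg /dev/xvda1
# Edit /etc/fstab to change ext3 to ext4 for those filesystems, then
# rebuild the initrd so that it includes Ext4 support, and reboot.
mkinitrd -f /boot/initrd-2.6.18-164.9.1.el5xen.img 2.6.18-164.9.1.el5xen
reboot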

When the system is booted it will run fsck on the filesystems automatically – but it doesn’t display progress reports, which is rather disconcerting. The system displays “/ contains a file system with errors, check forced.” and apparently hangs for a long time. This is however slightly better than the situation on Debian/Unstable, where upgrading to Ext4 results in an fsck error on boot which forces you to log in in single-user mode to run fsck [1] – which would be unpleasant if you don’t have convenient console access. Hopefully this will be fixed before Squeeze is released.

I now have a couple of my CentOS 5 DomUs running with Ext4, it seems to work well.

The Transition to Ext4

I’ve been investigating the Ext4 filesystem [1].

The main factor that is driving me to Ext4 at the moment is fsck times. I have some systems running Ext3 on large filesystems which I need to extend. In most cases Ext3 filesystems have large numbers of Inodes free, because the ratio of Inodes to disk space is fixed when the filesystem is created; enlarging the filesystem adds Inodes at the same ratio, and apart from a backup/format/restore cycle there is no way of changing it. Some of the filesystems I manage can’t be converted because the backup/restore would involve an unreasonable amount of downtime.
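
To see how large the excess is on an existing filesystem, compare the allocated Inodes with the used Inodes (a sketch; the device name is a placeholder):

# Show Inode usage for all mounted filesystems.
df -i
# Show the total and free Inode counts for one filesystem.
tune2fs -l /dev/sda1 | grep -i inode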

Page 11 of the OLS paper by Avantika Mathur et al [2] has a graph of the relationship between the number of Inodes and fsck time.

Ext4 also has a number of other features to improve performance, including changes to journaling and block allocation (such as extents and delayed allocation).

Now my most important systems are all virtualised. I am using Debian/Lenny and RHEL5 for the Dom0s. Red Hat might back-port Ext4 to the RHEL5 kernel, but there will probably never be a supported kernel for Debian/Lenny with Ext4 and Xen Dom0 support (there may never be a kernel for any Debian release with such support).

So this means that in a few months time I will be running some DomUs which have filesystems that can’t be mounted in the Dom0. This isn’t a problem when everything works well. But when things go wrong it’s really convenient to be able to mount a filesystem in the Dom0 to fix things; that option will disappear for some of my systems, so if virtual machine A has a problem then I will have to mount its filesystems from virtual machine B to fix it. Of course this is a strong incentive to use multiple block devices for each virtual machine, so that a small root filesystem can run Ext3 while the rest uses Ext4.

At the moment only Debian/Unstable and Fedora have support for Ext4 so this isn’t a real issue. But Debian/Squeeze will release with Ext4 support and I expect that RHEL6 will also have it. When those releases happen I will be upgrading my virtual machines and will have these support issues.

It’s a pity that Red Hat never supported XFS; I could have solved some of these problems years ago if XFS had been available.

Now for non-virtual machines one factor to consider is that the legacy version of GRUB doesn’t support Ext4. I discovered this after I used tune2fs to convert all the filesystems on my EeePC to Ext4. I think I could have undone that tune2fs operation, but instead I decided to upgrade to the new version of GRUB and copy the kernel and initramfs to a USB device in case it didn’t boot. It turns out that the new version of GRUB seems to work well for booting from Ext4.
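
On Debian the switch looks roughly like this (a sketch assuming the new GRUB comes from the grub-pc package and that its upgrade-from-grub-legacy script is available to migrate the old configuration; the USB mount point is a placeholder):

# Install GRUB 2 and migrate the existing GRUB Legacy configuration.
apt-get install grub-pc
upgrade-from-grub-legacy
# Keep rescue copies of the kernel and initramfs on a USB device.
cp /boot/vmlinuz-* /boot/initrd.img-* /media/usb/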

One thing that is not explicitly mentioned in the howto is that the fsck pass needed to convert to Ext4 will not be done automatically by most distributions. So when I converted my EeePC I had to use sulogin to manually fsck the filesystems. This isn’t a problem with a laptop, but could be a problem with a co-located server system.

For the long term Btrfs may be a better option; I plan to test it on /home on my EeePC. But I will give Ext4 some more testing first. In any case the Ext3 filesystems on big servers are not going to go away in a hurry.

Finding Thread-unsafe Code

One problem that I have had on a number of occasions when developing Unix software is libraries containing non-reentrant code being called from threaded programs. For example, a function such as strtok() is implemented with a static variable so that subsequent calls can operate on the same string; calling it from a threaded program may result in a SEGV (if, for example, thread A calls strtok() and then frees the memory before thread B makes a second call to strtok()). Another problem is that a multithreaded program may have multiple threads performing operations on data of different sensitivity levels; for example a threaded milter may operate on email destined for different users at the same time. In that case use of a library call which is not thread safe may result in data being sent to the wrong destination.

One potential solution is to use a non-threaded programming model (IE a state machine or using multiple processes). State machines don’t work with libraries based on a callback model (EG libmilter), can’t take advantage of the CPU power available in a system with multiple CPU cores, and require asynchronous implementations of DNS name resolution. Multiple processes will often give less performance and are badly received by users who don’t want to see hundreds of processes in ps output.

So the question is how to discover whether a library that is used by your program has code that is not reentrant. Obviously a library could implement its own functions that use static variables – I don’t have a solution to this. But a more common problem is a library that uses strtok() and other non-reentrant libc functions – simply because they are more convenient. Trying to examine the program with nm and similar tools doesn’t seem viable, as libraries tend to depend on other libraries, so it’s not uncommon to have 20 shared objects being linked in at run-time. Also there is the potential problem of code that isn’t called: if library function foo() happens to call strtok() but I only call function bar() from that library, then even though the symbol strtok is resolved at run-time it shouldn’t be a problem for me.

So the obvious step is to use a LD_PRELOAD hack to override all the undesirable functions with code that will assert() or otherwise notify the developer. Bruce Chapman of Sun did a good job of this in 2002 for Solaris [1]. His code is very feature complete but has a limited list of unsafe functions.

Instead of using his code I wrote a minimal implementation of the same concept which searches the section 3 man pages installed on the system for functions which have a _r variant. In addition to that list of functions I added some functions from Bruce’s list which did not have a _r variant. That way I got a list of 72 functions compared to the 40 that Bruce uses. Of course with my method the number of functions that are intercepted will depend on the configuration of the system used to build the code – but that is OK; if the man pages are complete then that will cover all functions that can be called from programs that you write.

Now there is one significant disadvantage to my code: the case where unsafe functions are called before any child threads are created. Such calls will abort the program even though in production they won’t cause any problems. One thing I am idly considering is writing code to parse the man pages for the various functions so that the correct parameters can be used when proxying the library calls with dlsym(RTLD_NEXT, function_name). The other option would be to hand-code each of the 72 functions (and more hand-coding for each new library function I wanted to add).

To run my code you simply compile the shared object and then run “LD_PRELOAD=./thread.so ./program_to_test” and the program will abort and generate a core dump if the undesirable functions are called.

Here’s the source to the main program, gen.sh:

#!/bin/bash
# Generate thread.c, which defines a stub for every known non-reentrant
# function. Each stub calls assert(0) so that a test program dumps core
# as soon as one of these functions is used.
cat > thread.c << END
#undef NDEBUG
#include <assert.h>
END
# Unsafe functions (from Bruce's list) that have no _r variant.
OTHERS="getservbyname getservbyport getprotobyname getnetbyname getnetbyaddr getrpcbyname getrpcbynumber getrpcent ctermid tempnam gcvt getservent"
# Add every function that has a _r variant documented in section 3 of
# the man pages (the lgamma family is deliberately excluded).
for n in $OTHERS $(ls -1 /usr/share/man/man3/*_r.*|sed -e "s/^.*\///" -e "s/_r\..*$//"|grep -v ^lgamma|sort -u) ; do
  cat >> thread.c << END
void $n()
{
  assert(0);
}
END
done

Here is the Makefile, probably the tabs will be munged by my blog but I’m sure you know where they go:

all: thread.so

thread.c: gen.sh Makefile
	./gen.sh

thread.so: thread.c
	gcc -shared -o thread.so -fPIC thread.c

clean:
	rm thread.so thread.c

Update:
Simon Josefsson wrote an interesting article in response to this [2].

Per-process Namespaces – pam-namespace

Mike writes about his work in using namespaces on Linux [1]. In 2006 I presented a paper titled “Polyinstantiation of directories in an SE Linux system” about this at the SAGE-AU conference [2].

Newer versions of the code in question have been included in Debian/Lenny. So if you want to use namespaces for a login session on a Lenny system you can do the following:
mkdir /tmp-inst
chmod 0 /tmp-inst
echo "/tmp /tmp-inst/ user root" >> /etc/security/namespace.conf
echo "session required pam_namespace.so" >> /etc/pam.d/common-session

Then every user will have their own unique /tmp and be unable to mess with other users.
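
A quick way to check that the polyinstantiation is working (a sketch; user1 and user2 are placeholder accounts, and su is assumed to use common-session as it does by default on Lenny):

# Each login session gets its own instance of /tmp.
su - user1 -c 'touch /tmp/canary'
su - user2 -c 'ls /tmp'   # canary should not be visible here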

If you want to use the shared-subtrees facility so that mounts of filesystems other than /tmp are propagated to other sessions, then you need to have the following commands run at boot (maybe from /etc/rc.local):
mount --make-shared /
mount --bind /tmp /tmp
mount --make-private /tmp

The functionality in pam_namespace.so to use the SE Linux security context to instantiate the directory seems broken in Lenny. I’ll write a patch for this shortly.

While my paper is not particularly useful as documentation of pam_namespace.so (things changed after I wrote it), it does cover the threats that you face in terms of hostile use of /tmp and how namespaces may be used to solve them.

Things you can do for your LUG

A Linux Users Group, like most volunteer organisations, will often have a small portion of the membership making most of the contributions. I believe that every LUG has many people who would like to contribute but don’t know how; here are some suggestions for what you can do.

Firstly, offer talks. Many people seem to believe that giving a talk for a LUG requires expert knowledge. While it is desirable for experts in an area to share their knowledge, it is definitely not a requirement that you be an expert to give a talk. The only requirement is that you know more than the audience – and a small amount of research can achieve that goal.

One popular talk that is often given is “what’s new in Linux”. This is not a talk that requires deep knowledge, but it does require spending some time reading the news (which lots of people do for fun anyway). So if you spend an average of 30 minutes every week day reading about new developments in Linux and other new technology, you could spend another minute a day (20 minutes a month) making notes, and the result would be a 10 to 15 minute talk that would be well received. A talk about what’s new is one way that a novice can give a presentation that will get the attention of all the experts (who know their own area well but often don’t have time to see the big picture).

There are many aspects of Linux that are subtle, tricky, and widely misunderstood. Often mastering them is more a matter of spending time testing than anything else. An example of this is the chmod command (and all the Unix permissions that are associated with it). I believe that the majority of Linux users don’t understand all the subtleties of Unix permissions (I have even seen an employee of a Linux vendor make an error in this regard while running a formal training session). A newbie who spent a few hours trying the various combinations of chmod etc and spoke about the results could give a talk that would teach something to almost everyone in the audience. I believe that there are many other potential talk topics of this nature.

One thing that is often overlooked when considering how to contribute to LUGs is the possibility of sharing hardware. We all have all the software we need for free but hardware still costs money. If you have some hardware that hasn’t been used for a year then consider whether you will ever use it again, if it’s not likely to be used then offer it to your LUG (either via a mailing list or by just bringing it to a meeting). Also if you see some hardware that is about to be discarded and you think that someone in your LUG will like it then grab it! In a typical year I give away a couple of car-loads of second-hand hardware, most of it was about to be thrown out by a client so I grab it for my local LUG. Taking such hardware reduces disposal costs for my clients, prevents computer gear from poisoning landfill (you’re not supposed to put it in the garbage but most people do), and helps random people who need hardware.

One common use for the hardware I give away is for children. Most people are hesitant to buy hardware specifically for children as it only takes one incident of playing with the switch labeled 240V/110V (or something of a similar nature) to destroy it. Free hardware allows children to get more access to computers at an early age.

Finally, one way to contribute is by joining the committee. Many people find it difficult to attend even the regular meetings, let alone an additional committee meeting every month. So if you have no problems attending meetings then please consider contributing in this way.