Linux, politics, and other interesting things
I have previously written about an error that valgrind reported in the STL when some string operations were performed by the DKIM library . This turned out to be a bug, Jonathan Wakely filed GCC bug report #40518  about it, Jonathan is one of many very skillful people who commented on that post.
deb http://www.coker.com.au lenny gcc
I’m still not sure whether that bug could actually harm my program, Nathan Myers strongly suggested that it would not impact the correct functionality of the program but mentioned a possible performance issue (which will hurt me as the target platform is 8 or 12 core systems). Jaymz Julian seems to believe that the STL code in question can lead to incorrect operation and suggested stlport as an alternative. As I’m not taking any chances I built GCC with a patch from Jonathan’s bug report for my development machines and then built libdkim with that GCC. I created the above APT repository for my patched GCC packages. I also included version 3.4.1 of Valgrind (back-ported from Debian/Unstable) in that repository.
Nathan Myers also wrote: “Any program that calls strtok() even once may be flagged as buggy regardless of any thread safety issues. Use of strtok() (or strtok_r()) is a marker not unlike gets() of ill thought out coding.” I agree, I wrote a program to find such code and have eliminated all such code where it is called from my program .
I think it’s unfortunate that I have to rebuild all of GCC for a simple STL patch. My blog post about the issue of the size and time required to rebuild those packages  received some interesting comments, probably the most immediately useful one was to use --disable-bootstrap to get a faster GCC build, that was from Jonathan Wakely. Joe Buck noted that the source is available in smaller packages upstream, this is interesting, but unless the Debian developers package it in the same way I will have to work with the large Debian source packages.
I have filed many bug reports against the OpenSSL packages in Debian based on the errors reported by Valgrind . I didn’t report all the issues related to error handling as there were too many. Now my program is often crashing when DomainKeys code is calling those error functions, so one of the many Valgrind/Helgrind issues I didn’t report may be the cause of my problems. But I can’t report too many bugs at once, I need to give people time to work on the current bug list first.
Another problem I have is that sometimes the libdkim code will trigger a libc assertion on malloc() or free() if DomainKeys code has been previously called. So it seems that the DomainKeys code (or maybe the OpenSSL code it calls) is corrupting the heap.
So I have given up on the idea of getting DomainKeys code working in a threaded environment. Whenever I need to validate a DomainKeys message my program will now fork a child process to do that. If it corrupts the heap while doing so it’s no big deal as the child process calls exit(0) after it has returned the result over a pipe. This causes a performance loss, but it appears that it’s less than 3 times slower which isn’t too bad. From a programming perspective this was fairly easy to implement because a thread of the main program prepares all the data and then the child process can operate on it – it would be a lot harder to implement such things on an OS which doesn’t have fork().
DomainKeys has been obsoleted by DKIM for some time, so all new deployments of signed email should be based on DKIM and systems that currently use DomainKeys should be migrating soon. So the performance loss on what is essentially a legacy feature shouldn’t impact the utility of my program.
I am considering uploading my libdomainkeys package to Debian. I’m not sure how useful it would be as DomainKeys is hopefully going away. But as I’ve done a lot of work on it already I’m happy to share if people are interested.
Thanks again for all the people who wrote great comments on my posts.