Finding Thread-unsafe Code

One problem that I have run into on a number of occasions when developing Unix software is libraries containing non-reentrant code being called from threaded programs. For example, strtok() is implemented with a static variable so that subsequent calls can continue operating on the same string. Calling it from a threaded program may result in a SEGV (for example, if thread A calls strtok() and then frees the memory before thread B makes a second call to strtok()). Another problem is that a multithreaded program may have multiple threads performing operations on data of different sensitivity levels; for example, a threaded milter may operate on email destined for different users at the same time. In that case use of a library call which is not thread safe may result in data being sent to the wrong destination.
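A minimal sketch of why the hidden state matters, assuming a POSIX system that provides strtok_r(). The function name interleaved_token_count is made up for illustration. It walks two strings in lock-step, the way two threads might interleave their calls. Because each tokenizer keeps its position in its own save pointer, neither disturbs the other, which strtok() with its single static pointer cannot guarantee.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Tokenize two writable buffers in lock-step and return the total number of
 * space-separated tokens. Each buffer has its own save pointer, so the two
 * tokenizations are independent of each other. */
size_t interleaved_token_count(char *a, char *b)
{
    char *save_a = NULL;
    char *save_b = NULL;
    char *ta = strtok_r(a, " ", &save_a);
    char *tb = strtok_r(b, " ", &save_b);
    size_t count = 0;

    while (ta != NULL || tb != NULL) {
        if (ta != NULL) {
            count++;
            ta = strtok_r(NULL, " ", &save_a);
        }
        if (tb != NULL) {
            count++;
            tb = strtok_r(NULL, " ", &save_b);
        }
    }
    return count;
}
```

Replacing both strtok_r() calls with strtok() would make the second tokenization overwrite the position of the first, so the loop would no longer see every token of both strings.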

One potential solution is to use a non-threaded programming model (i.e. a state machine or multiple processes). State machines don’t work with libraries based on a callback model (e.g. libmilter), can’t take advantage of the CPU power available in a system with multiple CPU cores, and require asynchronous implementations of DNS name resolution. Multiple processes will often give less performance and are badly received by users who don’t want to see hundreds of processes in ps output.

So the question is how to discover whether a library that is used by your program has code that is not reentrant. Obviously a library could implement its own functions that use static variables, and I don’t have a solution to this. But a more common problem is a library that uses strtok() and other libc functions that aren’t reentrant, simply because they are more convenient. Examining the program with nm and similar tools doesn’t seem viable, as libraries tend to depend on other libraries, so it’s not uncommon to have 20 shared objects being linked in at run-time. There is also the matter of code that is never called: if library function foo() happens to call strtok() but I only call function bar() from that library, then even though the symbol strtok is resolved at run-time it shouldn’t be a problem for me.

So the obvious step is to use a LD_PRELOAD hack to override all the undesirable functions with code that will assert() or otherwise notify the developer. Bruce Chapman of Sun did a good job of this in 2002 for Solaris [1]. His code is very feature complete but has a limited list of unsafe functions.

Instead of using his code I wrote a minimal implementation of the same concept which searches the section 3 man pages installed on the system for functions which have a _r variant. In addition to that list of functions I added some functions from Bruce’s list which did not have a _r variant. That way I got a list of 72 functions compared to the 40 that Bruce uses. Of course with my method the number of functions that are intercepted will depend on the configuration of the system used to build the code – but that is OK, if the man pages are complete then that will cover all functions that can be called from programs that you write.
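To make the name derivation concrete, here is a small sketch in C of the same transformation the script's sed expressions perform: strip the directory, then cut everything from the first "_r." onward. The function name base_name_from_r_page is hypothetical. Unlike sed, it rejects a file name with nothing before "_r." instead of producing an empty name.

```c
#include <string.h>

/* Derive the base function name from the path of a "_r" man page.
 * For example ".../man3/strtok_r.3.gz" yields "strtok".
 * Returns 0 on success, -1 if the file name has no "_r." part, has nothing
 * before it, or the result does not fit in out. */
int base_name_from_r_page(const char *path, char *out, size_t out_size)
{
    const char *file = strrchr(path, '/');
    const char *suffix;
    size_t len;

    file = (file != NULL) ? file + 1 : path;   /* drop the directory part */
    suffix = strstr(file, "_r.");              /* first "_r." in the name */
    if (suffix == NULL || suffix == file)
        return -1;

    len = (size_t)(suffix - file);
    if (len + 1 > out_size)
        return -1;

    memcpy(out, file, len);
    out[len] = '\0';
    return 0;
}
```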

Now there is one significant disadvantage to my code. That is the case where unsafe functions are called before child threads are created. Such code will be aborted even though in production it won’t cause any problems. One thing I am idly considering is writing code to parse the man pages for the various functions so it can use the correct parameters for proxying the library calls with dlsym(RTLD_NEXT, function_name). The other option would be to hand code each of the 72 functions (and use more hand coding for each new library function I wanted to add).
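A rough sketch of what the hand-coded proxy option might look like for a single function, assuming glibc on Linux. The threads_started flag, and the idea of setting it by also intercepting pthread_create(), are assumptions made for this illustration and are not part of the code in this post. Calls made while the process is still single-threaded are forwarded to the real strtok() via dlsym(RTLD_NEXT, ...), and any later call aborts.

```c
#define _GNU_SOURCE   /* needed for RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag: set once the program has asked for a second thread. */
static atomic_int threads_started;

typedef int (*pthread_create_fn)(pthread_t *, const pthread_attr_t *,
                                 void *(*)(void *), void *);
typedef char *(*strtok_fn)(char *, const char *);

/* Record that a thread is about to exist, then call the real function. */
int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start_routine)(void *), void *arg)
{
    pthread_create_fn real =
        (pthread_create_fn)dlsym(RTLD_NEXT, "pthread_create");

    if (real == NULL)
        abort();
    atomic_store(&threads_started, 1);
    return real(thread, attr, start_routine, arg);
}

/* Forward to the real strtok() while single-threaded, abort afterwards. */
char *strtok(char *str, const char *delim)
{
    strtok_fn real;

    if (atomic_load(&threads_started))
        abort();   /* core dump shows which caller used strtok() */

    real = (strtok_fn)dlsym(RTLD_NEXT, "strtok");
    if (real == NULL)
        abort();
    return real(str, delim);
}
```

Looking the symbol up on every call keeps the sketch free of cached state at the cost of some speed. Each additional function would need its own wrapper with the correct prototype, which is exactly the hand-coding cost described above. On older glibc versions a shared object built from this would also need to be linked with -ldl.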

To run my code you simply compile the shared object and then run “LD_PRELOAD=./thread.so ./program_to_test” and the program will abort and generate a core dump if the undesirable functions are called.

Here’s the source to the main program:

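#!/bin/bash
# Generate thread.c: one stub per non-reentrant function, each of which aborts.
cat > thread.c << END
#undef NDEBUG
#include <assert.h>
END
# Extra functions taken from Bruce's list.
OTHERS="getservbyname getservbyport getprotobyname getnetbyname getnetbyaddr getrpcbyname getrpcbynumber getrpcent ctermid tempnam gcvt getservent"
# Merge OTHERS with every function that has a _r man page, then remove
# duplicates so that no stub is defined twice.
for n in $( { printf '%s\n' $OTHERS ; ls -1 /usr/share/man/man3/*_r.*|sed -e "s/^.*\///" -e "s/_r\..*$//"|grep -v ^lgamma ; } | sort -u ) ; do
  cat >> thread.c << END
void $n()
{
  assert(0);
}
END
done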

Here is the Makefile, probably the tabs will be munged by my blog but I’m sure you know where they go:

all: thread.so

thread.c: gen.sh Makefile
	./gen.sh

thread.so: thread.c
	gcc -shared -o thread.so -fPIC thread.c

clean:
	rm thread.so thread.c

Update:
Simon Josefsson wrote an interesting article in response to this [2].

7 comments to Finding Thread-unsafe Code

  • If you only want to know if they are used, you could also use the ltrace program. It is similar to strace, but for library calls.

  • etbe

    glandium: Good point. But then you need to grep the output for the functions in question, you have the ptrace overhead, and when it does flag a problem you won’t know the back-trace of it.

  • Nick

    Some related work:

    Helgrind is a thread debugger which finds data races in multithreaded programs
    http://valgrind.org/info/tools.html

    Nick Nethercote has done some additional work on looking for non reentrant functions called by signal handlers:
    http://valgrind.org/downloads/variants.html?njn

  • I’d take issue with the statement that non-threaded programs can’t take advantage of CPUs and are slower than threaded ones. :-)

    Many HPC codes are non-threaded MPI programs, they take full advantage of multiple cores. In fact we’ve seen one particular code available in both a threaded SMP version and a pure MPI version and found that the MPI version makes far more efficient use of an SMP system (scales better) than the SMP version.

    Regarding threading being faster, Tridge has already made a convincing case back in 2004 (mentioned in passing at a LUV meeting way back when it was in the Telstra building) that threads are always slower than processes, aside from when the OS is very broken.

  • Bill Farrow

    On my system I needed to remove the duplicate function names before compiling the library. Added "sort -u" to the code. Actually you might want to merge the $OTHERS into the list before sorting and removing duplicates.

    - for n in $OTHERS $(ls -1 /usr/share/man/man3/*_r.*|sed -e "s/^.*\///" -e "s/_r\..*$//"|grep -v ^lgamma) ; do
    + for n in $OTHERS $(ls -1 /usr/share/man/man3/*_r.*|sed -e "s/^.*\///" -e "s/_r\..*$//"|grep -v ^lgamma | sort -u) ; do

  • etbe

    foo and Nick: Thanks for the suggestion, I’m investigating that now.

    Chris: That’s interesting for HPC, but not of much use for what I’m doing. I have to work with the standard Milter libraries which don’t support such things.

    Bill: Good point, that’s one of the many rough edges in my code. I trimmed the OTHERS list, it’s the entries from Sun’s list that didn’t appear when I searched the man pages – mostly functions with no reentrant versions. Your idea is much better.