I’ve just filed Debian bug report #534534 about Valgrind/Helgrind reporting “Possible data race during write” [1]. I included a patch that seems to fix that problem (by checking whether a variable is not zero before setting it to zero). But on further testing with Valgrind 3.4.1 (backported from Debian/Unstable) it seems that my patch is not worth using, I expect that Valgrind related patches won’t be accepted into the Lenny version of OpenSSL.
I would appreciate suggestions on how to fix this, the problem is basically having a single static variable that is initialised to the value 1 but set to 0 the first time one of the malloc functions is called. Using a lock for this is not desirable as it will add overhead to every malloc operation. However without the lock it does seem possible to have a race condition if one thread calls CRYPTO_set_mem_functions() and then before that operation is finished a time slice is given to a thread that is allocating memory. So in spite of the overhead I guess that using a lock is the right thing to do.
deb http://www.coker.com.au lenny gcc
For the convenience of anyone who is testing these things on Debian and wants to use the latest valgrind, the above Debian repository has Valgrind 3.4.1 and a build of GCC to fix the problem I mentioned in my previous blog post about Valgrind [2].
if (default_RSA_meth == NULL)
default_RSA_meth=RSA_PKCS1_SSLeay();
I have also filed bug #534656 about another reported race condition in the OpenSSL libraries [3]. Above is the code in question (with some C preprocessor stuff removed). This seems likely to be a problem on an architecture for which assignment of a pointer is not an atomic operation, I don’t know if we even have any architectures that work in such a way.
static void impl_check(void) Â {
    CRYPTO_w_lock(CRYPTO_LOCK_EX_DATA);
    if(!impl)
        impl = &impl_default;
    CRYPTO_w_unlock(CRYPTO_LOCK_EX_DATA);
}
#define IMPL_CHECK if(!impl) impl_check();
A similar issue is my bug report bug #534683 [4] which is due to a similar issue with the above code. If the macro is changed to just call impl_check() then the problem will go away, but at some performance cost.
I filed bug report #534685 about a similar issue with the EX_DATA_CHECK macro [5].
I filed bug report #534687 about some code that has CRYPTO_w_lock(CRYPTO_LOCK_EX_DATA); before it [6], so it seems that the code may be safe and it may be an issue with how Valgrind recognises problems (maybe a Valgrind bug or an issue with how Valgrind interprets what the OpenSSL code is doing). Valgrind 3.3.1 reported many more issues that were similar to this, so it appears that version 3.4.1 improved the analysis of this but didn’t do quite enough.
I filed bug report #534706 about the cleanse_ctr global variable that is used as a source of pseudo-randomness for the OPENSSL_cleanse() function without locking [7]. It seems that they have the idea that memset() is not adequate for clearing memory. Does anyone know of a good research paper about recovering the contents of memory after memset()? I doubt that we need such things.
I filed bug report #534699 about what appears to be a potential race condition in int_new_ex_data() [8]. The def_get_class() function obtains a lock before returning a pointer to a member of a hash table. It seems possible for an item to be deleted from the hash table (and it’s memory freed) after def_get_class() has returned the pointed but before int_new_ex_data() accesses the memory in question.
I filed bug report #534889 about int_free_ex_data() and int_new_ex_data() which call def_get_class() before obtaining a lock and then use the data returned from that function in a locked area[9] (it seems that obtaining the lock earlier would solve this).
I filed bug report #534892 about another piece of code which would have a race condition if pointer assignment isn’t atomic, this time in err_fns_check() [10]. In my first pass I didn’t bother filing bug reports about most of the issues helgrind raised with the error handling code (there were so many that I just hoped that there was some subtle locking involved that eluded helgrind and my brief scan of the source). But a new entry in my core file collection suggests that this may be a problem area for my code.
I think that it is fairly important to get security related libraries to be clean for use with valgrind and other debugging tools – if only to allow better debugging of the code that calls them. I would appreciate any assistance that people can offer in terms of fixing these problems. I know that there are security risks in terms of changing code in such important libraries, but there are also risks in leaving potential race conditions in such code.
As an aside, I’ve filed a wishlist bug report #534695 requesting that valgrind would have a feature to automatically add entries to the suppressions file [11]. As a function that is considered to be unsafe can be called from different contexts, and code that is considered unsafe can be in a macro that is called from multiple functions there can be many different suppressions needed. Pasting them all into the suppressions file is tedious.
- [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=534534
- [2] http://etbe.coker.com.au/2009/06/22/valgrindhelgrind-and-stl-string/
- [3] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=534656
- [4] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=534683
- [5] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=534685
- [6] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=534687
- [7] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=534706
- [8] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=534699
- [9] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=534889
- [10] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=534892
- [11] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=534695