Archives

Categories

Valgrind/Helgrind and STL string

I am trying to track down a thread-safety problem in one of my programs. Valgrind when run as “valgrind –tool=helgrind ./thread-test” claims that there is a problem with the following program (the Valgrind errors are at the end of the post). The SGI documents state [1]: “The SGI implementation of STL is thread-safe only in the sense that simultaneous accesses to distinct containers are safe, and simultaneous read accesses to to shared containers are safe. If multiple threads access a single container, and at least one thread may potentially write, then the user is responsible for ensuring mutual exclusion between the threads during the container accesses. “.

My interpretation of the SGI document is that different STL strings can be manipulated independently. When I have two threads running I have two stacks, so strings that are allocated on the stack will be “distinct containers“. So this looks like a bug in the STL or a bug in Valgrind. I am hesitant to file a bug report because I remember the advice that a wise programmer once gave me: “the people who develop compilers and tool chains are much better at coding than you, any time you think there’s a bug in their code there is probably a bug in yours“. I know that my blog is read by some really great programmers. I look forward to someone explaining what is wrong with my code or confirming that I have found a bug in Valgrind or the STL.

#define NUM_THREADS 2
#include <stdlib.h>
#include <pthread.h>
#include <stdio.h>
#include <string>

using namespace std;

void whatever()
{
string str;
str.erase();
}

struct thread_data
{
int thread_id;
};

void *do_work(void *data)
{
struct thread_data *td = (struct thread_data *)data;
printf("%d:stack:%X\n", td->thread_id, &td);
while(1)
  whatever();
}

int main(int argc, char **argv)
{
pthread_t *thread_info = (pthread_t *)calloc(NUM_THREADS, sizeof(thread_info));
pthread_attr_t attr;
if(pthread_attr_init(&attr) || pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE) || pthread_attr_setstacksize(&attr, 32*1024))
  fprintf(stderr, "Can't set thread attributes.\n");

int t;
struct thread_data td[NUM_THREADS];
for(t = 0; t < NUM_THREADS; t++)
{
  printf("created thread %d\n", t);
  td[t].thread_id = t;
  int p = pthread_create(&thread_info[t], &attr, do_work, (void *)&td[t]);
  if(p)
  {
  fprintf(stderr, "Can't create thread %d\n", t);
  exit(1);
  }
}
pthread_attr_destroy(&attr);
void *value_ptr;
for(t = 0; t < NUM_THREADS; t++)
  pthread_join(thread_info[t], &value_ptr);
free(thread_info);

return 0;
}

==30622== Possible data race during write of size 4 at 0x414FED0
==30622==    at 0x40FAD9A: std::string::_Rep::_M_set_sharable() (basic_string.h:201)
==30622==    by 0x40A945E: _ZNSs4_Rep26_M_set_length_and_sharableEj@@GLIBCXX_3.4.5 (basic_string.h:206)
==30622==    by 0x40FD6B6: std::string::_M_mutate(unsigned, unsigned, unsigned) (basic_string.tcc:471)
==30622==    by 0x40FDBEE: std::string::erase(unsigned, unsigned) (basic_string.h:1133)
==30622==    by 0x8048AA8: whatever() (thread-test.cpp:16)
==30622==    by 0x8048B0A: do_work(void*) (thread-test.cpp:29)
==30622==    by 0x402641B: mythread_wrapper (hg_intercepts.c:193)
==30622==    by 0x40464BF: start_thread (in /lib/i686/cmov/libpthread-2.7.so)
==30622==    by 0x42696DD: clone (in /lib/i686/cmov/libc-2.7.so)
==30622==  Old state: shared-readonly by threads #2, #3
==30622==  New state: shared-modified by threads #2, #3
==30622==  Reason:    this thread, #3, holds no consistent locks
==30622==  Location 0x414FED0 has never been protected by any lock
==30622==
==30622== Possible data race during write of size 4 at 0x414FEC8
==30622==    at 0x40A9465: _ZNSs4_Rep26_M_set_length_and_sharableEj@@GLIBCXX_3.4.5 (basic_string.h:207)
==30622==    by 0x40FD6B6: std::string::_M_mutate(unsigned, unsigned, unsigned) (basic_string.tcc:471)
==30622==    by 0x40FDBEE: std::string::erase(unsigned, unsigned) (basic_string.h:1133)
==30622==    by 0x8048AA8: whatever() (thread-test.cpp:16)
==30622==    by 0x8048B0A: do_work(void*) (thread-test.cpp:29)
==30622==    by 0x402641B: mythread_wrapper (hg_intercepts.c:193)
==30622==    by 0x40464BF: start_thread (in /lib/i686/cmov/libpthread-2.7.so)
==30622==    by 0x42696DD: clone (in /lib/i686/cmov/libc-2.7.so)
==30622==  Old state: shared-readonly by threads #2, #3
==30622==  New state: shared-modified by threads #2, #3
==30622==  Reason:    this thread, #3, holds no consistent locks
==30622==  Location 0x414FEC8 has never been protected by any lock
==30622==
==30622== Possible data race during write of size 1 at 0x414FED4
==30622==    at 0x40A9012: std::char_traits<char>::assign(char&, char const&) (char_traits.h:246)
==30622==    by 0x40A9488: _ZNSs4_Rep26_M_set_length_and_sharableEj@@GLIBCXX_3.4.5 (basic_string.h:208)
==30622==    by 0x40FD6B6: std::string::_M_mutate(unsigned, unsigned, unsigned) (basic_string.tcc:471)
==30622==    by 0x40FDBEE: std::string::erase(unsigned, unsigned) (basic_string.h:1133)
==30622==    by 0x8048AA8: whatever() (thread-test.cpp:16)
==30622==    by 0x8048B0A: do_work(void*) (thread-test.cpp:29)
==30622==    by 0x402641B: mythread_wrapper (hg_intercepts.c:193)
==30622==    by 0x40464BF: start_thread (in /lib/i686/cmov/libpthread-2.7.so)
==30622==    by 0x42696DD: clone (in /lib/i686/cmov/libc-2.7.so)
==30622==  Old state: owned exclusively by thread #2
==30622==  New state: shared-modified by threads #2, #3
==30622==  Reason:    this thread, #3, holds no locks at all

20 comments to Valgrind/Helgrind and STL string

  • phihar

    AFAIK, std::basic_string uses COW and reference counting. Usually, this might make even distinct strings share some internal data, leading to thread-safety issues. Unfortunately, I don’t know of any compiler switch/#define to make std::basic_string POSIXly thread-safe. Hopefully someone else knows a simple solution to this, which does not require explicit locking…

    Other containers like std::vector should be fine (when used/written exclusively).

  • Nathan Myers

    Very incidentally, the string implementation in GNU libstdc++ did not originate with the SGI STL. However, it is meant to follow the same rules. It does use shared storage, but it’s easy to get in trouble when passing them (and iterators to them) around by reference, because sometimes you might be sharing a string (not the storage, an actually user-visible string) between threads without noticing.

    Strictly speaking, Stepanov’s and Lee’s STL was adopted, with changes, to comprise chapters (“clauses”) 23, 24, and 25 of the Standard (ISO 14882), and SGI’s implementation based on that was then altered to match the standard, and then copied into GNU libstdc++ with further alterations.

    The above is posted without having looked at your code example yet.

  • Nathan Myers

    Oh, and string is described in chapter 21.

    Looking at the report, it should not surprise us that helgrind says there are no locks protecting the refcount, because it’s true. It uses atomic operations, instead, such as compare-and-swap, that compile down to atomic-increment or load-locked/store-conditional instructions. Seems like helgrind ought to understand those.

  • Jan Niehusmann

    I guess STL uses shared storage for empty strings as an optimization. (Which explains why there is some shared state at all, even if it’s not obvious from the code). Probably, it’s a false positive, but to be sure one would have to check the code in basic_string.h/.tcc.

  • It seems to me to be an optimization of sorts in the STL, if you try to modify the string (add a str+=”a” after the initialization) you no longer get the error reports. Probably STL uses a shared storage for static strings, I tried it with a non-empty static string and the problem persisted, but modifying the string made the problem disappear.

  • Jaymz Julian

    This is stlport vs gnu stl – it doesn’t break in stlport, and does when linked with gnu STL.

  • Jaymz Julian

    (which is to say, in gnu stl. erase() is NOT threadsafe – by implementation! it modifies a “empty” string to have the right metadata, then copies that, in a non-threadsafe way))

  • Jonathan Wakely

    (Russell, you didn’t say which version of GCC or valgrind you’re using.)

    Nathan, looks like Russell has found a real race, although harmless on most platforms.

    When erasing an empty string _M_set_length_and_sharable() is called on the empty Rep object, which is shared by all empty strings. That assigns zero to the refcount, length, and first character. That doesn’t change anything, since they are all zero anyway, so as long as those writes are atomic it should be safe. But helgrind is correct to flag it as a race.

  • Jonathan Wakely

    P.S. If my analysis is right, building GCC with –enable-fully-dynamic-string will avoid this race, as that disables sharing of the empty string representation.

  • etbe

    Julian Seward suggested in private mail that I should re-run the tests with Valgrind 3.4.x.

    Below are some of the results from using valgrind package version 1:3.4.1-1 on Debian/Unstable:

    ==8095== Possible data race during read of size 8 at 0x5340140 by thread #3
    ==8095== at 0x50EAD2A: std::string::erase(unsigned long, unsigned long) (in /usr/lib/libstdc++.so.6.0.12)
    ==8095== by 0x401027: whatever(char const*, char const*) (thread-test.cpp:26)
    ==8095== by 0x4010E0: do_work(void*) (thread-test.cpp:47)
    ==8095== by 0x4C27ABF: mythread_wrapper (hg_intercepts.c:194)
    ==8095== by 0x4E30F79: start_thread (pthread_create.c:300)
    ==8095== by 0x58BCA4C: clone (clone.S:112)
    ==8095== This conflicts with a previous write of size 8 by thread #2
    ==8095== at 0x50EA764: std::string::_M_mutate(unsigned long, unsigned long, unsigned long) (in /usr/lib/libstdc++.so.6.0.12)
    ==8095== by 0x50EAD46: std::string::erase(unsigned long, unsigned long) (in /usr/lib/libstdc++.so.6.0.12)
    ==8095== by 0x401027: whatever(char const*, char const*) (thread-test.cpp:26)
    ==8095== by 0x4010E0: do_work(void*) (thread-test.cpp:47)
    ==8095== by 0x4C27ABF: mythread_wrapper (hg_intercepts.c:194)
    ==8095== by 0x4E30F79: start_thread (pthread_create.c:300)
    ==8095== by 0x58BCA4C: clone (clone.S:112)

    [several more similar to the above snipped]

    ==8095== Possible data race during read of size 8 at 0x5340140 by thread #2
    ==8095== at 0x50EA920: std::string::assign(char const*, unsigned long) (in /usr/lib/libstdc++.so.6.0.12)
    ==8095== by 0x401042: whatever(char const*, char const*) (thread-test.cpp:28)
    ==8095== by 0x4010E0: do_work(void*) (thread-test.cpp:47)
    ==8095== by 0x4C27ABF: mythread_wrapper (hg_intercepts.c:194)
    ==8095== by 0x4E30F79: start_thread (pthread_create.c:300)
    ==8095== by 0x58BCA4C: clone (clone.S:112)
    ==8095==
    ==8095== Possible data race during read of size 8 at 0x5340140 by thread #2
    ==8095== at 0x50EA6E7: std::string::_M_mutate(unsigned long, unsigned long, unsigned long) (in /usr/lib/libstdc++.so.6.0.12)
    ==8095== by 0x50EA8AB: std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long) (in /usr/lib/libstdc++.so.6.0.12)
    ==8095== by 0x401042: whatever(char const*, char const*) (thread-test.cpp:28)
    ==8095== by 0x4010E0: do_work(void*) (thread-test.cpp:47)
    ==8095== by 0x4C27ABF: mythread_wrapper (hg_intercepts.c:194)
    ==8095== by 0x4E30F79: start_thread (pthread_create.c:300)
    ==8095== by 0x58BCA4C: clone (clone.S:112)

    http://www.valgrind.org/docs/manual/hg-manual.html#hg-manual.effective-use

    He also recommended reading the above URL. I read it, it seems likely to be useful in the future but didn’t help me this time.

  • etbe

    Florian Weimer made the following suggestion by email:
    this could be a Helgrind bug. std::string should use atomic integer
    operations for synchronization:

    0x00007ffff7949847 : lock xadd %edx,0x10(%rdi)
    0x00007ffff794984c : test %edx,%edx

    Perhaps Helgrind doesn’t recognize them.

    I think there was a plan to switch to a scheme which is not based on
    reference counting, but this will need an incompatible ABI change.

  • etbe

    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=534193

    I’ve filed the above Debian bug report and referenced Jonathan’s GCC bug report as well as this blog post.

    Thanks for all the suggestions! This has really helped me make some progress in debugging my application and possibly improving the general quality of the GCC STL code base.

    I fear that this may not in fact be the nasty bug that is impacting my client’s business. So I may be writing more blog posts and bug reports about thread safety issues in the near future.

    http://etbe.coker.com.au/2009/06/14/finding-thread-unsafe-code/

    My previous post about the shared object I hacked up to expose thread-unsafe library calls may interest some readers. Such a shared object seems necessary because valgrind only finds actual problems not potential problems. If a test data set only results in one thread calling strtok() then valgrind will be happy, but if real-world data results in all threads calling strtok() then things break.

  • Yes, the plan is to switch to a non-reference-counted std::string at the next g++/libstdc++ ABI change, but no such change is planned for now.

    See the bugzilla report for a proposed fix: do not perform the redundant writes if the string uses the shared empty representation.

  • Nathan Myers

    If every thread that writes to the shared empty string writes zeroes, and there is no possibility of anybody reading a non-zero, then why consider it a bug in libstdc++? If anything, it’s a bug in helgrind. Probably the right fix is for helgrind to special-case it, to handle extant library versions cleanly.

    However, avoiding the manifestly redundant writes may be a performance win, instead checking against the address of the shared object, if in fact it’s written to often enough to make a difference. We got some big wins, before, by this sort of change.

    I understand that sharing string storage is often not an optimization on symmetric and NUMA multi-cpu systems, but that’s a separate matter, and numerically most machines are not (e.g. iPhone).

  • Nathan Myers

    p.s. Any program that calls strtok() even once may be flagged as buggy regardless of any thread safety issues. Use of strtok() (or strtok_r()) is a marker not unlike gets() of ill thought out coding.

  • etbe

    Nathan: Yes, I think that avoiding needless writes to shared storage will be a benefit. Some of the code that I’m working on will end up running on machines that have two processors, each of which will have 4 or 6 cores. So avoiding needless writes should be more important than for most programs.

    I agree that use of strtok() is in most cases a bug. It would be handy if there was some better way of flagging it. Maybe an executable flag or something to indicate that it’s OK to use such functions so that the dynamic linker could abort execution when loading a library that is likely to cause problems.

  • etbe

    deb http://www.coker.com.au lenny gcc

    I’ve created an APT repository for Lenny gcc packages with this patch applied, I’ve built it for i386 and AMD64. It’s for my own use but I don’t mind if others use it too.

  • etbe

    Jonathan said in email:

    just a quick note to let you know that data races you found are fixed in GCC 4.5.0, and I also backported them to GCC 4.4 so the fix is included in the 4.4.4 release made a few days ago.