racy leak sanitizer builds, was Re: [PATCH 0/9] commit-reach: -Wsign-compare follow-ups

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Dec 28, 2024 at 09:27:41AM +0100, Patrick Steinhardt wrote:

>   - The leak-checking jobs fail quite regularly in t0003 with something
>     that feels like either a race caused by a leak or an issue with the
>     sanitizer itself [2]:
> 
>     ==git==17055==ERROR: LeakSanitizer: detected memory leaks
>     Direct leak of 32 byte(s) in 1 object(s) allocated from:
>     #0 0x7aa0d03c7713 in realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
>     #1 0x7aa0d0221f69 in pthread_getattr_np (/lib/x86_64-linux-gnu/libc.so.6+0x9df69) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
>     #2 0x7aa0d03d9544 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
>     #3 0x7aa0d03d96fa in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:614
>     #4 0x7aa0d03cb2b9 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:53
>     #5 0x7aa0d03c756a in ThreadStartFunc<false> ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:431
>     #6 0x7aa0d0220a93  (/lib/x86_64-linux-gnu/libc.so.6+0x9ca93) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
>     #7 0x7aa0d02adc3b  (/lib/x86_64-linux-gnu/libc.so.6+0x129c3b) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
>     DEDUP_TOKEN: ___interceptor_realloc--pthread_getattr_np--__sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*)--__sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*)--__lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType)--ThreadStartFunc<false>----
>     SUMMARY: LeakSanitizer: 32 byte(s) leaked in 1 allocation(s).

I did a bit of digging on this last week, but didn't come up with a very
satisfactory solution. You can reproduce easily by building with
SANITIZE=leak and running t0003 with --stress.

It's the same issue we tried to address in 993d38a066 (index-pack: spawn
threads atomically, 2024-01-05): one thread calls exit() and takes down
the process while other threads are still being spawned (so the leak
checker runs while those other threads are in a weird state where lsan
has allocated some memory but not yet set up the thread stack as a place
to look for reachable memory).

There we dealt with it by making sure no thread started work (and thus
hit the exit call) until all of them were sanitized. I tried doing
something similar here, like:

  diff --git a/builtin/grep.c b/builtin/grep.c
  index 98b85c7fca..866645e6f8 100644
  --- a/builtin/grep.c
  +++ b/builtin/grep.c
  @@ -233,18 +233,20 @@ static void start_threads(struct grep_opt *opt)
                  strbuf_init(&todo[i].out, 0);
          }

  +       grep_lock();
          CALLOC_ARRAY(threads, num_threads);
          for (i = 0; i < num_threads; i++) {
                  int err;
                  struct grep_opt *o = grep_opt_dup(opt);
                  o->output = strbuf_out;
                  compile_grep_patterns(o);
                  err = pthread_create(&threads[i], NULL, run, o);

                  if (err)
                          die(_("grep: failed to create thread: %s"),
                              strerror(err));
          }
  +       grep_unlock();
   }

   static int wait_all(void)

but it doesn't work. In fact, this does nothing at all because each
thread will start by looking for work to do via get_work(), and we do
not call add_work() to give them anything to do until all threads are
spawned.

So I suspect the race is actually trickier, and that the "weird state"
is not something that happens just while pthread_create() is being
called, but is actually running _in the thread itself_. So even though
pthread_create() has returned for each thread, they are still setting
themselves up before running.

It mostly worked in 993d38a066 because index-pack has to do more work to
get to the exit() call. So delaying the start of the threads was enough
to usually win the race. But here, the individual grep threads get to
the exit() call very quickly, and it's not enough.

Which would mean that 993d38a066 is not actually a full fix, either. And
indeed, I can get t5309 to fail even with that patch (which is weird,
because that was how I tested it originally; I wonder if anything on the
LSan side changed?).

So a full fix would actually require synchronization where we spawn each
thread, then wait for all of them to hit the barrier to declare
themselves ready, and then let them all start running. There is a
pthread_barrier type that would help with this, but we've never used it
before (so we'd probably need to at least provide a Windows compat
layer).

One quick workaround is this:

diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index 3c98b622f2..7ecaf8f4e3 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -443,7 +443,7 @@ test_expect_success 'diff without repository with attr source' '
 		cat >expect <<-EOF &&
 		fatal: cannot use --attr-source or GIT_ATTR_SOURCE without repo
 		EOF
-		test_must_fail env GIT_ATTR_SOURCE=HEAD git grep --no-index foo file 2>err &&
+		test_must_fail env GIT_ATTR_SOURCE=HEAD git grep --threads=1 --no-index foo file 2>err &&
 		test_cmp expect err
 	)
 '

Or you could even imagine automatically forcing online_cpus() to "1" for
LSan builds, which would fix it everywhere. But then we'd miss any leaks
that are specific to the threaded code.

Or of course we could try to engage with LSan folks about whether this
can be fixed there. I don't think we've reported it anywhere there.

-Peff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux