Re: [PATCH 3/5] thread-utils: introduce optional barrier type

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Dec 29, 2024 at 11:28:30PM -0500, Jeff King wrote:
> One thread primitive we don't yet support is a barrier: it waits for all
> threads to reach a synchronization point before letting any of them
> continue. This would be useful for avoiding the LSan race we see in
> index-pack (and other places) by having all threads complete their
> initialization before any of them start to do real work.
> 
> POSIX introduced a pthread_barrier_t in 2004, which does what we want.
> But if we want to rely on it:
> 
>   1. Our Windows pthread emulation would need a new set of wrapper
>      functions. There's a Synchronization Barrier primitive there, which
>      was introduced in Windows 8 (which is old enough for us to depend
>      on).
> 
>   2. macOS (and possibly other systems) has pthreads but not
>      pthread_barrier_t. So there we'd have to implement our own barrier
>      based on the mutex and cond primitives.
> 
> Those are do-able, but since we only care about avoiding races in our
> LSan builds, there's an easier way: make it a noop on systems without a
> native pthread barrier.

I think this is fine for a first iteration. If we ever feel the need for
having barriers anywhere else for actual correctness we can iterate on
the solution and provide wrappers for those platforms.

> This patch introduces a "maybe_thread_barrier" API. The clunky name
> (rather than just using pthread_barrier directly) should hopefully clue
> people in that on some systems it will do nothing. It's wired to a
> Makefile knob which has to be triggered manually, and we enable it for
> the linux-leaks CI jobs (since we know we'll have it there).
> 
> There are some other possible options:
> 
>   - we could turn it on all the time for Linux systems based on uname.

Tiniest of nits: these are all full sentences, which should start with
an upper-case letter.

>     But we really only care about it for LSan builds, and there is no
>     need to add extra code to regular builds.
> 
>   - we could turn it on only for LSan builds. But that would break
>     builds on non-Linux platforms (like macOS) that otherwise should
>     support sanitizers.

Mh. I'm not a huge fan of having extra code for just a subset of our
builds and think that having the code generally enabled on platforms
that support it is preferable to reduce the number of build variants.
But...

>   - we could trigger only on the combination of Linux and LSan together.
>     This isn't too hard to do, but the uname check isn't completely
>     accurate. It is really about what your libc supports, and non-glibc
>     systems might not have it (though at least musl seems to).
> 
>     So we'd risk breaking builds on those systems, which would need to
>     add a new knob. Though the upside would be that running local "make
>     SANITIZE=leak test" would be protected automatically.

... this is a fair remark. So I dunno.

> And of course none of this protects LSan runs from races on systems
> without pthread barriers. It's probably OK in practice to protect only
> our CI jobs, though. The race is rare-ish and most leak-checking happens
> through CI.
> 
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
>  Makefile       |  7 +++++++
>  ci/lib.sh      |  1 +
>  thread-utils.h | 17 +++++++++++++++++
>  3 files changed, 25 insertions(+)
> 
> diff --git a/Makefile b/Makefile
> index 97e8385b66..2c6dad8a75 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -141,6 +141,10 @@ include shared.mak
>  #
>  # Define NO_PTHREADS if you do not have or do not want to use Pthreads.
>  #
> +# Define THREAD_BARRIER_PTHREAD if your system has pthread_barrier_t. Barrier
> +# support is optional and is only helpful when building with SANITIZE=leak, as
> +# it is used to eliminate some races in the leak-checker.
> +#
>  # Define NO_PREAD if you have a problem with pread() system call (e.g.
>  # cygwin1.dll before v1.5.22).
>  #
> @@ -2079,6 +2083,9 @@ ifdef NO_PTHREADS
>  else
>  	BASIC_CFLAGS += $(PTHREAD_CFLAGS)
>  	EXTLIBS += $(PTHREAD_LIBS)
> +	ifdef THREAD_BARRIER_PTHREAD
> +		BASIC_CFLAGS += -DTHREAD_BARRIER_PTHREAD
> +	endif
>  endif
>  
>  ifdef HAVE_PATHS_H

Okay. The Meson equivalent would be:

diff --git a/meson.build b/meson.build
index a0654a3f24..db4c1e6929 100644
--- a/meson.build
+++ b/meson.build
@@ -788,6 +788,10 @@ threads = dependency('threads', required: false)
 if threads.found()
   libgit_dependencies += threads
   build_options_config.set('NO_PTHREADS', '')
+
+  if get_option('b_sanitize').contains('leak') and compiler.has_function('pthread_barrier_init', dependencies: threads)
+    libgit_c_args += '-DTHREAD_BARRIER_PTHREAD'
+  endif
 else
   libgit_c_args += '-DNO_PTHREADS'
   build_options_config.set('NO_PTHREADS', '1')

> diff --git a/thread-utils.h b/thread-utils.h
> index 4961487ed9..3df5be9916 100644
> --- a/thread-utils.h
> +++ b/thread-utils.h
> @@ -53,5 +53,22 @@ int dummy_pthread_init(void *);
>  int online_cpus(void);
>  int init_recursive_mutex(pthread_mutex_t*);
>  
> +#ifdef THREAD_BARRIER_PTHREAD
> +#define maybe_thread_barrier_t pthread_barrier_t
> +#define maybe_thread_barrier_init pthread_barrier_init
> +#define maybe_thread_barrier_wait pthread_barrier_wait
> +#define maybe_thread_barrier_destroy pthread_barrier_destroy
> +#else
> +#define maybe_thread_barrier_t int

Out of curiosity: why did you pick a define here and not a typedef?

> +static inline int maybe_thread_barrier_init(maybe_thread_barrier_t *b UNUSED,
> +					    void *attr UNUSED,
> +					    unsigned nr UNUSED)
> +{
> +	errno = ENOSYS;
> +	return -1;
> +}
> +#define maybe_thread_barrier_wait(barrier)
> +#define maybe_thread_barrier_destroy(barrier)

So the way these wrappers are implemented it is not possible to check
for errors of `pthread_barrier_init()` et al. When the implementation
exists we do have return codes, but if it's stubbed out we don't.

I think we should align these two implementations so that it does become
possible to check for errors, or otherwise we wouldn't be using the
pthread APIs correctly. It does raise the question though whether we
should really return `-1` in the stubbed-out variant or whether we
should instead pretend as if things were alright.

An alternative would be to die in case the pthread-functions return an
error.

Patrick




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux