On Thu, Oct 24, 2024 at 12:26:14PM -0400, John B. Wyatt IV wrote:
> This patch is also an error report seen on RHEL 9 and Fedora 40.
> pi_stress crashes if you run this near full cores with one core
> (2 threads) free for housekeeping. The crash usually happens at
> around 2 hours of pi_stress running. This issue was reproduced
> on a RHEL server with 2 sockets of 10 cores/20 threads (total 40
> threads) and a Fedora server with 2 sockets of 56 cores/112
> threads (total 224 threads).
>
> The pthread_barrier_wait should guarantee that only one
> thread at a time interacts with the variables below that
> if block.
>
> GCC -O2 optimization rearranges the two increments above the wait
> function calls. This causes a race issue found by Helgrind.
> You can prove this by commenting out the memory barrier
> and compiling with `-O0`. The crash does not happen with -O0.
> Thank you to Valentin Schneider <vschneid@xxxxxxxxxx> for
> suggesting -O2 as the cause.
>
> Add a memory barrier to force GCC to increment the variables
> below the pthread calls. The memory barrier prevented any crashes
> across several 24-hour tests. This function depends on C11.
>
> Reported-by: Valgrind's Helgrind tool
>
> Signed-off-by: "John B. Wyatt IV" <jwyatt@xxxxxxxxxx>
> Signed-off-by: "John B. Wyatt IV" <sageofredondo@xxxxxxxxx>
> ---
>  src/pi_tests/pi_stress.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/src/pi_tests/pi_stress.c b/src/pi_tests/pi_stress.c
> index 371e906..37023a1 100644
> --- a/src/pi_tests/pi_stress.c
> +++ b/src/pi_tests/pi_stress.c
> @@ -44,6 +44,7 @@
>  #include <stdint.h>
>  #include <inttypes.h>
>  #include <limits.h>
> +#include <stdatomic.h>
>
>  #include "rt-sched.h"
>  #include "rt-utils.h"
> @@ -952,12 +953,31 @@ void *high_priority(void *arg)
>  			    ("high_priority[%d]: pthread_barrier_wait(finish): %x", p->id, status);
>  			return NULL;
>  		}
> +
> +
> +		/**
> +		 * The pthread_barrier_wait should guarantee that only one
> +		 * thread at a time interacts with the variables below that
> +		 * if block.
> +		 *
> +		 * GCC -O2 rearranges the two increments above the wait
> +		 * function calls causing a race issue if you run this
> +		 * near full cores with one core (2 threads) free for
> +		 * housekeeping. This causes a crash at around 2 hours of
> +		 * running. You can prove this by commenting out the barrier
> +		 * and compiling with `-O0`. The crash does not show with
> +		 * -O0.
> +		 *
> +		 * Add a memory barrier to force GCC to increment the variables
> +		 * below the pthread calls. This function depends on C11.
> +		 **/

I agree with Crystal. AFAIU, the compiler should not reorder those
increments to before pthread_barrier_wait(). Something odd is going
on here. However, I am not familiar with the code base. What's the
repo for this?

Btw, did you look at the generated assembly code?

> +		atomic_thread_fence(memory_order_seq_cst);
> +

If the problem is program order, a simple compiler barrier with
acquire semantics should work:

	atomic_signal_fence(memory_order_acquire);

>  		/* update the group stats */
>  		p->total++;
>
>  		/* update the watchdog counter */
>  		p->watchdog++;
> -
>  	}
>  	set_shutdown_flag();
>
> --
> 2.47.0
>