Re: [PATCH] pi_stress: Add memory barrier to resolve crash

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Tue, 29 Oct 2024, Wander Lairson Costa wrote:

> On Thu, Oct 24, 2024 at 12:26:14PM -0400, John B. Wyatt IV wrote:
> > This patch is also an error report seen on RHEL 9 and Fedora 40.
> > pi_stress crashes if you run this near full cores with one core
> > (2 threads) free for housekeeping. The crash usually happens at
> > around 2 hours of pi_stress running. This issue was reproduced
> > on a RHEL server with 2 sockets of 10 cores/20 threads (total 40
> > threads) and a Fedora server with 2 sockets of 56 cores/112
> > threads (total 226 threads).
> > 
> > The pthread_barrier_wait should guarantee that only one
> > thread at a time interacts with the variables below that
> > if block.
> > 
> > GCC -O2 optimization rearranges the two increments above the wait
> > function calls. This causes a race issue found by Helgrind.
> > You can prove this by commenting out the memory barrier
> > and compiling with `-O0`. The crash does not happen with -O0.
> > Thank you to Valentin Schneider <vschneid@xxxxxxxxxx> for
> > suggesting about -O2.
> > 
> > Add a memory barrier to force GCC to increment the variables
> > below the pthread calls. The memory barrier prevented any crashes
> > for several 24 hours tests. This function depends on C11.
> > 
> > Reported-by: Valgrind's Helgrind tool
> > 
> > Signed-off-by: "John B. Wyatt IV" <jwyatt@xxxxxxxxxx>
> > Signed-off-by: "John B. Wyatt IV" <sageofredondo@xxxxxxxxx>
> > ---
> >  src/pi_tests/pi_stress.c | 22 +++++++++++++++++++++-
> >  1 file changed, 21 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/pi_tests/pi_stress.c b/src/pi_tests/pi_stress.c
> > index 371e906..37023a1 100644
> > --- a/src/pi_tests/pi_stress.c
> > +++ b/src/pi_tests/pi_stress.c
> > @@ -44,6 +44,7 @@
> >  #include <stdint.h>
> >  #include <inttypes.h>
> >  #include <limits.h>
> > +#include <stdatomic.h>
> >  
> >  #include "rt-sched.h"
> >  #include "rt-utils.h"
> > @@ -952,12 +953,31 @@ void *high_priority(void *arg)
> >  			    ("high_priority[%d]: pthread_barrier_wait(finish): %x", p->id, status);
> >  			return NULL;
> >  		}
> > +
> > +
> > +		/**
> > +		 * The pthread_barrier_wait should guarantee that only one
> > +		 * thread at a time interacts with the variables below that
> > +		 * if block.
> > +		 *
> > +		 * GCC -O2 rearranges the two increments above the wait
> > +		 * function calls causing a race issue if you run this
> > +		 * near full cores with one core (2 threads) free for
> > +		 * housekeeping. This causes a crash at around 2 hour of
> > +		 * running. You can prove this by commenting out the barrier
> > +		 * and compiling with `-O0`. The crash does not show with
> > +		 * -O0.
> > +		 *
> > +		 * Add a memory barrier to force GCC to increment the variables
> > +		 * below the pthread calls. This funcion depends on C11.
> > +		 **/
> 
> I agree with Crystal. AFAIU, the compiler should not reorder before
> pthread_barrier_wait(). Something odd is going on here. However, I am
> not familiar with the code base. What's the repo for this?

This is one of the programs in the rt-tests suite

Clone one of the following
git://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
https://kernel.googlesource.com/pub/scm/utils/rt-tests/rt-tests.git


> 
> Btw, did you look at the generated assembly code?
> 
> > +		atomic_thread_fence(memory_order_seq_cst);
> > +
> 
> If the problem is program order, a simple compiler barrier with acquire
> semantics should work:
> 
>         atomic_signal_fence(memory_order_acquire);
> 
> >  		/* update the group stats */
> >  		p->total++;
> >  
> >  		/* update the watchdog counter */
> >  		p->watchdog++;
> > -
> >  	}
> >  	set_shutdown_flag();
> >  
> > -- 
> > 2.47.0
> > 
> 
> 
> 





[Index of Archives]     [RT Stable]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]

  Powered by Linux