Re: "Verifying and Optimizing Compact NUMA-Aware Locks on Weak Memory Models"

Will Deacon <will@xxxxxxxxxx> · Fri, 16 Sep 2022 09:18:54 +0100

On Tue, Sep 13, 2022 at 08:21:02AM -0400, Dan Lustig wrote:
> On 9/13/2022 7:24 AM, Will Deacon wrote:
> > On Fri, Aug 26, 2022 at 01:42:19PM -0700, Paul E. McKenney wrote:
> >> PPC MP+lwsyncs+atomic
> >> "LwSyncdWW Rfe LwSyncdRR Fre"
> >> Cycle=Rfe LwSyncdRR Fre LwSyncdWW
> >> {
> >> 0:r2=x; 0:r4=y;
> >> 1:r2=y; 1:r5=2;
> >> 2:r2=y; 2:r4=x;
> >> }
> >>  P0           | P1              | P2           ;
> >>  li r1,1      | lwarx r1,r0,r2  | lwz r1,0(r2) ;
> >>  stw r1,0(r2) | stwcx. r5,r0,r2 | lwsync       ;
> >>  lwsync       |                 | lwz r3,0(r4) ;
> >>  li r3,1      |                 |              ;
> >>  stw r3,0(r4) |                 |              ;
> >> exists (1:r1=1 /\ 2:r1=2 /\ 2:r3=0)
> > 
> > Just catching up on this, but one possible gotcha here is if you have an
> > architecture with native load-acquire on P2 and then you move P2 to the end
> > of P1. e.g. at a high-level:
> > 
> > 
> >   P0		P1
> >   Wx = 1	RmW(y) // xchg() 1 => 2
> >   WyRel = 1	RyAcq = 2
> > 		Rx = 0
> > 
> > arm64 forbids this, but it's not "natural" to the hardware and I don't
> > know what e.g. risc-v would say about it.
> > 
> 
> RISC-V doesn't currently have native load-acquire instructions other than
> load-reserve-acquire, but if it did, it would forbid this outcome as well.

Thanks for chipping in, Dan. Somehow, I hadn't realised that you didn't
have native load-acquire instructions.

Will