On Tue, Sep 13, 2022 at 08:21:02AM -0400, Dan Lustig wrote: > On 9/13/2022 7:24 AM, Will Deacon wrote: > > On Fri, Aug 26, 2022 at 01:42:19PM -0700, Paul E. McKenney wrote: > >> PPC MP+lwsyncs+atomic > >> "LwSyncdWW Rfe LwSyncdRR Fre" > >> Cycle=Rfe LwSyncdRR Fre LwSyncdWW > >> { > >> 0:r2=x; 0:r4=y; > >> 1:r2=y; 1:r5=2; > >> 2:r2=y; 2:r4=x; > >> } > >> P0 | P1 | P2 ; > >> li r1,1 | lwarx r1,r0,r2 | lwz r1,0(r2) ; > >> stw r1,0(r2) | stwcx. r5,r0,r2 | lwsync ; > >> lwsync | | lwz r3,0(r4) ; > >> li r3,1 | | ; > >> stw r3,0(r4) | | ; > >> exists (1:r1=1 /\ 2:r1=2 /\ 2:r3=0) > > > > Just catching up on this, but one possible gotcha here is if you have an > > architecture with native load-acquire on P2 and then you move P2 to the end > > of P1. e.g. at a high-level: > > > > > > P0 P1 > > Wx = 1 RmW(y) // xchg() 1 => 2 > > WyRel = 1 RyAcq = 2 > > Rx = 0 > > > > arm64 forbids this, but it's not "natural" to the hardware and I don't > > know what e.g. risc-v would say about it. > > > > RISC-V doesn't currently have native load-acquire instructions other than > load-reserve-acquire, but if it did, it would forbid this outcome as well. Thanks for chipping in, Dan. Somehow, I hadn't realised that you didn't have native load-acquire instructions. Will