Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make xchg and cmpxchg a full barrier

Boqun Feng <boqun.feng@xxxxxxxxx> · Thu, 15 Oct 2015 11:11:01 +0800

Hi Paul,

On Thu, Oct 15, 2015 at 08:53:21AM +0800, Boqun Feng wrote:
> On Wed, Oct 14, 2015 at 02:44:53PM -0700, Paul E. McKenney wrote:
[snip]
> > To that end, the herd tool can make a diagram of what it thought
> > happened, and I have attached it.  I used this diagram to try and force
> > this scenario at https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html#PPC,
> > and succeeded.  Here is the sequence of events:
> > 
> > o	Commit P0's write.  The model offers to propagate this write
> > 	to the coherence point and to P1, but don't do so yet.
> > 
> > o	Commit P1's write.  Similar offers, but don't take them up yet.
> > 
> > o	Commit P0's lwsync.
> > 
> > o	Execute P0's lwarx, which reads a=0.  Then commit it.
> > 
> > o	Commit P0's stwcx. as successful.  This stores a=1.
> > 
> > o	Commit P0's branch (not taken).
> > 
> 
> So at this point, P0's write to 'a' has propagated to P1, right? But
> P0's write to 'x' hasn't, even there is a lwsync between them, right?
> Doesn't the lwsync prevent this from happening?
> 
> If at this point P0's write to 'a' hasn't propagated then when?
> 

Hmm.. I played around ppcmem, and figured out what happens to
propagation of P0's write to 'a':

At this point, or some point after store 'a' to 1 and before sync on
P1 finish, writes to 'a' reachs a coherence point which 'a' is 2, so
P0's write to 'a' "fails" and will not propagate.

I probably misunderstood the word "propagate", which actually means an
already coherent write gets seen by another CPU, right?

So my question should be:

As lwsync can order P0's write to 'a' happens after P0's write to 'x',
why P0's write to 'x' isn't seen by P1 after P1's write to 'a' overrides
P0's?

But ppcmem gave me the answer ;-) lwsync won't wait under P0's write to
'x' gets propagated, and if P0's write to 'a' "wins" in write coherence,
lwsync will guarantee propagation of 'x' happens before that of 'a', but
if P0's write to 'a' "fails", there will be no propagation of 'a' from
P0. So that lwsync can't do anything here.

Regards,
Boqun

> 
> > o	Commit P0's final register-to-register move.
> > 
> > o	Commit P1's sync instruction.
> > 
> > o	There is now nothing that can happen in either processor.
> > 	P0 is done, and P1 is waiting for its sync.  Therefore,
> > 	propagate P1's a=2 write to the coherence point and to
> > 	the other thread.
> > 
> > o	There is still nothing that can happen in either processor.
> > 	So pick the barrier propagate, then the acknowledge sync.
> > 
> > o	P1 can now execute its read from x.  Because P0's write to
> > 	x is still waiting to propagate to P1, this still reads
> > 	x=0.  Execute and commit, and we now have both r3 registers
> > 	equal to zero and the final value a=2.
> > 
> > o	Clean up by propagating the write to x everywhere, and
> > 	propagating the lwsync.
> > 
> > And the "exists" clause really does trigger: 0:r3=0; 1:r3=0; [a]=2;
> > 
> > I am still not 100% confident of my litmus test.  It is quite possible
> > that I lost something in translation, but that is looking less likely.
> > 
Attachment:
signature.asc

Description: PGP signature

Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make xchg and cmpxchg a full barrier