Re: [PATCH v6 4/5] MCS Lock: Barrier corrections

On Thu, Nov 21, 2013 at 02:27:01PM -0800, Linus Torvalds wrote:
> On Wed, Nov 20, 2013 at 8:53 PM, Paul E. McKenney
> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > The other option is to weaken lock semantics so that unlock-lock no
> > longer implies a full barrier, but I believe that we would regret taking
> > that path.  (It would be OK by me, I would just add a few smp_mb()
> > calls on various slowpaths in RCU.  But...)
> 
> Hmm. I *thought* we already did that, exactly because some
> architecture already hit this issue, and we got rid of some of the
> more subtle "this works because.."
> 
> No?
> 
> Anyway, isn't "unlock+lock" fundamentally guaranteed to be a memory
> barrier? Anything before the unlock cannot possibly migrate down below
> the unlock, and anything after the lock must not possibly migrate up
> to before the lock? If either of those happens, then something has
> migrated out of the critical region, which is against the whole point
> of locking..

Actually, the weakest forms of locking only guarantee a consistent view
of memory if you are actually holding the lock.  Not "a" lock, but "the"
lock.  The trick is that use of a common lock variable short-circuits
the transitivity that would otherwise be required, which in turn
allows cheaper memory barriers to be used.  But when implementing these
weakest forms of locking (which Peter and Tim inadvertently did by
combining the MCS lock with a PPC implementation of smp_load_acquire()
and smp_store_release() that used lwsync), "unlock+lock" is no longer
guaranteed to act as a full memory barrier.

Which is why I (admittedly belatedly) complained.

So the three fixes I know of at the moment are:

1.	Upgrade smp_store_release()'s PPC implementation from lwsync
	to sync.
	
	What about ARM?  ARM platforms that have the load-acquire and
	store-release instructions could use them, but other ARM
	platforms have to use dmb.  ARM avoids PPC's lwsync issue
	because it has no equivalent to lwsync.

2.	Place an explicit smp_mb() into the MCS-lock queued handoff
	code.

3.	Remove the requirement that "unlock+lock" be a full memory
	barrier.

We have been leaning towards #1, but before making any hard decision
on this we are looking more closely at what the situation is on other
architectures.

> It's the "lock+unlock" where it's possible that something before the
> lock might migrate *into* the critical region (ie after the lock), and
> something after the unlock might similarly migrate to precede the
> unlock, so you could end up having out-of-order accesses across a
> lock/unlock sequence (that both happen "inside" the lock, but there is
> no guaranteed ordering between the two accesses themselves).

Agreed.
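
In C-style pseudocode, the one-way permeability Linus describes looks
like this (spin_lock() is a one-way acquire barrier, spin_unlock() a
one-way release barrier):

```
a = 1;            /* may sink below the lock, into the critical section */
spin_lock(&l);    /* later accesses cannot move up past the lock        */
/* protected accesses stay between lock and unlock */
spin_unlock(&l);  /* earlier accesses cannot move down past the unlock  */
b = 1;            /* may float above the unlock, into the critical section */
```

So both a = 1 and b = 1 can end up executing inside the critical section,
with no guaranteed ordering between them.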

> Or am I confused? The one major reason for strong memory ordering is
> that weak ordering is too f*cking easy to get wrong on a software
> level, and even people who know about it will make mistakes.

Guilty to charges as read!  ;-)

That is a major reason why I am leaning towards #1 on the list above.

							Thanx, Paul

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .