Re: [PATCH v6 4/5] MCS Lock: Barrier corrections

"Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> · Thu, 28 Nov 2013 09:38:53 -0800

On Thu, Nov 28, 2013 at 11:40:59AM +0000, Will Deacon wrote:
> On Wed, Nov 27, 2013 at 05:11:43PM +0000, Paul E. McKenney wrote:
> > On Wed, Nov 27, 2013 at 10:16:13AM +0000, Will Deacon wrote:
> > > On Tue, Nov 26, 2013 at 10:51:36PM +0000, Paul E. McKenney wrote:
> > > > On Tue, Nov 26, 2013 at 11:32:25AM -0800, Linus Torvalds wrote:
> > > > > On Tue, Nov 26, 2013 at 11:20 AM, Paul E. McKenney wrote:
> > > > o	ARM has an smp_mb() during lock acquisition, so after_spinlock()
> > > > 	can be a no-op for them.
> > > 
> > > Ok, but what about arm64? We use acquire for lock() and release for
> > > unlock(), so in Linus' example:
> > 
> > Right, I did forget the arm vs. arm64 split!
> > 
> > >     write A;
> > >     spin_lock()
> > >     mb__after_spinlock();
> > >     read B
> > > 
> > > Then A could very well be reordered after B if mb__after_spinlock() is a nop.
> > > Making that a full barrier kind of defeats the point of using acquire in the
> > > first place...
> > 
> > The trick is that you don't have mb__after_spinlock() unless you need the
> > ordering, which we expect in a small minority of the lock acquisitions.
> > So you would normally get the benefit of acquire/release efficiency.
> 
> Ok, understood. I take it this means that you don't care about ordering the
> write to A with the actual locking operation? (that would require the mb to
> be *inside* the spin_lock() implementation).

Or it would require an mb__before_spinlock().  More on this below...
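For illustration, here is a minimal C11 sketch of that placement. It is built on assumptions, not kernel code. The lock is a hypothetical test-and-set spinlock (sketch_spin_lock()/sketch_spin_unlock()) that has only acquire and release semantics, roughly the arm64 approach described above. sketch_mb__before_spinlock() is a hypothetical stand-in for mb__before_spinlock(), modeled as a seq_cst fence, which compilers map to a full hardware barrier (for example `dmb ish` on arm64). Because the fence sits in front of the lock, it orders "write A" before the lock-word update as well as before "read B".

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical acquire/release spinlock; a sketch, not the kernel's. */
typedef struct {
	atomic_int locked;
} sketch_spinlock_t;

static void sketch_spin_lock(sketch_spinlock_t *l)
{
	/* Acquire: later accesses cannot move before the exchange, but
	 * earlier accesses (e.g. "write A") may still move after it. */
	while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
		;
}

static void sketch_spin_unlock(sketch_spinlock_t *l)
{
	/* Sanity check: only a held lock may be released. */
	assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
	/* Release: earlier accesses cannot move after this store. */
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Hypothetical stand-in for mb__before_spinlock(): a full fence issued
 * before the lock so that prior accesses are ordered before the
 * acquisition and everything that follows it. */
static inline void sketch_mb__before_spinlock(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

static atomic_int A, B;

/* Will's example with the barrier moved in front of the lock:
 *     write A; mb__before_spinlock(); spin_lock(); read B
 * Returns the value read from B. */
static int write_a_lock_read_b(sketch_spinlock_t *l)
{
	int b;

	atomic_store_explicit(&A, 1, memory_order_relaxed);
	sketch_mb__before_spinlock();
	sketch_spin_lock(l);
	b = atomic_load_explicit(&B, memory_order_relaxed);
	sketch_spin_unlock(l);
	return b;
}
```

Putting the fence after spin_lock() would still order A before B, but it would not order the store to A before the lock-word update itself.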

> > > It's one thing ordering unlock -> lock, but another getting those two to
> > > behave as full barriers for any arbitrary memory accesses.
> > 
> > And in fact the unlock+lock barrier is all that RCU needs.  I guess the
> > question is whether it is worth having two flavors of __after_spinlock(),
> > one that is a full barrier with just the lock, and another that is
> > only guaranteed to be a full barrier with unlock+lock.
> 
> I think it's worth distinguishing those cases because, in my mind, one is
> potentially a lot heavier than the other. The risk is that we end up
> producing a set of strangely named barrier abstractions that nobody can
> figure out how to use properly:
> 
> 
> 	/*
> 	 * Prevent re-ordering of arbitrary accesses across spin_lock and
> 	 * spin_unlock.
> 	 */
> 	mb__after_spin_lock()
> 	mb__after_spin_unlock()
> 
> 	/*
> 	 * Order spin_lock() vs spin_unlock()
> 	 */
> 	mb__between_spin_unlock_lock() /* Horrible name! */
> 
> 
> We could potentially replace the first set of barriers with spin_lock_mb()
> and spin_unlock_mb() variants (which would be more efficient than half
> barrier + full barrier), then we only end up with strangely named barrier
> which applies to the non _mb() spinlock routines.

How about having the current mb__before_spinlock() make the acquisition
a full barrier, and an mb__after_spinlock() make a prior release plus
this acquisition a full barrier?

Yes, we might need better names, but I believe that this approach does
what you need.
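A rough C11 sketch of the proposed pair follows. The sketch_* names are hypothetical and do not correspond to any existing kernel interface. Both barriers are modeled as seq_cst fences. The hypothetical SKETCH_LOCK_IS_FULL_BARRIER switch shows how mb__after_spinlock() could shrink to a compiler-only barrier on architectures whose lock acquisition already contains smp_mb(), as noted for ARM earlier in the thread. The example function follows the unlock+lock pattern that RCU needs: a store made before unlocking one lock has to be ordered before a load made after acquiring another. Under the C11 model used here, a release followed by an acquire does not provide that store-to-load ordering by itself.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical acquire/release spinlock; a sketch, not the kernel's. */
typedef struct {
	atomic_int locked;
} sketch_spinlock_t;

static void sketch_spin_lock(sketch_spinlock_t *l)
{
	while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
		;
}

static void sketch_spin_unlock(sketch_spinlock_t *l)
{
	assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Assumed semantics of the proposal (hypothetical names):
 *  - sketch_mb__before_spinlock(): placed before spin_lock(), makes the
 *    acquisition a full barrier.
 *  - sketch_mb__after_spinlock(): placed after spin_lock(), makes a prior
 *    spin_unlock() plus this acquisition a full barrier. On CPUs whose
 *    lock already ends with a full barrier, it only needs to stop the
 *    compiler from reordering. */
static inline void sketch_mb__before_spinlock(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

static inline void sketch_mb__after_spinlock(void)
{
#ifdef SKETCH_LOCK_IS_FULL_BARRIER
	atomic_signal_fence(memory_order_seq_cst);	/* compiler barrier only */
#else
	atomic_thread_fence(memory_order_seq_cst);	/* full hardware fence */
#endif
}

static sketch_spinlock_t lock_a, lock_b;
static atomic_int X, Y;

/* Unlock one lock, acquire another, and require the pair to behave as a
 * full barrier: the store to X must be ordered before the load from Y.
 * Returns the value read from Y. */
static int unlock_lock_handoff(void)
{
	int y;

	sketch_spin_lock(&lock_a);
	atomic_store_explicit(&X, 1, memory_order_relaxed);
	sketch_spin_unlock(&lock_a);

	sketch_spin_lock(&lock_b);
	sketch_mb__after_spinlock();
	y = atomic_load_explicit(&Y, memory_order_relaxed);
	sketch_spin_unlock(&lock_b);
	return y;
}
```

The split means that only the minority of callers that actually need unlock+lock ordering pay for the fence, while ordinary critical sections keep plain acquire/release.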

Thoughts?

							Thanx, Paul
