On Wed, Nov 27, 2013 at 05:11:43PM +0000, Paul E. McKenney wrote:
> On Wed, Nov 27, 2013 at 10:16:13AM +0000, Will Deacon wrote:
> > On Tue, Nov 26, 2013 at 10:51:36PM +0000, Paul E. McKenney wrote:
> > > On Tue, Nov 26, 2013 at 11:32:25AM -0800, Linus Torvalds wrote:
> > > > On Tue, Nov 26, 2013 at 11:20 AM, Paul E. McKenney wrote:
> > > o	ARM has an smp_mb() during lock acquisition, so after_spinlock()
> > >	can be a no-op for them.
> >
> > Ok, but what about arm64? We use acquire for lock() and release for
> > unlock(), so in Linus' example:
>
> Right, I did forget the arm vs. arm64 split!
>
> >	write A;
> >	spin_lock()
> >	mb__after_spinlock();
> >	read B
> >
> > Then A could very well be reordered after B if mb__after_spinlock() is
> > a nop. Making that a full barrier kind of defeats the point of using
> > acquire in the first place...
>
> The trick is that you don't have mb__after_spinlock() unless you need the
> ordering, which we expect in a small minority of the lock acquisitions.
> So you would normally get the benefit of acquire/release efficiency.

Ok, understood. I take it this means that you don't care about ordering the
write to A with the actual locking operation? (that would require the mb to
be *inside* the spin_lock() implementation).

> > It's one thing ordering unlock -> lock, but another getting those two to
> > behave as full barriers for any arbitrary memory accesses.
>
> And in fact the unlock+lock barrier is all that RCU needs.  I guess the
> question is whether it is worth having two flavors of __after_spinlock(),
> one that is a full barrier with just the lock, and another that is
> only guaranteed to be a full barrier with unlock+lock.

I think it's worth distinguishing those cases because, in my mind, one is
potentially a lot heavier than the other. The risk is that we end up
producing a set of strangely named barrier abstractions that nobody can
figure out how to use properly:

	/*
	 * Prevent re-ordering of arbitrary accesses across spin_lock and
	 * spin_unlock.
	 */
	mb__after_spin_lock()
	mb__after_spin_unlock()

	/*
	 * Order spin_lock() vs spin_unlock()
	 */
	mb__between_spin_unlock_lock()	/* Horrible name! */

We could potentially replace the first set of barriers with spin_lock_mb()
and spin_unlock_mb() variants (which would be more efficient than half
barrier + full barrier), then we only end up with one strangely named
barrier, which applies to the non-_mb() spinlock routines.

Will
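
P.S. For what it's worth, here is a rough sketch of the per-arch side of
this as I picture it. The names follow this thread and the placement is
purely illustrative; none of it is real kernel code:

	/* Generic fallback: play it safe with a full barrier. */
	#ifndef mb__after_spinlock
	#define mb__after_spinlock()	smp_mb()
	#endif

	/*
	 * 32-bit ARM: spin_lock() already ends in smp_mb(), so this can
	 * be a no-op there.
	 */
	#define mb__after_spinlock()	do { } while (0)

	/*
	 * arm64: spin_lock() only has acquire semantics, so we'd need a
	 * real full barrier here to keep "write A; lock; read B" ordered.
	 */
	#define mb__after_spinlock()	smp_mb()

and the _mb() lock variants, written naively as lock-plus-barrier, would be
something like:

	static inline void spin_lock_mb(spinlock_t *lock)
	{
		spin_lock(lock);	/* acquire only on arm64 */
		smp_mb();		/* upgrade to a full barrier */
	}

	static inline void spin_unlock_mb(spinlock_t *lock)
	{
		spin_unlock(lock);	/* release only on arm64 */
		smp_mb();		/* full barrier after the unlock */
	}

although a real arm64 implementation could presumably do better than issuing
both the acquire/release halves and a trailing full barrier.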