On Tue, Nov 26, 2013 at 10:51:36PM +0000, Paul E. McKenney wrote:
> On Tue, Nov 26, 2013 at 11:32:25AM -0800, Linus Torvalds wrote:
> > On Tue, Nov 26, 2013 at 11:20 AM, Paul E. McKenney
> > <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > There are several places in RCU that assume unlock+lock is a full
> > > memory barrier, but I would be more than happy to fix them up given
> > > an smp_mb__after_spinlock() and an smp_mb__before_spinunlock(), or
> > > something similar.
> >
> > A "before_spinunlock" would actually be expensive on x86.
>
> Good point: on x86 the typical non-queued spinlock acquisition path
> contains an atomic operation with a full memory barrier in any case,
> and I believe the same holds for the other TSO architectures. For the
> non-TSO architectures:
>
> o	ARM has an smp_mb() during lock acquisition, so
>	after_spinlock() can be a no-op for them.

Ok, but what about arm64? We use acquire for lock() and release for
unlock(), so in Linus's example:

	write A;
	spin_lock();
	mb__after_spinlock();
	read B;

A could very well be reordered after B if mb__after_spinlock() is a
no-op, yet making it a full barrier rather defeats the point of using
acquire in the first place...

It's one thing to order unlock -> lock, but quite another to make the
pair behave as a full barrier for arbitrary memory accesses.

Will
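
To make the hazard above concrete, here is a minimal userspace sketch
using C11 atomics. The toy_spin_lock()/toy_spin_unlock() helpers are
invented for the example and model only the acquire-on-lock /
release-on-unlock discipline (they are not the kernel's arm64
spinlock implementation), and the seq_cst fence marks the slot where
a mb__after_spinlock() would sit:

	#include <stdatomic.h>

	static atomic_int lock_var;	/* 0 = unlocked, 1 = locked */
	static atomic_int A, B;

	/* Toy test-and-set lock: acquire on lock, release on unlock,
	 * mirroring the arm64 scheme under discussion. */
	static void toy_spin_lock(void)
	{
		while (atomic_exchange_explicit(&lock_var, 1,
						memory_order_acquire))
			;	/* spin */
	}

	static void toy_spin_unlock(void)
	{
		atomic_store_explicit(&lock_var, 0, memory_order_release);
	}

	int example(void)
	{
		/* write A */
		atomic_store_explicit(&A, 1, memory_order_relaxed);

		toy_spin_lock();

		/* Acquire only stops later accesses from moving up past
		 * the lock; it does not stop the earlier store to A from
		 * sinking below it and being reordered after the load of
		 * B.  A full fence here, in the mb__after_spinlock()
		 * slot, forbids that store->load reordering: */
		atomic_thread_fence(memory_order_seq_cst);

		/* read B */
		int b = atomic_load_explicit(&B, memory_order_relaxed);

		toy_spin_unlock();
		return b;
	}

Of course, once that fence is unconditionally present, the cheap
acquire has bought nothing, which is precisely the objection above.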