On Tue, Nov 26, 2013 at 11:32:25AM -0800, Linus Torvalds wrote:
> On Tue, Nov 26, 2013 at 11:20 AM, Paul E. McKenney
> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > There are several places in RCU that assume unlock+lock is a full
> > memory barrier, but I would be more than happy to fix them up given
> > an smp_mb__after_spinlock() and an smp_mb__before_spinunlock(), or
> > something similar.
>
> A "before_spinunlock" would actually be expensive on x86.

Good point: on x86, the typical non-queued spin-lock acquisition path
contains an atomic operation with a full memory barrier in any case,
and I believe that the same holds for the other TSO architectures.

For the non-TSO architectures:

o	ARM has an smp_mb() during lock acquisition, so
	after_spinlock() can be a no-op there.

o	Itanium will require more thought, but it looks like it does
	not care whether it gets after_spinlock() or
	before_spinunlock().  I must defer to the maintainers on this
	one.

o	PowerPC is OK either way.

> So I'd *much* rather see the "after_spinlock()" version, if that is
> sufficient for all users. And it should be, since that's the
> traditional x86 behavior that we had before the MCS lock discussion.
>
> Because it's worth noting that a spin_lock() is still a full memory
> barrier on x86, even with the MCS code, *assuming it is done in the
> context of the thread needing the memory barrier*. And I suspect that
> is much more generally true than just x86. It's the final MCS hand-off
> of a lock that is pretty weak with just a local read. The full lock
> sequence is always going to be much stronger, if only because it will
> contain a write somewhere shared as well.

Good points, and after_spinlock() works for me from an RCU perspective.

							Thanx, Paul
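
P.S.  For concreteness, here is a minimal sketch of the sort of use I
have in mind.  The lock, variable, and function names below are made
up for illustration, and this is not actual RCU code:

	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(mylock);
	static int shared_data;

	/*
	 * Update shared_data with full ordering against all accesses
	 * preceding the spin_lock(), even on architectures where
	 * spin_lock() is only an acquire operation.
	 */
	void example_writer(int newval)
	{
		spin_lock(&mylock);
		smp_mb__after_spinlock();  /* Promote lock to full barrier. */
		shared_data = newval;
		spin_unlock(&mylock);
	}

On x86 and the other TSO architectures, smp_mb__after_spinlock() can
be a no-op because the spin_lock() itself already implies a full
memory barrier, so the common case pays nothing extra.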