On Tue, Nov 26, 2013 at 11:00:50AM -0800, Linus Torvalds wrote:
> On Tue, Nov 26, 2013 at 1:59 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > If you now want to weaken this definition, then that needs consideration
> > because we actually rely on things like
> >
> >	spin_unlock(l1);
> >	spin_lock(l2);
> >
> > being full barriers.
>
> Btw, maybe we should just stop that assumption. The complexity of this
> discussion makes me go "maybe we should stop with subtle assumptions
> that happen to be obviously true on x86 due to historical
> implementations, but aren't obviously true even *there* any more with
> the MCS lock".

From an RCU viewpoint, I am OK with that approach.  From the viewpoint
of documenting our assumptions, I really really like that approach.

> We already have a concept of
>
>	smp_mb__before_spinlock();
>	spin_lock();
>
> for sequences where we *know* we need to make getting a spin-lock be a
> full memory barrier. It's free on x86 (and remains so even with the
> MCS lock, regardless of any subtle issues, if only because even the
> MCS lock starts out with a locked atomic, never mind the contention
> slow-case). Of course, that macro is only used inside the scheduler,
> and is actually documented to not really be a full memory barrier, but
> it handles the case we actually care about.

This would work well if we made it be smp_mb__after_spinlock(), used as
follows:

	spin_lock();
	smp_mb__after_spinlock();

The reason that it must go after rather than before is to handle
MCS-style low-overhead handoffs.  During the handoff, you can count on
code at the beginning of the critical section being executed, but things
before the lock cannot possibly help you.

We should also have something for lock releases, for example:

	smp_mb__before_spinunlock();
	spin_unlock();

This allows architectures to choose where to put the overhead, and also
very clearly documents which unlock+lock pairs need the full barriers.

Heh.  Must smp_mb__after_spinlock() and smp_mb__before_spinunlock()
provide a full barrier when used separately, or only when used together?
The unlock+lock guarantee requires that they provide a full barrier only
when used together.  However, I believe that the scheduler's use of
smp_mb__before_spinlock() needs a full barrier without pairing.

I have no idea whether or not we should have a separate API for each
flavor of lock.

> IOW, where do we really care about "unlock+lock" being a memory
> barrier? And could we make those places explicit, and then do
> something similar to the above to them?

There are several places in RCU that assume unlock+lock is a full memory
barrier, but I would be more than happy to fix them up given an
smp_mb__after_spinlock() and an smp_mb__before_spinunlock(), or
something similar.

							Thanx, Paul
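
P.S.  For concreteness, here is a minimal sketch of how these proposed
primitives might be wired up.  This is purely illustrative, not an
actual patch: the generic-fallback/override structure and the names x,
y, r1, l1, and l2 are my own; only smp_mb(), ACCESS_ONCE(), and the
spin_lock()/spin_unlock() calls are existing kernel interfaces.  A
generic header could default both primitives to smp_mb(), and a
strongly ordered architecture whose lock and unlock operations already
act as full barriers could override them with no-ops:

	/* Hypothetical generic fallbacks; architectures may override. */
	#ifndef smp_mb__after_spinlock
	#define smp_mb__after_spinlock()	smp_mb()
	#endif

	#ifndef smp_mb__before_spinunlock
	#define smp_mb__before_spinunlock()	smp_mb()
	#endif

A caller that needs an unlock+lock sequence to be seen by other CPUs as
a full memory barrier would then write something like the fragment
below (assuming l1 is held and both locks are initialized elsewhere):

	ACCESS_ONCE(x) = 1;		/* store before the unlock */
	smp_mb__before_spinunlock();	/* pairs with __after_spinlock() below */
	spin_unlock(&l1);

	spin_lock(&l2);
	smp_mb__after_spinlock();	/* upgrade unlock+lock to full barrier */
	r1 = ACCESS_ONCE(y);		/* load after the lock */

On an architecture where unlock followed by lock is already a full
barrier, both macros would compile to nothing; elsewhere the
architecture gets to choose which side of the pair carries the
smp_mb().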