On Fri, Oct 09, 2015 at 10:31:38AM +0200, Peter Zijlstra wrote:
> On Thu, Oct 08, 2015 at 02:44:39PM -0700, Paul E. McKenney wrote:
> > On Thu, Oct 08, 2015 at 01:16:38PM +0200, Peter Zijlstra wrote:
> > > On Thu, Oct 08, 2015 at 02:50:36PM +1100, Michael Ellerman wrote:
> > > > On Wed, 2015-10-07 at 08:25 -0700, Paul E. McKenney wrote:
> > >
> > > > > Currently, we do need smp_mb__after_unlock_lock() to be after the
> > > > > acquisition on PPC -- putting it between the unlock and the lock
> > > > > of course doesn't cut it for the cross-thread unlock/lock case.
> > >
> > > This ^, that makes me think I don't understand
> > > smp_mb__after_unlock_lock.
> > >
> > > How is:
> > >
> > >   UNLOCK x
> > >   smp_mb__after_unlock_lock()
> > >   LOCK y
> > >
> > > a problem? That's still a full barrier.
> >
> > The problem is that I need smp_mb__after_unlock_lock() to give me
> > transitivity even if the UNLOCK happened on one CPU and the LOCK
> > on another. For that to work, the smp_mb__after_unlock_lock() needs
> > to be either immediately after the acquire (the current choice) or
> > immediately before the release (which would also work from a purely
> > technical viewpoint, but I much prefer the current choice).
> >
> > Or am I missing your point?
>
> So lots of little confusions added up to complete fail :-{
>
> Mostly I think it was the UNLOCK x + LOCK x are fully ordered (where I
> forgot: but not against uninvolved CPUs) and RELEASE/ACQUIRE are
> transitive (where I forgot: RELEASE/ACQUIRE _chains_ are transitive, but
> again not against uninvolved CPUs).
>
> Which leads me to think I would like to suggest alternative rules for
> RELEASE/ACQUIRE (to replace those Will suggested; as I think those are
> partly responsible for my confusion).

Yeah, sorry. I originally used the phrase "fully ordered" but changed it
to "full barrier", which has stronger transitivity (newly understood
definition) requirements that I didn't intend.

RELEASE -> ACQUIRE should be used for message passing between two CPUs
and not have ordering effects on other observers unless they're part of
the RELEASE -> ACQUIRE chain.

>  - RELEASE -> ACQUIRE is fully ordered (but not a full barrier) when
>    they operate on the same variable and the ACQUIRE reads from the
>    RELEASE. Notable, RELEASE/ACQUIRE are RCpc and lack transitivity.

Are we explicit about the difference between "fully ordered" and "full
barrier" somewhere else? This looks like it will confuse people.

>  - RELEASE -> ACQUIRE can be upgraded to a full barrier (including
>    transitivity) using smp_mb__release_acquire(), either before RELEASE
>    or after ACQUIRE (but consistently [*]).

Hmm, but we don't actually need this for RELEASE -> ACQUIRE, afaict. This
is just needed for UNLOCK -> LOCK, and is exactly what RCU is currently
using (for PPC only).

Stepping back a second, I believe that there are three cases:

  RELEASE X -> ACQUIRE Y (same CPU)
    * Needs a barrier on TSO architectures for full ordering

  UNLOCK X -> LOCK Y (same CPU)
    * Needs a barrier on PPC for full ordering

  RELEASE X -> ACQUIRE X (different CPUs)
  UNLOCK X -> ACQUIRE X (different CPUs)
    * Fully ordered everywhere...
    * ... but needs a barrier on PPC to become a full barrier

so maybe it makes more sense to split out the local and inter-cpu ordering
with something like:

  smp_mb__after_release_acquire()
  smp_mb__after_release_acquire_local()

then the first one directly replaces smp_mb__after_unlock_lock, and is
only defined for PPC, whereas the second one is also defined for TSO archs.

>  - RELEASE -> ACQUIRE _chains_ (on shared variables) preserve causality,
>    (because each link is fully ordered) but are not transitive.

Yup, and that's the same for UNLOCK -> LOCK, too.
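
A RELEASE -> ACQUIRE chain that preserves causality can be sketched in
userspace roughly as follows. This is only an illustration under stated
assumptions: it uses C11 atomics and pthreads as stand-ins for the
kernel's smp_store_release()/smp_load_acquire(), and the names payload,
flag_a, flag_b and run_release_acquire_chain() are made up for the
example. Three threads play the role of three CPUs; each link reads from
the previous release, so the last thread in the chain must see the
payload written by the first.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* plain data published by cpu0 */
static atomic_int flag_a;  /* link 1: cpu0 RELEASE -> cpu1 ACQUIRE */
static atomic_int flag_b;  /* link 2: cpu1 RELEASE -> cpu2 ACQUIRE */
static int observed;       /* what cpu2 read from payload */

static void *cpu0(void *arg)
{
	(void)arg;
	payload = 42;
	/* RELEASE: the payload write is ordered before this store. */
	atomic_store_explicit(&flag_a, 1, memory_order_release);
	return NULL;
}

static void *cpu1(void *arg)
{
	(void)arg;
	/* ACQUIRE: spin until we read from cpu0's RELEASE. */
	while (!atomic_load_explicit(&flag_a, memory_order_acquire))
		;
	/* RELEASE: pass the message on to the next link. */
	atomic_store_explicit(&flag_b, 1, memory_order_release);
	return NULL;
}

static void *cpu2(void *arg)
{
	(void)arg;
	/* ACQUIRE: spin until we read from cpu1's RELEASE. */
	while (!atomic_load_explicit(&flag_b, memory_order_acquire))
		;
	/* cpu2 is part of the chain, so it must see payload == 42. */
	observed = payload;
	return NULL;
}

/* Runs the three-link chain once and returns what cpu2 observed. */
int run_release_acquire_chain(void)
{
	void *(*fn[3])(void *) = { cpu0, cpu1, cpu2 };
	pthread_t t[3];
	int i;

	payload = 0;
	observed = 0;
	atomic_store(&flag_a, 0);
	atomic_store(&flag_b, 0);

	for (i = 0; i < 3; i++) {
		int rc = pthread_create(&t[i], NULL, fn[i], NULL);

		assert(rc == 0);
		(void)rc;
	}
	for (i = 0; i < 3; i++)
		pthread_join(t[i], NULL);

	return observed;
}
```

Calling run_release_acquire_chain() returns 42, because every thread
involved is a link in the chain. The sketch makes no claim about a
fourth thread that reads payload or the flags without taking part in the
chain; that is the case that would need the full barrier discussed
above.
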

> And I think that in the past few weeks we've been using transitive
> ambiguously, the definition we have in Documentation/memory-barriers.txt
> is a _strong_ transitivity, where we can make guarantees about CPUs not
> directly involved.
>
> What we have here (due to RCpc) is a weak form of transitivity, which,
> while it preserves the natural concept of causality, does not extend to
> other CPUs.
>
> So we could go around and call them 'strong' and 'weak' transitivity,
> but I suspect its easier for everyone involved if we come up with
> separate terms (less room for error if we accidentally omit the
> 'strong/weak' qualifier).

Surely the general case is message passing and so "transitivity" should
just refer to chains of RELEASE -> ACQUIRE? Then "strong transitivity"
could refer to the far more complicated (imo) case that is synonymous
with "full barrier".

Will