On Thu, Oct 08, 2015 at 02:44:39PM -0700, Paul E. McKenney wrote:
> On Thu, Oct 08, 2015 at 01:16:38PM +0200, Peter Zijlstra wrote:
> > On Thu, Oct 08, 2015 at 02:50:36PM +1100, Michael Ellerman wrote:
> > > On Wed, 2015-10-07 at 08:25 -0700, Paul E. McKenney wrote:
> >
> > > > Currently, we do need smp_mb__after_unlock_lock() to be after the
> > > > acquisition on PPC -- putting it between the unlock and the lock
> > > > of course doesn't cut it for the cross-thread unlock/lock case.
> >
> > This ^, that makes me think I don't understand
> > smp_mb__after_unlock_lock.
> >
> > How is:
> >
> > 	UNLOCK x
> > 	smp_mb__after_unlock_lock()
> > 	LOCK y
> >
> > a problem? That's still a full barrier.
>
> The problem is that I need smp_mb__after_unlock_lock() to give me
> transitivity even if the UNLOCK happened on one CPU and the LOCK
> on another.  For that to work, the smp_mb__after_unlock_lock() needs
> to be either immediately after the acquire (the current choice) or
> immediately before the release (which would also work from a purely
> technical viewpoint, but I much prefer the current choice).
>
> Or am I missing your point?

So lots of little confusions added up to complete fail :-{

Mostly I think it was that UNLOCK x + LOCK x are fully ordered (where I
forgot: but not against uninvolved CPUs) and RELEASE/ACQUIRE are
transitive (where I forgot: RELEASE/ACQUIRE _chains_ are transitive, but
again not against uninvolved CPUs).

Which leads me to think I would like to suggest alternative rules for
RELEASE/ACQUIRE (to replace those Will suggested; as I think those are
partly responsible for my confusion).

 - RELEASE -> ACQUIRE is fully ordered (but not a full barrier) when
   they operate on the same variable and the ACQUIRE reads from the
   RELEASE. Notably, RELEASE/ACQUIRE are RCpc and lack transitivity.

 - RELEASE -> ACQUIRE can be upgraded to a full barrier (including
   transitivity) using smp_mb__release_acquire(), either before RELEASE
   or after ACQUIRE (but consistently [*]).

 - RELEASE -> ACQUIRE _chains_ (on shared variables) preserve causality
   (because each link is fully ordered), but are not transitive.

And I think that in the past few weeks we've been using transitive
ambiguously; the definition we have in Documentation/memory-barriers.txt
is a _strong_ transitivity, where we can make guarantees about CPUs not
directly involved.

What we have here (due to RCpc) is a weak form of transitivity, which,
while it preserves the natural concept of causality, does not extend to
other CPUs.

So we could go around and call them 'strong' and 'weak' transitivity,
but I suspect it's easier for everyone involved if we come up with
separate terms (less room for error if we accidentally omit the
'strong/weak' qualifier).


[*] Do we want to take that choice away and go for:
smp_mb__after_release_acquire() ?
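
As a rough illustration of the chain rule above (a RELEASE -> ACQUIRE
chain on shared variables preserves causality), here is a userspace
sketch using C11 atomics and pthreads rather than the kernel primitives.
Everything in it is an assumption made for illustration: payload, x, y,
cpu0()..cpu2() and run_chain() are hypothetical names, and
mb_after_acquire() is just a seq_cst fence marking where the proposed
smp_mb__release_acquire() would sit if placed after the ACQUIRE. The C11
memory model is not the kernel's, so this only shows the in-chain
guarantee; it says nothing about CPUs outside the chain.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical names; C11 atomics stand in for the kernel primitives. */
static int payload;		/* plain data published along the chain */
static atomic_int x, y;		/* shared variables linking the chain */

/* Stand-in marking where the proposed barrier would go, after ACQUIRE. */
static inline void mb_after_acquire(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

static void *cpu0(void *arg)
{
	(void)arg;
	payload = 42;
	atomic_store_explicit(&x, 1, memory_order_release);	/* RELEASE x */
	return NULL;
}

static void *cpu1(void *arg)
{
	(void)arg;
	/* ACQUIRE x: spin until it reads from cpu0's RELEASE */
	while (!atomic_load_explicit(&x, memory_order_acquire))
		;
	mb_after_acquire();
	atomic_store_explicit(&y, 1, memory_order_release);	/* RELEASE y */
	return NULL;
}

static void *cpu2(void *arg)
{
	int *seen = arg;

	/* ACQUIRE y: spin until it reads from cpu1's RELEASE */
	while (!atomic_load_explicit(&y, memory_order_acquire))
		;
	mb_after_acquire();
	*seen = payload;	/* ordered after cpu0's store via the chain */
	return NULL;
}

/* Runs the three-link chain once and returns what cpu2 observed. */
int run_chain(void)
{
	pthread_t t[3];
	int seen = 0;
	int ret;

	payload = 0;
	atomic_store(&x, 0);
	atomic_store(&y, 0);

	ret = pthread_create(&t[0], NULL, cpu0, NULL);
	assert(ret == 0);
	ret = pthread_create(&t[1], NULL, cpu1, NULL);
	assert(ret == 0);
	ret = pthread_create(&t[2], NULL, cpu2, &seen);
	assert(ret == 0);

	for (int i = 0; i < 3; i++)
		pthread_join(t[i], NULL);

	return seen;
}
```

Calling run_chain() from a small main() (built with something like
cc -std=c11 -pthread) should return the value cpu0 stored, because each
link reads from the previous RELEASE. In the C11 model the fences are
not needed for that result; they are only there to show the "after
ACQUIRE" placement discussed above.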