On Fri, Oct 09, 2015 at 10:40:39AM +0100, Will Deacon wrote:
> On Fri, Oct 09, 2015 at 10:31:38AM +0200, Peter Zijlstra wrote:
[snip]
> >
> > So lots of little confusions added up to complete fail :-{
> >
> > Mostly I think it was the UNLOCK x + LOCK x are fully ordered (where I
> > forgot: but not against uninvolved CPUs) and RELEASE/ACQUIRE are
> > transitive (where I forgot: RELEASE/ACQUIRE _chains_ are transitive, but
> > again not against uninvolved CPUs).
> >
> > Which leads me to think I would like to suggest alternative rules for
> > RELEASE/ACQUIRE (to replace those Will suggested; as I think those are
> > partly responsible for my confusion).
>
> Yeah, sorry. I originally used the phrase "fully ordered" but changed it
> to "full barrier", which has stronger transitivity (newly understood
> definition) requirements that I didn't intend.
>
> RELEASE -> ACQUIRE should be used for message passing between two CPUs
> and not have ordering effects on other observers unless they're part of
> the RELEASE -> ACQUIRE chain.
>
> >  - RELEASE -> ACQUIRE is fully ordered (but not a full barrier) when
> >    they operate on the same variable and the ACQUIRE reads from the
> >    RELEASE. Notable, RELEASE/ACQUIRE are RCpc and lack transitivity.
>
> Are we explicit about the difference between "fully ordered" and "full
> barrier" somewhere else, because this looks like it will confuse people.
>

This is confusing me right now. ;-)

Let's use a simple example with only one primitive. As I understand it,
if we say a primitive A is "fully ordered", we actually mean:

1.	The memory operations preceding (in program order) A can't be
	reordered after the memory operations following (in PO) A.

and

2.	The memory operation(s) in A can't be reordered before the
	memory operations preceding (in PO) A or after the memory
	operations following (in PO) A.

If we say A is a "full barrier", we actually mean:

1.	The memory operations preceding (in program order) A can't be
	reordered after the memory operations following (in PO) A.

and

2.	The memory ordering guarantee in #1 is visible globally.

Is that correct? Or is "full barrier" stronger than I understand, i.e.
is there a third property of "full barrier":

3.	The memory operation(s) in A can't be reordered before the
	memory operations preceding (in PO) A or after the memory
	operations following (in PO) A.

IOW, is "full barrier" a stronger version of "fully ordered" or not?

Regards,
Boqun

> >  - RELEASE -> ACQUIRE can be upgraded to a full barrier (including
> >    transitivity) using smp_mb__release_acquire(), either before RELEASE
> >    or after ACQUIRE (but consistently [*]).
>
> Hmm, but we don't actually need this for RELEASE -> ACQUIRE, afaict. This
> is just needed for UNLOCK -> LOCK, and is exactly what RCU is currently
> using (for PPC only).
>
> Stepping back a second, I believe that there are three cases:
>
>
>  RELEASE X -> ACQUIRE Y (same CPU)
>    * Needs a barrier on TSO architectures for full ordering
>
>  UNLOCK X -> LOCK Y (same CPU)
>    * Needs a barrier on PPC for full ordering
>
>  RELEASE X -> ACQUIRE X (different CPUs)
>  UNLOCK X -> ACQUIRE X (different CPUs)
>    * Fully ordered everywhere...
>    * ... but needs a barrier on PPC to become a full barrier
>
>
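
For reference, the two-CPU message-passing case that Will describes
(RELEASE -> ACQUIRE on the same variable, with the ACQUIRE reading from
the RELEASE) can be sketched in userspace. This is only an illustration
under assumptions: it uses C11 atomics and pthreads as a stand-in for the
kernel's smp_store_release()/smp_load_acquire(), and the names payload,
flag and run_message_passing() are made up for the example. It shows only
the pairwise guarantee between the two threads in the chain and says
nothing about what a third, uninvolved observer may see.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
static int payload;      /* plain data, published through 'flag' */
static atomic_int flag;  /* 0 = not yet published, 1 = published */

static void *producer(void *arg)
{
	(void)arg;
	payload = 42;
	/* RELEASE: the store to payload cannot be reordered after this store. */
	atomic_store_explicit(&flag, 1, memory_order_release);
	return NULL;
}

static void *consumer(void *arg)
{
	int *seen = arg;

	/* ACQUIRE: spin until this load reads from the RELEASE store above. */
	while (!atomic_load_explicit(&flag, memory_order_acquire))
		;
	/* Ordered after the ACQUIRE, so the producer's payload store is visible. */
	*seen = payload;
	return NULL;
}

/* Runs one producer/consumer pair and returns the value the consumer saw. */
int run_message_passing(void)
{
	pthread_t p, c;
	int seen = -1;
	int rc;

	payload = 0;
	atomic_store(&flag, 0);

	rc = pthread_create(&c, NULL, consumer, &seen);
	assert(rc == 0);
	rc = pthread_create(&p, NULL, producer, NULL);
	assert(rc == 0);
	(void)rc;

	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return seen;
}
```

Calling run_message_passing() from a main() built with -pthread returns
42: once the consumer's ACQUIRE load has observed flag == 1, the earlier
store to payload is guaranteed to be visible to it.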