On Mon, Oct 19, 2015 at 09:17:18AM +0800, Boqun Feng wrote:
> On Fri, Oct 09, 2015 at 10:40:39AM +0100, Will Deacon wrote:
> > On Fri, Oct 09, 2015 at 10:31:38AM +0200, Peter Zijlstra wrote:
> [snip]
> > >
> > > So lots of little confusions added up to complete fail :-{
> > >
> > > Mostly I think it was the UNLOCK x + LOCK x are fully ordered (where I
> > > forgot: but not against uninvolved CPUs) and RELEASE/ACQUIRE are
> > > transitive (where I forgot: RELEASE/ACQUIRE _chains_ are transitive, but
> > > again not against uninvolved CPUs).
> > >
> > > Which leads me to think I would like to suggest alternative rules for
> > > RELEASE/ACQUIRE (to replace those Will suggested; as I think those are
> > > partly responsible for my confusion).
> >
> > Yeah, sorry. I originally used the phrase "fully ordered" but changed it
> > to "full barrier", which has stronger transitivity (newly understood
> > definition) requirements that I didn't intend.
> >
> > RELEASE -> ACQUIRE should be used for message passing between two CPUs
> > and not have ordering effects on other observers unless they're part of
> > the RELEASE -> ACQUIRE chain.
> >
> > >  - RELEASE -> ACQUIRE is fully ordered (but not a full barrier) when
> > >    they operate on the same variable and the ACQUIRE reads from the
> > >    RELEASE. Notable, RELEASE/ACQUIRE are RCpc and lack transitivity.
> >
> > Are we explicit about the difference between "fully ordered" and "full
> > barrier" somewhere else, because this looks like it will confuse people.
> >
>
> This is confusing me right now. ;-)
>
> Let's use a simple example for only one primitive, as I understand it,
> if we say a primitive A is "fully ordered", we actually mean:
>
> 1.  The memory operations preceding(in program order) A can't be
>     reordered after the memory operations following(in PO) A.
>
> and
>
> 2.  The memory operation(s) in A can't be reordered before the
>     memory operations preceding(in PO) A and after the memory
>     operations following(in PO) A.
>
> If we say A is a "full barrier", we actually mean:
>
> 1.  The memory operations preceding(in program order) A can't be
>     reordered after the memory operations following(in PO) A.
>
> and
>
> 2.  The memory ordering guarantee in #1 is visible globally.
>
> Is that correct? Or "full barrier" is more strong than I understand,
> i.e. there is a third property of "full barrier":
>
> 3.  The memory operation(s) in A can't be reordered before the
>     memory operations preceding(in PO) A and after the memory
>     operations following(in PO) A.
>
> IOW, is "full barrier" a more strong version of "fully ordered" or not?

There is also the question of whether the barrier forces ordering
of unrelated stores, everything initially zero and all accesses
READ_ONCE() or WRITE_ONCE():

    P0          P1          P2                  P3
    X = 1;      Y = 1;      r1 = X;             r3 = Y;
                            some_barrier();     some_barrier();
                            r2 = Y;             r4 = X;

P2's and P3's ordering could be globally visible without requiring
P0's and P1's independent stores to be ordered, for example, if you
used smp_rmb() for some_barrier().  In contrast, if we used smp_mb()
for some_barrier(), everyone would agree on the order of P0's and
P1's stores.

There are actually a fair number of different combinations of
aspects of memory ordering.  We will need to choose wisely.  ;-)

My hope is that the store-ordering gets folded into the globally
visible transitive level.  Especially given that I have not (yet)
seen any algorithms used in production that relied on the ordering of
independent stores.

                                                        Thanx, Paul

> Regards,
> Boqun
>
> > >  - RELEASE -> ACQUIRE can be upgraded to a full barrier (including
> > >    transitivity) using smp_mb__release_acquire(), either before RELEASE
> > >    or after ACQUIRE (but consistently [*]).
> >
> > Hmm, but we don't actually need this for RELEASE -> ACQUIRE, afaict. This
> > is just needed for UNLOCK -> LOCK, and is exactly what RCU is currently
> > using (for PPC only).
> >
> > Stepping back a second, I believe that there are three cases:
> >
> > RELEASE X -> ACQUIRE Y (same CPU)
> >   * Needs a barrier on TSO architectures for full ordering
> >
> > UNLOCK X -> LOCK Y (same CPU)
> >   * Needs a barrier on PPC for full ordering
> >
> > RELEASE X -> ACQUIRE X (different CPUs)
> > UNLOCK X -> ACQUIRE X (different CPUs)
> >   * Fully ordered everywhere...
> >   * ... but needs a barrier on PPC to become a full barrier
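
For anyone who wants to poke at the four-CPU example in the reply above,
here is a rough userspace sketch, not kernel code.  It assumes the
strongest case only: a single shared memory, so every CPU sees the two
stores in the same order and P2's and P3's loads stay in program order
(roughly the smp_mb() case).  It enumerates every interleaving and counts
how often P2 and P3 disagree on the store order, that is, r1 == 1 &&
r2 == 0 && r3 == 1 && r4 == 0.  The names (struct state, step, explore,
iriw_count) are made up for illustration, and the model says nothing
about smp_rmb() or about hardware that lets stores reach different CPUs
in different orders.

```c
/*
 * Hypothetical sketch: exhaustive interleaving search for the example
 * with two independent writers (P0, P1) and two readers (P2, P3).
 * Assumption: one shared memory and program-ordered loads, which is a
 * stand-in for the smp_mb() case, not a model of real hardware.
 */
enum { NTHREADS = 4 };

/* Number of memory operations each thread performs. */
static const int nops[NTHREADS] = { 1, 1, 2, 2 };

struct state {
	int x, y;           /* shared variables, initially zero */
	int r1, r2, r3, r4; /* values observed by P2 and P3 */
	int pc[NTHREADS];   /* next operation index per thread */
};

/* Execute the next operation of thread t against the shared memory. */
static void step(struct state *s, int t)
{
	int i = s->pc[t]++;

	switch (t) {
	case 0: s->x = 1; break;                        /* P0: X = 1 */
	case 1: s->y = 1; break;                        /* P1: Y = 1 */
	case 2: if (i == 0) s->r1 = s->x; else s->r2 = s->y; break;
	case 3: if (i == 0) s->r3 = s->y; else s->r4 = s->x; break;
	}
}

/* Depth-first walk over every interleaving that respects program order. */
static void explore(struct state s, long *total, long *forbidden)
{
	int done = 1;

	for (int t = 0; t < NTHREADS; t++) {
		if (s.pc[t] < nops[t]) {
			struct state next = s;

			done = 0;
			step(&next, t);
			explore(next, total, forbidden);
		}
	}
	if (done) {
		(*total)++;
		/* P2 saw X before Y, P3 saw Y before X: they disagree. */
		if (s.r1 == 1 && s.r2 == 0 && s.r3 == 1 && s.r4 == 0)
			(*forbidden)++;
	}
}

/* Returns the number of disagreeing outcomes; *total gets the run count. */
long iriw_count(long *total)
{
	struct state init = { 0 };
	long forbidden = 0;

	*total = 0;
	explore(init, total, &forbidden);
	return forbidden;
}
```

Calling iriw_count(&total) walks 180 interleavings and finds none where
the readers disagree, which matches the "everyone would agree on the
order" statement for this model.  Seeing the disagreeing outcome would
need a weaker model, for example per-CPU views of memory, which this
sketch deliberately leaves out.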