On Sat, Nov 23, 2013 at 1:29 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > I haven't read your email in full detail yet, but one thing I did miss > was cache-snoops. > > One of the reasons for failing transitive / multi-copy atomicity is that > CPUs might have different views of the memory state depending on from > which caches they can get snoops. My argument doesn't depend on that or care about that. My argument depends purely on: - barriers have a certain sane local meaning (because I don't think they can work without that) - a cache coherenct implies that actually changing the cacheline requires exclusive access to that cacheline at some point (where the actual time of that "at some point" is not actually all that important and you can hold on to the exclusive access for some arbitrary time, but the particular points where the exclusive state changes matters for the barrier semantics in order for barriers to work). Nothing else matters for my argument. So cache snooping details are irrelevant. Or rather, it is relevant only in the sense that the CPU's that participate in the cache coherency protocol then have to have barriers that work properly in the presence of said snooping. You can't allow snooping to "break" the barriers. Now, as I said in my follow-up, I think one "explanation" might be that "everyting happens at the same time" approach, and while that actually may work for "lwsync" and the sequence in question, I'm not convinced that kind of lock really is a proper lock. Because if you accept the "everything happens at once" model to explain why the unlock+lock sequence doesn't act as a memory barrier, than I actually think that you can build up an argument where multiple concurrent spinlock'ed accesses (ie you make one of 'X'/'Y' be the queue entry for the *next* MCS lock waiter trying to acquire it) can get insane results that aren't consistent with actual exclusion. Because if you accept that "a unlock and getting that lock on another CPU can happen at the same time" argument (so that you can have that circular chain of mutual dependencies), then you can extend the chain further, since all the lock/unlock operations in question apparently have zero latency and can thus see previous values for the same reason CPU2 can see the original zero value of "X". So I think anything that allows that A <= B <= C <= D <=E <= F <= A situation is not necessarily a valid locking model (because you could basically add a "lwsync + ACCESS_ONCE(lock-chain)=0" to the work CPU1 does, and since we've already established that there are zero latencies in all this, CPU2 could have gotten *that* lock that we now released "at the same time" as reading X as zero, even though CPU0 set X to one in its "critical region". So locks don't just imply "you can't let anything out of the critical region". They also imply exclusion, and that "everything happens all at once" model would seem to literally allow *EVERYTHING* to happen at once, breaking the *exclusion* requirement. But I dunno. My gut feel is that the "everything happens at once" explanation is not actually then a valid model for locking, which in turn would mean that using "lwsync" in both unlock and lock is not sufficient. Stated another way, let's say that you have multiple CPU's doing this: lock if (x == 0) x = 1; unlock then we had *better* have only one of the CPU's actually set "x=1". Otherwise the lock isn't a lock. Agreed? Paul argued that "lwsync" is valid in both lock and unlock becuse it doesn't allow anything to leak out of the locked region, but I'm arguing that if it means that we allow that "everything happens at once" model, then *every* CPU can do that "if (x == 0) x = 1" logic all at the same time, *all* of them decide to set x to 1, and *none* of them "leak" their accesses outside their locked region (they'd all set 'x' to 1 at the same time), but the end result is still wrong. So locking is not just "accesses inside of the lock cannot leak outside the lock". It also implies "accesses inside of it have to be _ordered_ wrt the lock", and that in turn disallows the "A <= .. <= F <= A" model. One of the "<=" has to be a "<" for the lock to be a lock, methinks. But hey, maybe somebody can point to where I screwed up. I just do not think "snooping" is it. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>