On Fri, Nov 22, 2013 at 07:51:07PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 22, 2013 at 10:26:32AM -0800, Paul E. McKenney wrote:
> > The real source of my cognitive pain is that here we have a sequence of
> > code that has neither atomic instructions nor memory-barrier instructions,
> > but it looks like it still manages to act as a full memory barrier.
> > Still not quite sure I should trust it...
>
> Yes, this is something that puzzles me too.
>
> That said, the two rules that:
>
>   1) stores aren't re-ordered against other stores
>   2) reads aren't re-ordered against other reads
>
> Do make that:
>
>   STORE x
>   LOAD x
>
> form a fence that neither stores nor loads can pass through from
> either side; note however that they themselves rely on the data
> dependency to not reorder against themselves.
>
> If you put them the other way around:
>
>   LOAD x
>   STORE y
>
> we seem to get a stronger variant because stores are not re-ordered
> against older reads.
>
> There is however the exception clause for rule 1) above, which includes
> clflush, non-temporal stores and string ops; the actual mfence
> instruction doesn't seem to have this exception and would thus be
> slightly stronger still.
>
> Still a confusing situation all round.

I think this means x86 needs help too.

Consider:

x = y = 0

  w[x] = 1  |  w[y] = 1
  mfence    |  mfence
  r[y] = 0  |  r[x] = 0

This is generally an impossible case, right? (If we observe y=0, then
w[y]=1 has not yet happened, so the other CPU's later read must see
x=1, and vice versa.)

Now replace one of the mfences with smp_store_release(l1);
smp_load_acquire(l2); such that we have a RELEASE+ACQUIRE pair that
_should_ form a full barrier:

  w[x] = 1   |  w[y] = 1
  w[l1] = 1  |  mfence
  r[l2] = 0  |  r[x] = 0
  r[y] = 0   |

At which point we can observe the impossible, because as per the rule:

  'reads may be reordered with older writes to different locations'

Our r[y] can slip before the w[x]=1.

Thus x86's smp_store_release() would need to be:

+#define smp_store_release(p, v)					\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)

Or:

	(void)xchg((p), (v));

Idem for s390 and sparc I suppose.

The only reason your example worked is because the unlock and lock were
for the same lock.

This of course leaves us without joy for circular buffers, which can do
without this LOCK'ed op and without sync on PPC. Now I'm not at all sure
we've got enough of those to justify primitives just for them.
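The litmus tests above can be checked mechanically. What follows is only a
rough sketch, not anything from the kernel tree: it assumes a simplified
x86-TSO-style model in which each CPU has a FIFO store buffer, a load is
satisfied from the CPU's own buffer if possible and from memory otherwise,
and a fence (mfence / smp_mb()) cannot execute until the buffer is empty. It
also assumes that a plain x86 smp_store_release()/smp_load_acquire() is just
an ordinary store/load. The clflush, non-temporal store and string-op
exceptions are ignored. All names here (search(), struct litmus, ST/LD/MB,
and so on) are made up for illustration.

```c
#include <stdbool.h>

enum op_kind { OP_WRITE, OP_READ, OP_FENCE };
enum loc { X, Y, L1, L2, NLOC };        /* shared locations, all initially 0 */
#define NCPU   2
#define MAXOPS 5

struct op { enum op_kind kind; enum loc loc; };

struct litmus {
	struct op prog[NCPU][MAXOPS];
	int len[NCPU];                  /* number of ops per CPU */
	int check[NCPU];                /* index of the read we want to see 0 */
};

struct cpu {
	int pc;                         /* index of the next op to execute */
	enum loc buf[MAXOPS];           /* FIFO store buffer; every store writes 1 */
	int head, tail;
	int seen[MAXOPS];               /* value returned by the read at op i */
};

struct state { int mem[NLOC]; struct cpu cpu[NCPU]; };

/* Depth-first search over all interleavings of the model. Returns true if
 * some execution ends with both checked reads having returned 0. */
static bool search(const struct litmus *t, struct state s)
{
	bool finished = true;

	for (int c = 0; c < NCPU; c++) {
		const struct cpu *p = &s.cpu[c];

		/* Step kind 1: the oldest buffered store reaches memory. */
		if (p->head < p->tail) {
			struct state n = s;
			enum loc l = n.cpu[c].buf[n.cpu[c].head];
			n.cpu[c].head++;
			n.mem[l] = 1;
			if (search(t, n))
				return true;
		}

		if (p->pc == t->len[c])
			continue;
		finished = false;

		/* Step kind 2: the CPU executes its next op. */
		const struct op *o = &t->prog[c][p->pc];
		struct state n = s;
		struct cpu *q = &n.cpu[c];

		if (o->kind == OP_FENCE && q->head < q->tail)
			continue;               /* fence waits for an empty buffer */
		if (o->kind == OP_WRITE)
			q->buf[q->tail++] = o->loc; /* stores enter the buffer first */
		if (o->kind == OP_READ) {
			int v = n.mem[o->loc];
			for (int i = q->head; i < q->tail; i++)
				if (q->buf[i] == o->loc)
					v = 1;  /* forwarded from own buffer */
			q->seen[q->pc] = v;
		}
		q->pc++;
		if (search(t, n))
			return true;
	}

	if (!finished)
		return false;
	return s.cpu[0].seen[t->check[0]] == 0 &&
	       s.cpu[1].seen[t->check[1]] == 0;
}

static bool forbidden_outcome_reachable(const struct litmus *t)
{
	struct state init = { { 0 } };
	return search(t, init);
}

#define ST(l) { OP_WRITE, l }
#define LD(l) { OP_READ,  l }
#define MB    { OP_FENCE, X }           /* location is unused for fences */

/* w[x]; mfence; r[y]  |  w[y]; mfence; r[x] */
static const struct litmus both_mfence = {
	.prog  = { { ST(X), MB, LD(Y) }, { ST(Y), MB, LD(X) } },
	.len   = { 3, 3 }, .check = { 2, 2 },
};
/* w[x]; w[l1]; r[l2]; r[y]  |  w[y]; mfence; r[x] */
static const struct litmus plain_release_acquire = {
	.prog  = { { ST(X), ST(L1), LD(L2), LD(Y) }, { ST(Y), MB, LD(X) } },
	.len   = { 4, 3 }, .check = { 3, 2 },
};
/* Same, with smp_mb() placed before the releasing store. */
static const struct litmus release_with_smp_mb = {
	.prog  = { { ST(X), MB, ST(L1), LD(L2), LD(Y) }, { ST(Y), MB, LD(X) } },
	.len   = { 5, 3 }, .check = { 4, 2 },
};
```

Calling forbidden_outcome_reachable() on each of the three tests shows
whether the r[y] = 0, r[x] = 0 outcome exists under this model. It says
nothing about real hardware beyond what the model's assumptions capture.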