On Thu, May 19, 2016 at 10:52:19AM +0100, David Howells wrote:
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> 
> > Does this generate 'sane' code for LL/SC archs? That is, a single LL/SC
> > loop and not a loop around an LL/SC cmpxchg.
> 
> Depends on your definition of 'sane'.  The code will work - but it's not
> necessarily the most optimal.  gcc currently keeps the __atomic_load_n() and
> the fudging in the middle separate from the __atomic_compare_exchange_n().
> 
> So on aarch64:
> 
> 	static __always_inline int __atomic_add_unless(atomic_t *v,
> 						       int addend, int unless)
> 	{
> 		int cur = __atomic_load_n(&v->counter, __ATOMIC_RELAXED);
> 		int new;
> 
> 		do {
> 			if (__builtin_expect(cur == unless, 0))
> 				break;
> 			new = cur + addend;
> 		} while (!__atomic_compare_exchange_n(&v->counter,
> 						      &cur, new,
> 						      false,
> 						      __ATOMIC_SEQ_CST,
> 						      __ATOMIC_RELAXED));
> 		return cur;
> 	}
> 
> 	int test_atomic_add_unless(atomic_t *counter)
> 	{
> 		return __atomic_add_unless(counter, 0x56, 0x23);
> 	}

[...]

> I think the code it generates should look something like:
> 
> 	test_atomic_add_unless:
> 	.L7:
> 		ldaxr	w1, [x0]	# __atomic_load_n()
> 		cmp	w1, 35		# } if (cur == unless)
> 		beq	.L4		# }	break
> 		add	w2, w1, 86	# new = cur + addend
> 		stlxr	w4, w2, [x0]
> 		cbnz	w4, .L7
> 	.L4:
> 		mov	w0, w1
> 		ret
> 
> but that requires the compiler to split up the LDAXR and STLXR instructions
> and render arbitrary code between.  I suspect that might be quite a stretch.

... it's also weaker than the requirements of the kernel memory model.
See 8e86f0b409a4 ("arm64: atomics: fix use of acquire + release for
full barrier semantics") for the gory details.

Will
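
For context, the convention that commit describes is a relaxed LDXR paired
with STLXR, followed by DMB ISH on the successful path to restore full
barrier semantics. A minimal hand-rolled sketch of a single LL/SC loop for
add-unless in that style might look roughly like the code below; the
function name, operand names and the bare early-exit path are illustrative
assumptions, not the kernel's actual implementation.

	/*
	 * Illustrative sketch only (not kernel code): a single LL/SC loop
	 * for add-unless following the barrier placement described in
	 * 8e86f0b409a4 - relaxed LDXR, STLXR for the store, then DMB ISH
	 * once the store-exclusive has succeeded.
	 */
	static inline int sketch_add_unless(int *counter, int addend, int unless)
	{
		int cur, new;
		unsigned int tmp;

		asm volatile(
		"1:	ldxr	%w[cur], %[ctr]\n"	/* load-exclusive, relaxed */
		"	cmp	%w[cur], %w[unless]\n"
		"	b.eq	2f\n"			/* hit 'unless': no update */
		"	add	%w[new], %w[cur], %w[addend]\n"
		"	stlxr	%w[tmp], %w[new], %[ctr]\n"
		"	cbnz	%w[tmp], 1b\n"		/* lost the exclusive: retry */
		"	dmb	ish\n"			/* upgrade to full barrier */
		"2:"
		: [cur] "=&r" (cur), [new] "=&r" (new),
		  [tmp] "=&r" (tmp), [ctr] "+Q" (*counter)
		: [addend] "r" (addend), [unless] "r" (unless)
		: "cc", "memory");

		return cur;
	}

Whether the failed-comparison path should issue a barrier (or a CLREX) is a
policy question the sketch deliberately glosses over; the point is only that
the compare and add sit between the exclusive pair, which is exactly the
shape the compiler is being asked to produce from the C11 intrinsics.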