[Grr, resending as text/plain; I have no idea what inspired Thunderbird to send this as multipart/mixed with HTML.]

On 4/22/2016 5:04 AM, Peter Zijlstra wrote:
> Implement FETCH-OP atomic primitives, these are very similar to the
> existing OP-RETURN primitives we already have, except they return the
> value of the atomic variable _before_ modification.
>
> This is especially useful for irreversible operations -- such as
> bitops (because it becomes impossible to reconstruct the state prior
> to modification).
>
> XXX please look at the tilegx (CONFIG_64BIT) atomics, I think we get
> the barriers wrong (at the very least they're inconsistent).
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> ---
>  arch/tile/include/asm/atomic.h    |    4 +
>  arch/tile/include/asm/atomic_32.h |   60 +++++++++++++------
>  arch/tile/include/asm/atomic_64.h |  117 +++++++++++++++++++++++++-------------
>  arch/tile/include/asm/bitops_32.h |   18 ++---
>  arch/tile/lib/atomic_32.c         |   42 ++++++-------
>  arch/tile/lib/atomic_asm_32.S     |   14 ++--
>  6 files changed, 161 insertions(+), 94 deletions(-)
>
> [...]
>
>  static inline int atomic_add_return(int i, atomic_t *v)
>  {
>  	int val;
>  	smp_mb();  /* barrier for proper semantics */
>  	val = __insn_fetchadd4((void *)&v->counter, i) + i;
>  	barrier();  /* the "+ i" above will wait on memory */
> +	/* XXX smp_mb() instead, as per cmpxchg() ? */
>  	return val;
>  }

The existing code is subtle but I'm pretty sure it's not a bug.

The tilegx architecture will take the "+ i" and generate an add instruction. The compiler barrier will make sure the add instruction happens before anything else that could touch memory, and the microarchitecture will make sure that the result of the atomic fetchadd has been returned to the core before any further instructions are issued. (The memory architecture is lazy, but when you feed a load through an arithmetic operation, we block issuing any further instructions until the add's operands are available.)

This would not be an adequate memory barrier in general, since other loads or stores might still be in flight, even if the "val" operand had made it from memory to the core at this point. However, we have issued no other stores or loads since the previous memory barrier, so we know that there can be no other loads or stores in flight, and thus the compiler barrier plus arithmetic op is equivalent to a memory barrier here.

In hindsight, perhaps a more substantial comment would have been helpful here. Unless you see something missing in my analysis, I'll plan to go ahead and add a suitable comment here :-)

Otherwise, though just based on code inspection so far:

Acked-by: Chris Metcalf <cmetcalf@xxxxxxxxxxxx> [for tile]

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com
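
For readers without tilegx hardware, the distinction between the OP-RETURN and FETCH-OP forms, and the placement of the barriers discussed above, can be approximated in portable C11. This is only a sketch under stated assumptions: `sketch_atomic_t`, `sketch_atomic_add_return()` and `sketch_atomic_fetch_or()` are hypothetical names, not kernel or tile APIs, and `atomic_thread_fence()` stands in for smp_mb(). Portable C cannot express the tilegx-specific argument that the dependent "+ i" plus barrier() is equivalent to a memory barrier, so the sketch conservatively uses a full fence after the operation.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's atomic_t; not the tile definition. */
typedef struct {
	atomic_int counter;
} sketch_atomic_t;

/*
 * OP-RETURN form: returns the value after the addition.
 * Shape mirrors the quoted function: fence, fetch-add, "+ i", fence.
 */
static inline int sketch_atomic_add_return(int i, sketch_atomic_t *v)
{
	int val;

	/* Plays the role of the leading smp_mb(). */
	atomic_thread_fence(memory_order_seq_cst);
	val = atomic_fetch_add_explicit(&v->counter, i, memory_order_relaxed) + i;
	/*
	 * tilegx relies on barrier() plus the dependent add at this point;
	 * that is a hardware property, so this sketch uses a full fence.
	 */
	atomic_thread_fence(memory_order_seq_cst);
	return val;
}

/*
 * FETCH-OP form: returns the value before modification. For an OR the
 * old value cannot be recomputed from the new one, hence the primitive.
 */
static inline int sketch_atomic_fetch_or(int mask, sketch_atomic_t *v)
{
	int old;

	atomic_thread_fence(memory_order_seq_cst);
	old = atomic_fetch_or_explicit(&v->counter, mask, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return old;
}
```

With the counter initialized to 5, sketch_atomic_add_return(3, &v) yields 8, and a following sketch_atomic_fetch_or(0x3, &v) yields the old value 8 while leaving 11 in the counter.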