On Mon, Mar 20, 2017 at 07:51:01AM -0700, Eric Dumazet wrote:
> PowerPC has no efficient atomic_inc() and this definitely shows on
> network intensive workloads involving concurrent cores/threads.

Correct, PPC LL/SC are dreadfully expensive.

> atomic_cmpxchg() on PowerPC is horribly more expensive because of the
> added two SYNC instructions.

Note that refcount_t uses atomic_cmpxchg_release() and
atomic_cmpxchg_relaxed() which avoid most of the painful barriers.
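
To make the ordering point concrete, here is a rough userspace sketch in
C11 atomics, not the kernel code. The names sketch_refcount,
sketch_inc_not_zero() and sketch_dec_and_test() are made up for
illustration, and the saturate-at-UINT_MAX behaviour is an assumption
about how such a counter might guard against overflow. The increment
only needs a relaxed cmpxchg, and the decrement only needs release
ordering, so neither path asks for the full barriers that a fully
ordered cmpxchg implies.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a refcount_t-like counter. */
struct sketch_refcount {
	atomic_uint refs;
};

/* Take a reference unless the count is already 0. Relaxed ordering only. */
static bool sketch_inc_not_zero(struct sketch_refcount *r)
{
	unsigned int old = atomic_load_explicit(&r->refs, memory_order_relaxed);

	do {
		if (old == 0)
			return false;	/* object is already on its way out */
		if (old == UINT_MAX)
			return true;	/* saturated (assumed policy): stay pinned */
		/* On failure, 'old' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak_explicit(&r->refs, &old, old + 1,
							memory_order_relaxed,
							memory_order_relaxed));
	return true;
}

/* Drop a reference with release ordering; true if it was the last one. */
static bool sketch_dec_and_test(struct sketch_refcount *r)
{
	unsigned int old = atomic_load_explicit(&r->refs, memory_order_relaxed);

	do {
		if (old == UINT_MAX)
			return false;	/* saturated (assumed policy): never freed */
		if (old == 0)
			return false;	/* underflow: refuse to go negative */
	} while (!atomic_compare_exchange_weak_explicit(&r->refs, &old, old - 1,
							memory_order_release,
							memory_order_relaxed));
	return old == 1;	/* 'old' is the value before our decrement */
}
```

Both loops are still LL/SC retry loops on PPC, so this does nothing for
the base cost of the reservation; it only illustrates dropping the
surrounding heavyweight barriers.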