From: Alexander Sverdlin <alexander.sverdlin@xxxxxxxxx>

The switch to qspinlock brought a massive regression in spinlocks on
Octeon. Even after applying this series (and a patch in the
ARCH-independent code [1]), a tight contended spinlock loop (6 cores,
1 thread per core) is still 50% slower than with the previous
ticket-based implementation.

This series implements some optimizations and has been tested on a
6-core Octeon machine.

[1] Link: https://lkml.org/lkml/2021/1/27/1137

Alexander Sverdlin (6):
  MIPS: Octeon: Implement __smp_store_release()
  MIPS: Implement atomic_cmpxchg_relaxed()
  MIPS: Octeon: qspinlock: Flush write buffer
  MIPS: Octeon: qspinlock: Exclude mmiowb()
  MIPS: Provide {atomic_}xchg_relaxed()
  MIPS: cmpxchg: Use cmpxchg_local() for {cmp_}xchg_small()

 arch/mips/include/asm/atomic.h   | 5 +++++
 arch/mips/include/asm/barrier.h  | 9 +++++++++
 arch/mips/include/asm/cmpxchg.h  | 6 ++++++
 arch/mips/include/asm/spinlock.h | 5 +++++
 arch/mips/kernel/cmpxchg.c       | 4 ++--
 5 files changed, 27 insertions(+), 2 deletions(-)

-- 
2.10.2