[PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()

Alexander A Sverdlin <alexander.sverdlin@xxxxxxxxx> · Wed, 27 Jan 2021 21:36:22 +0100

From: Alexander Sverdlin <alexander.sverdlin@xxxxxxxxx>

On Octeon, smp_mb() translates to SYNC, while wmb() followed by rmb()
translates to SYNCW only. Implementing __smp_store_release() with the
lighter barrier brings around a 10% performance gain on tight
uncontended spinlock loops.

Refer to commit 500c2e1fdbcc ("MIPS: Optimize spinlocks.") and the link
below.
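
For readers unfamiliar with the pattern, the following is a minimal
userspace sketch of "barrier first, then a plain store", written with
C11 atomics. It is only an analogue and not the kernel implementation:
struct mailbox, mailbox_publish() and mailbox_try_consume() are
hypothetical names, and atomic_thread_fence(memory_order_release)
merely stands in for the __smp_wmb() + __smp_rmb() pair used by the
patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mailbox used only for illustration; not a kernel type. */
struct mailbox {
	int payload;		/* plain data written before publication */
	atomic_int ready;	/* flag that publishes the payload */
};

/*
 * Release-style publish: order all earlier accesses before the flag
 * store with a fence, then do a relaxed store of the flag. This mirrors
 * the shape of the macro in the patch (barrier, then WRITE_ONCE()).
 */
static inline void mailbox_publish(struct mailbox *m, int value)
{
	assert(m != NULL);
	m->payload = value;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&m->ready, 1, memory_order_relaxed);
}

/*
 * Acquire-side counterpart: returns true and fills *out only if the
 * flag has been observed, in which case the payload write is visible.
 */
static inline bool mailbox_try_consume(struct mailbox *m, int *out)
{
	assert(m != NULL && out != NULL);
	if (!atomic_load_explicit(&m->ready, memory_order_acquire))
		return false;
	*out = m->payload;
	return true;
}
```

A producer thread would call mailbox_publish() once, and a consumer
would poll mailbox_try_consume() until it returns true. The sketch
does not model the Octeon-specific SYNC vs. SYNCW cost difference.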

On a 6-core Octeon machine:
sysbench --test=mutex --num-threads=64 --memory-scope=local run

w/o patch:	1.60s
with patch:	1.51s

Link: https://lore.kernel.org/lkml/5644D08D.4080206@xxxxxxxxxxxxxxxxxx/
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@xxxxxxxxx>
---
 arch/mips/include/asm/barrier.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 49ff172..24c3f2c 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -113,6 +113,15 @@ static inline void wmb(void)
 					    ".set arch=octeon\n\t"	\
 					    "syncw\n\t"			\
 					    ".set pop" : : : "memory")
+
+#define __smp_store_release(p, v)					\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	__smp_wmb();							\
+	__smp_rmb();							\
+	WRITE_ONCE(*p, v);						\
+} while (0)
+
 #else
 #define smp_mb__before_llsc() smp_llsc_mb()
 #define __smp_mb__before_llsc() smp_llsc_mb()
-- 
2.10.2