On Wed, Jan 27, 2021 at 09:36:22PM +0100, Alexander A Sverdlin wrote:
> From: Alexander Sverdlin <alexander.sverdlin@xxxxxxxxx>
>
> On Octeon smp_mb() translates to SYNC, while wmb+rmb translates to
> SYNCW only. This brings around a 10% performance improvement on tight
> uncontended spinlock loops.
>
> Refer to commit 500c2e1fdbcc ("MIPS: Optimize spinlocks.") and the link
> below.
>
> On a 6-core Octeon machine:
> sysbench --test=mutex --num-threads=64 --memory-scope=local run
>
> w/o patch:  1.60s
> with patch: 1.51s
>
> Link: https://lore.kernel.org/lkml/5644D08D.4080206@xxxxxxxxxxxxxxxxxx/
> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@xxxxxxxxx>
> ---
>  arch/mips/include/asm/barrier.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
> index 49ff172..24c3f2c 100644
> --- a/arch/mips/include/asm/barrier.h
> +++ b/arch/mips/include/asm/barrier.h
> @@ -113,6 +113,15 @@ static inline void wmb(void)
>  		".set arch=octeon\n\t"			\
>  		"syncw\n\t"				\
>  		".set pop" : : : "memory")
> +
> +#define __smp_store_release(p, v)			\
> +do {							\
> +	compiletime_assert_atomic_type(*p);		\
> +	__smp_wmb();					\
> +	__smp_rmb();					\
> +	WRITE_ONCE(*p, v);				\
> +} while (0)

This is wrong in general, since smp_rmb() only provides ordering
between two loads, while smp_store_release() is a store. If this is
correct for all MIPS, it needs a giant comment explaining exactly how
that smp_rmb() makes sense here.
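
To make the concern concrete, here is a minimal litmus-test sketch in
the tools/memory-model format (illustrative only, not from the patch;
the variables x, y, r0, r1 are made up for the example):

C LB+rel+acq

{}

P0(int *x, int *y)
{
	int r0;

	/* The release must order this load before the store to y. */
	r0 = READ_ONCE(*x);
	smp_store_release(y, 1);
}

P1(int *x, int *y)
{
	int r1;

	r1 = smp_load_acquire(y);
	WRITE_ONCE(*x, 1);
}

exists (0:r0=1 /\ 1:r1=1)

With a correct smp_store_release() the "exists" outcome is forbidden.
The proposed wmb+rmb sequence only guarantees store->store and
load->load ordering, so nothing in the generic barrier contract stops
the store to y from becoming visible before the load of x completes,
making the outcome reachable. If Octeon happens never to reorder a
load against a later store, that is precisely the property the giant
comment would have to spell out.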