On Wed, Apr 06, 2022 at 08:54:36PM +0800, guoren@xxxxxxxxxx wrote:
> From: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
>
> The generic atomic.h uses cmpxchg to implement the atomic
> operations, which results in a dual loop and weakens the forward
> progress guarantee. This patch implements csky custom atomic
> operations with ldex/stex instructions for the best performance.
>
> Signed-off-by: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
> Signed-off-by: Guo Ren <guoren@xxxxxxxxxx>
> ---
>  arch/csky/include/asm/atomic.h | 251 +++++++++++++++++++++++++++++++++
>  1 file changed, 251 insertions(+)
>  create mode 100644 arch/csky/include/asm/atomic.h

> +static __always_inline						\
> +int arch_atomic_fetch_##op(int i, atomic_t *v)			\
> +{								\
> +	register int ret, tmp;					\
> +	__asm__ __volatile__ (					\
> +	"1:	ldex.w		%0, (%3)	\n"		\
> +	ACQUIRE_FENCE						\
> +	"	mov		%1, %0		\n"		\
> +	"	" #op "		%0, %2		\n"		\
> +	RELEASE_FENCE						\
> +	"	stex.w		%0, (%3)	\n"		\
> +	"	bez		%0, 1b		\n"		\
> +	: "=&r" (tmp), "=&r" (ret)				\
> +	: "r" (I), "r"(&v->counter)				\
> +	: "memory");						\
> +	return ret;						\
> +}

I believe this suffers from the problem described in:

  8e86f0b409a44193 ("arm64: atomics: fix use of acquire + release for full barrier semantics")

... and does not provide FULL ordering semantics.

Thanks,
Mark.
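
The pattern that arm64 commit moved to can be sketched in portable C11 atomics: rather than pairing an acquire on the load-exclusive with a release on the store-exclusive (which lets accesses from either side drift into the loop and reorder past each other), the read-modify-write itself is kept weakly ordered and a single full barrier is issued after it completes. This is an illustrative sketch only; `fetch_add_full` is a hypothetical name, not kernel code, and the kernel's actual arch implementations use inline assembly rather than `<stdatomic.h>`.

```c
#include <stdatomic.h>

/*
 * Sketch of the "barrier after the RmW" pattern: the atomic update is
 * relaxed, and one full (seq_cst) fence after it provides the FULL
 * ordering that acquire-inside + release-inside an LL/SC loop cannot.
 */
static int fetch_add_full(atomic_int *v, int i)
{
	int old = atomic_fetch_add_explicit(v, i, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* full barrier */
	return old;
}
```

The value semantics are unchanged (the old value is returned, the counter is incremented); only the placement of the ordering changes.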