On Tue, Nov 24, 2020 at 11:00 PM Arnd Bergmann <arnd@xxxxxxxxxx> wrote:
>
> On Tue, Nov 24, 2020 at 3:39 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Tue, Nov 24, 2020 at 01:43:54PM +0000, guoren@xxxxxxxxxx wrote:
> > > diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
>
> > > +	if (align) {						\
> > > +		__asm__ __volatile__ (				\
> > > +			"0:	lr.w %0, 0(%z4)\n"		\
> > > +			"	move %1, %0\n"			\
> > > +			"	slli %1, %1, 16\n"		\
> > > +			"	srli %1, %1, 16\n"		\
> > > +			"	move %2, %z3\n"			\
> > > +			"	slli %2, %2, 16\n"		\
> > > +			"	or   %1, %2, %1\n"		\
> > > +			"	sc.w %2, %1, 0(%z4)\n"		\
> > > +			"	bnez %2, 0b\n"			\
> > > +			"	srli %0, %0, 16\n"		\
> > > +			: "=&r" (__ret), "=&r" (tmp), "=&r" (__rc)	\
> > > +			: "rJ" (__new), "rJ"(addr)		\
> > > +			: "memory");				\
> >
> > I'm pretty sure there's a handful of implementations like this out
> > there... if only we could share.
>
> Isn't this effectively the same as the "_Q_PENDING_BITS != 8"
> version of xchg_tail()?

It comes down to the difference in efficiency between cmpxchg and xchg.
On an architecture that only has lr/sc instructions, cmpxchg and xchg
are similar, since both are built from an lr/sc loop.

#if _Q_PENDING_BITS == 8

static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
{
	/*
	 * We can use relaxed semantics since the caller ensures that the
	 * MCS node is properly initialized before updating the tail.
	 */
	return (u32)xchg_relaxed(&lock->tail,
				 tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
}

#else /* _Q_PENDING_BITS == 8 */

static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
{
	u32 old, new, val = atomic_read(&lock->val);

	for (;;) {
		new = (val & _Q_LOCKED_PENDING_MASK) | tail;
		/*
		 * We can use relaxed semantics since the caller ensures that
		 * the MCS node is properly initialized before updating the
		 * tail.
		 */
		old = atomic_cmpxchg_relaxed(&lock->val, val, new);
		if (old == val)
			break;

		val = old;
	}
	return old;
}
#endif /* _Q_PENDING_BITS == 8 */

>
> If nothing else needs xchg() on a 16-bit value, maybe
> changing the #ifdef in the qspinlock code is enough.
>
> Only around half the architectures actually implement 8-bit
> and 16-bit cmpxchg() and xchg(), so it might even be worth trying
> to eventually change the interface to not do it at all, but
> instead have explicit cmpxchg8() and cmpxchg16() helpers
> for the few files that do use them.
>
>      Arnd

--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
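
P.S. For illustration only, here is the same idea in portable C11: a 16-bit xchg emulated with a compare-and-swap loop on the containing 32-bit word, which has the same shape as the "_Q_PENDING_BITS != 8" xchg_tail() above. This is a sketch under assumptions, not kernel code: the helper name xchg16_emulated and its shift parameter are hypothetical, and it uses <stdatomic.h> instead of the kernel's atomic_*() primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch or the kernel): exchange one
 * 16-bit half of a 32-bit word using only a 32-bit compare-and-swap.
 * shift selects the halfword: 0 for bits 0-15, 16 for bits 16-31.
 * Returns the previous value of that halfword.
 */
static uint16_t xchg16_emulated(_Atomic uint32_t *word, unsigned int shift,
				uint16_t newval)
{
	uint32_t mask, old, newword;

	assert(shift == 0 || shift == 16);
	mask = (uint32_t)0xffff << shift;

	/* Relaxed ordering, mirroring the relaxed variants quoted above. */
	old = atomic_load_explicit(word, memory_order_relaxed);
	do {
		/* Keep the other halfword, replace only the selected one. */
		newword = (old & ~mask) | ((uint32_t)newval << shift);
		/* On failure, old is refreshed with the current contents. */
	} while (!atomic_compare_exchange_weak_explicit(word, &old, newword,
							memory_order_relaxed,
							memory_order_relaxed));

	return (uint16_t)((old & mask) >> shift);
}
```

With the word initialized to 0x12345678, xchg16_emulated(&word, 16, 0xabcd) returns 0x1234 and leaves the word as 0xabcd5678. On an lr/sc machine the compiler turns the compare-and-swap into an lr/sc loop, so this ends up close to the hand-written asm in the patch.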