On Tue, Nov 24, 2020 at 3:39 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Tue, Nov 24, 2020 at 01:43:54PM +0000, guoren@xxxxxxxxxx wrote:
> > diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
>
> > +	if (align) {					\
> > +	__asm__ __volatile__ (				\
> > +		"0:	lr.w %0, 0(%z4)\n"		\
> > +		"	move %1, %0\n"			\
> > +		"	slli %1, %1, 16\n"		\
> > +		"	srli %1, %1, 16\n"		\
> > +		"	move %2, %z3\n"			\
> > +		"	slli %2, %2, 16\n"		\
> > +		"	or   %1, %2, %1\n"		\
> > +		"	sc.w %2, %1, 0(%z4)\n"		\
> > +		"	bnez %2, 0b\n"			\
> > +		"	srli %0, %0, 16\n"		\
> > +		: "=&r" (__ret), "=&r" (tmp), "=&r" (__rc)	\
> > +		: "rJ" (__new), "rJ"(addr)		\
> > +		: "memory");				\
>
> I'm pretty sure there's a handful of implementations like this out
> there... if only we could share.

Isn't this effectively the same as the "_Q_PENDING_BITS != 8"
version of xchg_tail()?

If nothing else needs xchg() on a 16-bit value, maybe
changing the #ifdef in the qspinlock code is enough.

Only around half the architectures actually implement 8-bit
and 16-bit cmpxchg() and xchg(). It might even be worth trying
to eventually change the interface to not do it at all, and
instead have explicit cmpxchg8() and cmpxchg16() helpers
for the few files that do use them.

      Arnd
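
For reference, the general idea behind both the quoted lr.w/sc.w loop and
the "_Q_PENDING_BITS != 8" xchg_tail() (replace one 16-bit half of a 32-bit
word using only a 32-bit atomic, retrying on contention) can be sketched in
portable C11 roughly as below. This is a userspace illustration only, not
kernel code: xchg_upper16() is a hypothetical name, and it assumes the
default sequentially consistent C11 atomics rather than the kernel's own
primitives and ordering.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): atomically replace the upper
 * 16 bits of *word with new_hi while preserving the lower 16 bits.
 * Returns the previous upper 16 bits.
 */
static uint16_t xchg_upper16(_Atomic uint32_t *word, uint16_t new_hi)
{
	uint32_t old = atomic_load(word);
	uint32_t new_val;

	do {
		/* Keep the low half, splice in the new high half. */
		new_val = (old & 0x0000ffffu) | ((uint32_t)new_hi << 16);
		/* On failure, 'old' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(word, &old, new_val));

	return (uint16_t)(old >> 16);
}
```

With a word holding 0xAAAA5555, calling xchg_upper16(&w, 0x1234) would
return 0xAAAA and leave 0x12345555 in w.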