On Sun, Mar 28, 2021 at 2:43 AM Waiman Long <longman@xxxxxxxxxx> wrote:
>
> On 3/27/21 2:06 PM, guoren@xxxxxxxxxx wrote:
> > From: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
> >
> > Some architectures don't have a sub-word swap atomic instruction;
> > they only have the full-word one.
> >
> > The sub-word swap only improves performance when:
> >   NR_CPUS < 16K
> >    *  0- 7: locked byte
> >    *     8: pending
> >    *  9-15: not used
> >    * 16-17: tail index
> >    * 18-31: tail cpu (+1)
> >
> > The 9-15 bits are wasted when using xchg16 in xchg_tail.
> >
> > Please let the architecture select xchg16/xchg32 to implement
> > xchg_tail.
> >
> > Signed-off-by: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > Cc: Will Deacon <will@xxxxxxxxxx>
> > Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> > Cc: Waiman Long <longman@xxxxxxxxxx>
> > Cc: Arnd Bergmann <arnd@xxxxxxxx>
> > Cc: Anup Patel <anup@xxxxxxxxxxxxxx>
> > ---
> >   kernel/Kconfig.locks       |  3 +++
> >   kernel/locking/qspinlock.c | 44 +++++++++++++++++++++-----------------
> >   2 files changed, 27 insertions(+), 20 deletions(-)
> >
> > diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
> > index 3de8fd11873b..d02f1261f73f 100644
> > --- a/kernel/Kconfig.locks
> > +++ b/kernel/Kconfig.locks
> > @@ -239,6 +239,9 @@ config LOCK_SPIN_ON_OWNER
> >   config ARCH_USE_QUEUED_SPINLOCKS
> >   	bool
> >
> > +config ARCH_USE_QUEUED_SPINLOCKS_XCHG32
> > +	bool
> > +
> >   config QUEUED_SPINLOCKS
> >   	def_bool y if ARCH_USE_QUEUED_SPINLOCKS
> >   	depends on SMP
> > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> > index cbff6ba53d56..54de0632c6a8 100644
> > --- a/kernel/locking/qspinlock.c
> > +++ b/kernel/locking/qspinlock.c
> > @@ -163,26 +163,6 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
> >   	WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL);
> >   }
> >
> > -/*
> > - * xchg_tail - Put in the new queue tail code word & retrieve previous one
> > - * @lock : Pointer to queued spinlock structure
> > - * @tail : The new queue tail code word
> > - * Return: The previous queue tail code word
> > - *
> > - * xchg(lock, tail), which heads an address dependency
> > - *
> > - * p,*,* -> n,*,* ; prev = xchg(lock, node)
> > - */
> > -static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
> > -{
> > -	/*
> > -	 * We can use relaxed semantics since the caller ensures that the
> > -	 * MCS node is properly initialized before updating the tail.
> > -	 */
> > -	return (u32)xchg_relaxed(&lock->tail,
> > -				 tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
> > -}
> > -
> >   #else /* _Q_PENDING_BITS == 8 */
> >
> >   /**
> > @@ -206,6 +186,30 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
> >   {
> >   	atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val);
> >   }
> > +#endif
> > +
> > +#if _Q_PENDING_BITS == 8 && !defined(CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32)
> > +/*
> > + * xchg_tail - Put in the new queue tail code word & retrieve previous one
> > + * @lock : Pointer to queued spinlock structure
> > + * @tail : The new queue tail code word
> > + * Return: The previous queue tail code word
> > + *
> > + * xchg(lock, tail), which heads an address dependency
> > + *
> > + * p,*,* -> n,*,* ; prev = xchg(lock, node)
> > + */
> > +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
> > +{
> > +	/*
> > +	 * We can use relaxed semantics since the caller ensures that the
> > +	 * MCS node is properly initialized before updating the tail.
> > +	 */
> > +	return (u32)xchg_relaxed(&lock->tail,
> > +				 tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
> > +}
> > +
> > +#else
> >
> >   /**
> >    * xchg_tail - Put in the new queue tail code word & retrieve previous one
> >
> I don't have any problem adding a
> CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32 config option to control that.

Thx

> One minor nit:
>
> #endif	/* _Q_PENDING_BITS == 8 */
>
> You should probably remove the comment at the trailing end of the
> corresponding "#endif" as it is now wrong.

I'll fix it in the next patch.

--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
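For readers following along: the quoted hunk is cut off right where the new #else branch begins. With CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32 selected (or with _Q_PENDING_BITS != 8), xchg_tail() can no longer do a 16-bit xchg on lock->tail and instead updates the tail with a cmpxchg loop over the whole 32-bit lock word. The sketch below is modelled on the long-standing full-word fallback in kernel/locking/qspinlock.c and is only meant to illustrate that path; the exact code in the patch may differ.

static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
{
	u32 old, new, val = atomic_read(&lock->val);

	for (;;) {
		/* Keep the locked and pending bits, replace only the tail. */
		new = (val & _Q_LOCKED_PENDING_MASK) | tail;
		/*
		 * Relaxed semantics are enough here too: the caller
		 * initializes the MCS node before publishing the new tail.
		 */
		old = atomic_cmpxchg_relaxed(&lock->val, val, new);
		if (old == val)
			break;

		val = old;
	}
	/* The caller only consumes the tail bits of the old value. */
	return old;
}

The cmpxchg loop can retry under contention, which is why the single 16-bit xchg is preferred whenever the architecture has a sub-word swap. The new Kconfig symbol lets an architecture whose only atomic swap is full-word (where an xchg16 would itself have to be emulated with such a loop) opt into the 32-bit path, presumably via something like "select ARCH_USE_QUEUED_SPINLOCKS_XCHG32" in its arch Kconfig entry.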