Hi, Peter,

On Tue, Jul 27, 2021 at 7:06 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Jul 27, 2021 at 10:29:59AM +0800, Boqun Feng wrote:
>
> > > "How to implement xchg_tail" shouldn't be forced by _Q_PENDING_BITS, but
> > > the arch could choose.
> >
> > I actually agree with this part, but this patchset failed to provide
> > enough evidence on why we should choose the xchg_tail() implementation
> > based on whether hardware has xchg16. More precisely, for an architecture
> > which doesn't have a hardware xchg16, why is a cmpxchg-emulated xchg16()
> > worse than a "load+cmpxchg" implemented xchg_tail()? If it's a
> > performance reason, please show some numbers.
>
> Right. Their problem is their broken xchg16() implementation.

Please correct me if I'm wrong. My current understanding is this:
1. The _Q_PENDING_BITS=1 qspinlock can be used by all archs, though it
may not be optimal.
2. The _Q_PENDING_BITS=8 qspinlock can be used if the hardware supports
sub-word xchg/cmpxchg, or if the software emulation is correctly
implemented. But the current MIPS emulation is not correct.

If so, I want to rename ARCH_HAS_HW_XCHG_SMALL to
ARCH_HAS_FAST_XCHG_SMALL, and let these archs select it:
1. X86, ARM, ARM64, IA64 and M68K, because they have hardware support.
2. The other archs that currently select qspinlock (including MIPS),
because their current behavior is to use the _Q_PENDING_BITS=8 qspinlock
and we don't want to change anything in this patch. If their emulation
is broken or not as "fast" as expected, we can send new patches to
unselect the ARCH_HAS_FAST_XCHG_SMALL option.

Huacai
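
For reference, here is a rough userspace sketch of the two xchg_tail() variants being compared above: a load+cmpxchg loop on the whole 32-bit lock word, and an xchg16() emulated with cmpxchg on the containing word. It uses C11 atomics rather than the kernel primitives, and it assumes the _Q_PENDING_BITS=8 layout (locked in bits 0-7, pending in bits 8-15, tail in bits 16-31). The Q_* macros and all function names are made up for illustration and are not the kernel's code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative constants, assuming the _Q_PENDING_BITS=8 layout. */
#define Q_TAIL_OFFSET          16
#define Q_TAIL_MASK            0xffff0000u
#define Q_LOCKED_PENDING_MASK  0x0000ffffu

/*
 * Variant A: load + cmpxchg loop on the full 32-bit word.
 * Keeps the locked/pending bits, replaces the tail, and returns
 * the previous tail (still shifted into its field).
 */
static uint32_t xchg_tail_cmpxchg(_Atomic uint32_t *val, uint32_t tail)
{
	uint32_t old = atomic_load_explicit(val, memory_order_relaxed);
	uint32_t new;

	do {
		new = (old & Q_LOCKED_PENDING_MASK) | tail;
		/* On failure, 'old' is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak_explicit(val, &old, new,
							memory_order_relaxed,
							memory_order_relaxed));
	return old & Q_TAIL_MASK;
}

/*
 * Hypothetical software xchg16(): exchange the 16-bit field at 'shift'
 * by doing a cmpxchg on the 32-bit word that contains it.
 */
static uint16_t xchg16_emulated(_Atomic uint32_t *word, unsigned int shift,
				uint16_t newval)
{
	uint32_t mask = 0xffffu << shift;
	uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
	uint32_t new;

	do {
		new = (old & ~mask) | ((uint32_t)newval << shift);
	} while (!atomic_compare_exchange_weak_explicit(word, &old, new,
							memory_order_relaxed,
							memory_order_relaxed));
	return (uint16_t)((old & mask) >> shift);
}

/* Variant B: xchg_tail() built on top of the emulated xchg16(). */
static uint32_t xchg_tail_xchg16(_Atomic uint32_t *val, uint32_t tail)
{
	uint16_t prev = xchg16_emulated(val, Q_TAIL_OFFSET,
					(uint16_t)(tail >> Q_TAIL_OFFSET));

	return (uint32_t)prev << Q_TAIL_OFFSET;
}
```

In this sketch both variants end up as the same kind of retry loop on the 32-bit word, so it says nothing about which one is faster on a given architecture; that would still need numbers.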