Re: [PATCH RFC 1/2] arch: Introduce ARCH_HAS_HW_XCHG_SMALL

Huacai Chen <chenhuacai@xxxxxxxxx> · Wed, 28 Jul 2021 18:40:54 +0800

Hi, Peter,

On Tue, Jul 27, 2021 at 7:06 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Jul 27, 2021 at 10:29:59AM +0800, Boqun Feng wrote:
>
> > > "How to implement xchg_tail" shouldn't be dictated by _Q_PENDING_BITS;
> > > the arch should be able to choose.
> >
> > I actually agree with this part, but this patchset failed to provide
> > enough evidence on why we should choose the xchg_tail() implementation
> > based on whether hardware has xchg16. More precisely, for an architecture
> > which doesn't have a hardware xchg16, why is a cmpxchg-emulated xchg16()
> > worse than a "load+cmpxchg" implemented xchg_tail()? If it's a
> > performance reason, please show some numbers.
>
> Right. Their problem is their broken xchg16() implementation.

Please correct me if I'm wrong. My current understanding is:
1, The _Q_PENDING_BITS=1 qspinlock can be used by all archs, though it may
not be optimal.
2, The _Q_PENDING_BITS=8 qspinlock can be used if the hardware supports
sub-word xchg/cmpxchg, or if the software emulation is implemented
correctly. The current MIPS emulation is not correct.
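
To make "software emulation" in point 2 concrete, here is a minimal
userspace sketch of a 16-bit xchg built from a word-sized compare-and-swap.
It uses C11 atomics rather than kernel primitives, and emulated_xchg16()
and its shift parameter are hypothetical names for illustration only. It
does not reproduce the MIPS code or its bug.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: exchange one 16-bit half of a
 * 32-bit word using only a full-word compare-and-swap.
 * shift is 0 for the low half and 16 for the high half.
 */
static uint16_t emulated_xchg16(_Atomic uint32_t *word, unsigned int shift,
                                uint16_t newval)
{
    const uint32_t mask = (uint32_t)0xffff << shift;
    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
    uint32_t repl;

    do {
        /* Keep the other half as last observed, replace only our half. */
        repl = (old & ~mask) | ((uint32_t)newval << shift);
        /* On failure, old is refreshed with the current word and we retry. */
    } while (!atomic_compare_exchange_weak_explicit(word, &old, repl,
                                                    memory_order_acq_rel,
                                                    memory_order_relaxed));

    /* Return the previous contents of the half we replaced. */
    return (uint16_t)((old & mask) >> shift);
}
```

Note that the loop retries whenever any bit of the word changes, including
bits outside the half being exchanged, so this kind of emulation is correct
only if its retry loop is, and it is not necessarily fast under contention.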

If so, I want to rename ARCH_HAS_HW_XCHG_SMALL to
ARCH_HAS_FAST_XCHG_SMALL, and let these archs select it:
1, X86, ARM, ARM64, IA64 and M68K, because they have hardware support.
2, The other archs that currently select qspinlock (including MIPS),
because their current behavior is to use the _Q_PENDING_BITS=8 qspinlock
and we don't want to change any behavior in this patch. If their emulation
is broken or not as "fast" as expected, we can send follow-up patches that
drop the ARCH_HAS_FAST_XCHG_SMALL selection for them.
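
For reference, an arch that stops selecting the option would fall back to
the "load+cmpxchg" style of xchg_tail(), which only needs a word-sized
cmpxchg. Below is a rough userspace sketch of that approach using C11
atomics. The SKETCH_* names and the bit layout (locked in bits 0-7, pending
in bit 8, tail from bit 9 up) are assumptions for illustration, not the
kernel definitions.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout, for illustration only: bits 0-8 hold locked + pending. */
#define SKETCH_TAIL_OFFSET          9u
#define SKETCH_LOCKED_PENDING_MASK  ((1u << SKETCH_TAIL_OFFSET) - 1u)

/*
 * Hypothetical sketch: install a new tail while preserving the locked and
 * pending bits. tail must already be shifted into the tail field.
 * Returns the previous lock word.
 */
static uint32_t sketch_xchg_tail(_Atomic uint32_t *lockword, uint32_t tail)
{
    uint32_t old = atomic_load_explicit(lockword, memory_order_relaxed);
    uint32_t repl;

    do {
        /* Carry over locked/pending as last observed, swap in the tail. */
        repl = (old & SKETCH_LOCKED_PENDING_MASK) | tail;
        /* On failure, old is reloaded with the current word and we retry. */
    } while (!atomic_compare_exchange_weak_explicit(lockword, &old, repl,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));

    return old;
}
```

This variant never touches a sub-word quantity, so it does not depend on
hardware or emulated xchg16 at all.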

Huacai