Re: [PATCH RFC 1/2] arch: Introduce ARCH_HAS_HW_XCHG_SMALL

Huacai Chen <chenhuacai@xxxxxxxxx> · Fri, 30 Jul 2021 00:38:14 +0800

Hi Peter,

On Wed, Jul 28, 2021 at 7:02 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Wed, Jul 28, 2021 at 06:40:54PM +0800, Huacai Chen wrote:
> > Hi, Peter,
> >
> > On Tue, Jul 27, 2021 at 7:06 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Jul 27, 2021 at 10:29:59AM +0800, Boqun Feng wrote:
> > >
> > > > > "How to implement xchg_tail" shouldn't force with _Q_PENDING_BITS, but
> > > > > the arch could choose.
> > > >
> > > > I actually agree with this part, but this patchset failed to provide
> > > > enough evidences on why we should choose xchg_tail() implementation
> > > > based on whether hardware has xchg16, more precisely, for an archtecture
> > > > which doesn't have a hardware xchg16, why cmpxchg emulated xchg16() is
> > > > worse than a "load+cmpxchg) implemeneted xchg_tail()? If it's a
> > > > performance reason, please show some numbers.
> > >
> > > Right. Their problem is their broken xchg16() implementation.
> >
> > Please correct me if I'm wrong. Now my understanding is like this:
> > 1, _Q_PENDING_BITS=1 qspinlock can be used by all archs, though it may
> > be not optimized.
>
> Only if your arch has fwd progress guarantees for cmpxchg(). LL/SC based
> cmpxchg() is tricky, and typically needs software based backoff on
> failure.
>
> The qspinlock code is written in generic code, but it very much relies
> on an architecture to audit and vet the resulting code is sane for them.
> Clearly MIPS didn't do a good job of that.
>
> > 2, _Q_PENDING_BITS=8 qspinlock can be used if hardware supports
> > sub-word xchg/cmpxchg, or the software emulation is correctly
> > implemented. But the current MIPS emulation is not correct.
>
> Everything always relies on things being correctly implemented. 1)
> relies on cmpxchg() being correctly implemented. 2) relies on xchg16()
> being correctly implemented.
>
> Of these 2 is actually easier to implement correctly on LL/SC.
>
> > If so, I want to rename ARCH_HAS_HW_XCHG_SMALL to
> > ARCH_HAS_FAST_XCHG_SMALL, and let these archs select it:
> > 1, X86, ARM, ARM64, IA64, M68K, because they have hardware support.
> > 2, Other archs who select qspinlock currently (including MIPS),
> > because their current behavior is use _Q_PENDING_BITS=8 qspinlock and
> > we don't want to change anything in this patch. If their emulation is
> > broken or not as "fast" as expected, we can make new patches to
> > unselect the ARCH_HAS_FAST_XCHG_SMALL option.
>
> I utterly fail to see the point of any of this. If you use qspinlock,
> you had better have audited the whole thing for your architecture. And
> FAST_XCHG_SMALL is completely irrelevant for anything here.
>
Our original goal of this patch is let LoongArch use qspinlock, but
Arnd let us remove xchg_small, so we want to let LoongArch use
_Q_PENDING_BITS=1 lock . :)

Now Rui Wang has a better solution, and xchg_small will be
reimplemented for LoongArch, so I think this patch is no longer
needed.

Huacai