Hi Peter, On Wed, Jul 28, 2021 at 7:02 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > On Wed, Jul 28, 2021 at 06:40:54PM +0800, Huacai Chen wrote: > > Hi, Peter, > > > > On Tue, Jul 27, 2021 at 7:06 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > > > > > On Tue, Jul 27, 2021 at 10:29:59AM +0800, Boqun Feng wrote: > > > > > > > > "How to implement xchg_tail" shouldn't force with _Q_PENDING_BITS, but > > > > > the arch could choose. > > > > > > > > I actually agree with this part, but this patchset failed to provide > > > > enough evidences on why we should choose xchg_tail() implementation > > > > based on whether hardware has xchg16, more precisely, for an archtecture > > > > which doesn't have a hardware xchg16, why cmpxchg emulated xchg16() is > > > > worse than a "load+cmpxchg) implemeneted xchg_tail()? If it's a > > > > performance reason, please show some numbers. > > > > > > Right. Their problem is their broken xchg16() implementation. > > > > Please correct me if I'm wrong. Now my understanding is like this: > > 1, _Q_PENDING_BITS=1 qspinlock can be used by all archs, though it may > > be not optimized. > > Only if your arch has fwd progress guarantees for cmpxchg(). LL/SC based > cmpxchg() is tricky, and typically needs software based backoff on > failure. > > The qspinlock code is written in generic code, but it very much relies > on an architecture to audit and vet the resulting code is sane for them. > Clearly MIPS didn't do a good job of that. > > > 2, _Q_PENDING_BITS=8 qspinlock can be used if hardware supports > > sub-word xchg/cmpxchg, or the software emulation is correctly > > implemented. But the current MIPS emulation is not correct. > > Everything always relies on things being correctly implemented. 1) > relies on cmpxchg() being correctly implemented. 2) relies on xchg16() > being correctly implemented. > > Of these 2 is actually easier to implement correctly on LL/SC. > > > If so, I want to rename ARCH_HAS_HW_XCHG_SMALL to > > ARCH_HAS_FAST_XCHG_SMALL, and let these archs select it: > > 1, X86, ARM, ARM64, IA64, M68K, because they have hardware support. > > 2, Other archs who select qspinlock currently (including MIPS), > > because their current behavior is use _Q_PENDING_BITS=8 qspinlock and > > we don't want to change anything in this patch. If their emulation is > > broken or not as "fast" as expected, we can make new patches to > > unselect the ARCH_HAS_FAST_XCHG_SMALL option. > > I utterly fail to see the point of any of this. If you use qspinlock, > you had better have audited the whole thing for your architecture. And > FAST_XCHG_SMALL is completely irrelevant for anything here. > Our original goal of this patch is let LoongArch use qspinlock, but Arnd let us remove xchg_small, so we want to let LoongArch use _Q_PENDING_BITS=1 lock . :) Now Rui Wang has a better solution, and xchg_small will be reimplemented for LoongArch, so I think this patch is no longer needed. Huacai