On Sat, Jun 18, 2022 at 12:11 AM Arnd Bergmann <arnd@xxxxxxxx> wrote:
>
> On Fri, Jun 17, 2022 at 4:57 PM Huacai Chen <chenhuacai@xxxxxxxxxxx> wrote:
> >
> > On NUMA systems, the performance of qspinlock is better than that of
> > the generic spinlock. Below are the UnixBench test results on an
> > 8-node machine (4 cores per node, 32 cores in total).
>
> The performance increase is nice, but this is only half the story we
> need here. I think the more important bit is how you can guarantee that
> the xchg16() implementation is correct and always allows forward
> progress.
>
> > @@ -123,6 +123,10 @@ static inline unsigned long __percpu_xchg(void *ptr, unsigned long val,
> >                                            int size)
> >  {
> >         switch (size) {
> > +       case 1:
> > +       case 2:
> > +               return __xchg_small((volatile void *)ptr, val, size);
> > +
>
> Do you actually need size 1 as well?
>
> Generally speaking, I would like to rework the xchg()/cmpxchg() logic
> to only cover the 32-bit and word-sized (possibly 64-bit) cases, while
> having separate optional 8-bit and 16-bit functions. I had a patch for
> this in the past and can try to dig it out; this may be the time to
> finally do that.
>
> I see that the qspinlock code actually calls a 'relaxed' version of
> xchg16(), but you only implement the one with the full barrier. Is it
> possible to directly provide a relaxed version that has something less
> than the __WEAK_LLSC_MB?

There is no __WEAK_LLSC_MB in __xchg_small; it is full fence +
xchg16_relaxed() + full fence (as hev explained). The __cmpxchg_small
isn't related to qspinlock, so we could drop it from this patch.

>
> Arnd

--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
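
P.S. For context on the technique under discussion: a sub-word xchg is
usually emulated by a compare-and-swap loop on the aligned 32-bit word
that contains the halfword. The sketch below is illustrative only; the
function name xchg16_emulated, the little-endian shift computation, and
the use of the compiler's __atomic builtins are assumptions made for
the example, not the actual LoongArch __xchg_small code.

#include <stdint.h>

/*
 * Hedged sketch: exchange a 16-bit value by looping a CAS on the
 * aligned 32-bit word containing it. The seq_cst ordering stands in
 * for the "full fence + relaxed op + full fence" variant discussed
 * above; a relaxed xchg16 would use __ATOMIC_RELAXED throughout.
 */
static uint16_t xchg16_emulated(volatile uint16_t *ptr, uint16_t newval)
{
	/* The aligned 32-bit word that contains *ptr. */
	volatile uint32_t *word =
		(volatile uint32_t *)((uintptr_t)ptr & ~(uintptr_t)3);
	/* Bit offset of the halfword within that word (little-endian). */
	unsigned int shift = ((uintptr_t)ptr & 2) * 8;
	uint32_t mask = (uint32_t)0xffff << shift;
	uint32_t old, repl;

	old = __atomic_load_n(word, __ATOMIC_RELAXED);
	do {
		/* Splice the new halfword in, leaving neighbor bits alone. */
		repl = (old & ~mask) | ((uint32_t)newval << shift);
	} while (!__atomic_compare_exchange_n(word, &old, repl,
					      1 /* weak */,
					      __ATOMIC_SEQ_CST,
					      __ATOMIC_RELAXED));

	return (uint16_t)((old & mask) >> shift);
}

Note that a loop like this only guarantees forward progress to the
extent the underlying 32-bit atomic (LL/SC on LoongArch) does, which is
the substance of the forward-progress question raised above.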