On Fri, Jun 17, 2022 at 4:57 PM Huacai Chen <chenhuacai@xxxxxxxxxxx> wrote:
>
> On NUMA systems, the performance of qspinlock is better than generic
> spinlock. Below are the UnixBench test results on an 8-node (4 cores
> per node, 32 cores in total) machine.
>

The performance increase is nice, but that is only half the story we need
here. I think the more important part is how you can guarantee that the
xchg16() implementation is correct and always allows forward progress.

> @@ -123,6 +123,10 @@ static inline unsigned long __percpu_xchg(void *ptr, unsigned long val,
>  		int size)
>  {
>  	switch (size) {
> +	case 1:
> +	case 2:
> +		return __xchg_small((volatile void *)ptr, val, size);
> +

Do you actually need size 1 as well?

Generally speaking, I would like to rework the xchg()/cmpxchg() logic to
only cover the 32-bit and word-sized (possibly 64-bit) cases, while
having separate optional 8-bit and 16-bit functions. I had a patch for
this in the past and can try to dig it out; this may be the time to
finally do that.

I see that the qspinlock code actually calls a 'relaxed' version of
xchg16(), but you only implement the one with the full barrier. Is it
possible to directly provide a relaxed version that has something less
than the __WEAK_LLSC_MB?

        Arnd
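For context on why the xchg16() correctness question matters: on architectures without a native sub-word exchange, a 16-bit xchg is typically emulated with a compare-and-swap loop on the containing aligned 32-bit word, which is exactly where forward-progress guarantees need scrutiny (the CAS can in principle fail forever under contention). The following is a hedged user-space sketch of that emulation pattern using GCC's `__atomic` builtins; it is not the kernel's `__xchg_small()`, the function name is made up, and it assumes little-endian byte order. A relaxed variant would pass `__ATOMIC_RELAXED` instead of `__ATOMIC_SEQ_CST` on success.

```c
#include <stdint.h>
#include <assert.h>

/*
 * Hypothetical sketch (NOT the kernel implementation): emulate a
 * 16-bit atomic exchange with a 32-bit compare-and-swap on the
 * containing naturally aligned word. Assumes little-endian layout
 * when computing the shift. Note the retry loop: on LL/SC machines
 * this is where forward progress is not automatically guaranteed.
 */
static uint16_t xchg16_emulated(volatile uint16_t *ptr, uint16_t val)
{
	uintptr_t addr = (uintptr_t)ptr;
	volatile uint32_t *word = (volatile uint32_t *)(addr & ~(uintptr_t)3);
	unsigned int shift = (unsigned int)(addr & 2) * 8;	/* 0 or 16 (LE) */
	uint32_t mask = (uint32_t)0xffff << shift;
	uint32_t old, newv;

	old = *word;
	do {
		/* Splice the new halfword into the untouched half. */
		newv = (old & ~mask) | ((uint32_t)val << shift);
		/* On failure, 'old' is reloaded and we retry. */
	} while (!__atomic_compare_exchange_n(word, &old, newv, 1,
					      __ATOMIC_SEQ_CST,
					      __ATOMIC_RELAXED));
	return (uint16_t)((old & mask) >> shift);
}
```

The point of showing the loop explicitly is that a native 16-bit exchange is a single unconditional operation, while this emulation can spin; whether the architecture bounds that spinning is the forward-progress question raised above.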