On Sat, Jun 18, 2022 at 12:11 AM Arnd Bergmann <arnd@xxxxxxxx> wrote:
>
> On Fri, Jun 17, 2022 at 4:57 PM Huacai Chen <chenhuacai@xxxxxxxxxxx> wrote:
> >
> > On NUMA systems, the performance of qspinlock is better than the generic
> > spinlock. Below are the UnixBench test results on an 8-node (4 cores
> > per node, 32 cores in total) machine.
> >
>
> The performance increase is nice, but this is only half the story we need here.
>
> I think the more important bit is how you can guarantee that the xchg16()
> implementation is correct and always allows forward progress.
>
> > @@ -123,6 +123,10 @@ static inline unsigned long __percpu_xchg(void *ptr, unsigned long val,
> >                                                   int size)
> >  {
> >  	switch (size) {
> > +	case 1:
> > +	case 2:
> > +		return __xchg_small((volatile void *)ptr, val, size);
> > +
>
> Do you actually need the size 1 as well?
>
> Generally speaking, I would like to rework the xchg()/cmpxchg() logic
> to only cover the 32-bit and word-sized (possibly 64-bit) case, while
> having separate optional 8-bit and 16-bit functions. I had a patch for
Why not forbid 8-bit and 16-bit xchg()/cmpxchg() outright, e.g. by moving
the qspinlock xchg_tail() into a per-arch arch_xchg_tail()? That way Linux
would not provide a mixed-size atomic operation primitive at all. What do
you mean by "separate optional 8-bit and 16-bit functions" here?

> this in the past, and can try to dig that out; this may be the time to
> finally do that.
>
> I see that the qspinlock() code actually calls a 'relaxed' version of xchg16(),
> but you only implement the one with the full barrier. Is it possible to
> directly provide a relaxed version that has something less than the
> __WEAK_LLSC_MB?
I am also curious about __WEAK_LLSC_MB; it looks rather magic. How does it
prevent preceding accesses from being reordered after the sc for a strong
cmpxchg?

#define __cmpxchg_asm(ld, st, m, old, new)				\
({									\
	__typeof(old) __ret;						\
									\
	__asm__ __volatile__(						\
	"1:	" ld "	%0, %2	# __cmpxchg_asm	\n"			\
	"	bne	%0, %z3, 2f			\n"		\
	"	or	$t0, %z4, $zero			\n"		\
	"	" st "	$t0, %1				\n"		\
	"	beq	$zero, $t0, 1b			\n"		\
	"2:						\n"		\
	__WEAK_LLSC_MB

And its __smp_mb__xxx variants are just defined as a compiler barrier()?

#define __smp_mb__before_atomic()	barrier()
#define __smp_mb__after_atomic()	barrier()

>
>         Arnd

--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
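
For illustration, a minimal sketch of the per-arch xchg_tail() idea mentioned
above: emulate the 16-bit exchange of the tail field with a word-sized cmpxchg
loop, so the architecture never needs a sub-word atomic. This mirrors the
generic _Q_PENDING_BITS != 8 fallback in kernel/locking/qspinlock.c; the
arch_xchg_tail() name and the relaxed ordering are only assumptions for the
sketch, not a final interface.

/*
 * Sketch only: replace the tail bytes of the lock word with a 32-bit
 * cmpxchg loop instead of a 16-bit xchg.  arch_xchg_tail() is a
 * hypothetical name; the loop itself follows the generic
 * _Q_PENDING_BITS != 8 fallback.
 */
static __always_inline u32 arch_xchg_tail(struct qspinlock *lock, u32 tail)
{
	u32 old, new, val = atomic_read(&lock->val);

	for (;;) {
		/* Keep the locked+pending bytes, install the new tail. */
		new = (val & _Q_LOCKED_PENDING_MASK) | tail;
		/*
		 * Relaxed ordering is assumed to be enough here, as in the
		 * generic code; the callers provide the required ordering.
		 */
		old = atomic_cmpxchg_relaxed(&lock->val, val, new);
		if (old == val)
			break;

		val = old;
	}
	return old;	/* previous lock word; the caller extracts the old tail */
}

With something like this in the slowpath, no 16-bit xchg() primitive is
needed at all.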