Hi, Arnd,

On Fri, Jul 23, 2021 at 7:43 PM Arnd Bergmann <arnd@xxxxxxxx> wrote:
>
> On Fri, Jul 23, 2021 at 12:41 PM Huacai Chen <chenhuacai@xxxxxxxxx> wrote:
> > On Tue, Jul 6, 2021 at 6:22 PM Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > > On Tue, Jul 6, 2021 at 6:18 AM Huacai Chen <chenhuacai@xxxxxxxxxxx> wrote:
> > > > +static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
> > > > +                                      unsigned long new, unsigned int size)
> > > > +{
> > > > +       switch (size) {
> > > > +       case 1:
> > > > +       case 2:
> > > > +               return __cmpxchg_small(ptr, old, new, size);
> > >
> > > Same here.
> >
> > 16-bit cmpxchg is used by qspinlock. Yes, you suggest we should not use
> > qspinlock, but our test results show that the ticket spinlock is even
> > worse... So, we want to keep cmpxchg_small() and qspinlock.
>
> Can you explain in detail how that cmpxchg loop provides the necessary
> forward progress guarantees to make qspinlock reliable?
>
> As Peter keeps pointing out, usually this is not actually the case, so
> faster-but-broken is not a replacement for a simple working version.
>
> If you already have the ticket spinlock implementation, maybe start out
> by using that for the initial submission, and then provide an extra
> patch to convert it to qspinlock as a follow-up that can be debated
> in more detail regarding correctness and performance.

After reading the existing qspinlock topics on the mailing list, I had
an offline discussion with Jiaxun Yang and Rui Wang. We now think that
if we use the _Q_PENDING_BITS=1 definition, those archs without
sub-word xchg/cmpxchg can use qspinlock as well. Please see my RFC
patches:
https://lore.kernel.org/linux-arch/20210724123617.3525377-1-chenhuacai@xxxxxxxxxxx/T/#t

Huacai
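(For illustration, a minimal user-space sketch of the _Q_PENDING_BITS=1
idea: the tail exchange that would otherwise want a 16-bit xchg is built
from a full-word cmpxchg loop instead, so no sub-word atomic primitive
is needed. The bit layout and the xchg_tail32() helper below are
simplified stand-ins for the kernel's qspinlock internals, not the code
from the RFC patches.)

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define TAIL_SHIFT 16                        /* tail lives in the upper half-word */
#define TAIL_MASK  (~((1u << TAIL_SHIFT) - 1))

/* Replace the tail bits while keeping the locked/pending bits, using
 * only a 32-bit compare-and-swap; returns the previous lock word. */
static uint32_t xchg_tail32(_Atomic uint32_t *lock, uint32_t tail)
{
        uint32_t old = atomic_load_explicit(lock, memory_order_relaxed);
        uint32_t new;

        do {
                new = (old & ~TAIL_MASK) | (tail << TAIL_SHIFT);
        } while (!atomic_compare_exchange_weak_explicit(lock, &old, new,
                        memory_order_relaxed, memory_order_relaxed));

        return old;
}

int main(void)
{
        _Atomic uint32_t lock = 0x00000101;      /* locked + pending bits set */
        uint32_t prev = xchg_tail32(&lock, 4);   /* enqueue a waiter tail */

        printf("prev=%08x now=%08x\n", (unsigned)prev,
               (unsigned)atomic_load(&lock));
        return 0;
}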
> > > > +static inline void __iomem *ioremap_prot(phys_addr_t offset, unsigned long size,
> > > > +                                         unsigned long prot_val)
> > > > +{
> > > > +       /* This only works for !HIGHMEM currently */
> > >
> > > Do you support highmem? I would expect new architectures to no longer
> > > implement that. Just use a 64-bit kernel on systems with lots of RAM.
> >
> > Emmm, a 64-bit kernel doesn't need highmem.
>
> Yes, that was my point: 32-bit user space is fine if you care a lot about
> size constraints of DDR2 or older memory. For any system that has
> enough memory to require highmem, it is better to just pick a 64-bit
> kernel to start with, if the CPU allows it.
>
> > > > +#define ioremap(offset, size)                          \
> > > > +       ioremap_prot((offset), (size), _CACHE_SUC)
> > > > +#define ioremap_uc ioremap
> > >
> > > Remove ioremap_uc(), it should never be called here.
> >
> > It is used by lib/devres.c.
>
> There is a default implementation in include/asm-generic/io.h
> that just returns NULL here.
>
> > > > +#define __BUILD_MEMORY_SINGLE(pfx, bwlq, type)                 \
> > > > +                                                               \
> > > > +static inline void pfx##write##bwlq(type val,                  \
> > > > +                                   volatile void __iomem *mem) \
> > > > +{                                                              \
> > >
> > > Please don't add another copy of these macros. Use the version from
> > > include/asm-generic, or modify it as needed if it doesn't quite work.
> >
> > On the Loongson platform, we need to put a wmb() before each MMIO write.
> > The generic writew()/writel()/outw()/outl() have wmb(), but the __raw
> > versions don't. I want to know the design goal of the __raw versions:
> > are they meant for scenarios where ordering does not matter?
>
> The __raw versions are mainly meant for accessing memory from
> platform-specific driver code. They don't provide any particular
> endianness or ordering guarantees and generally cannot be used
> in portable drivers.
>
> Note that a full wmb() should not be needed: you only have to serialize
> between prior memory accesses and a DMA triggered by the device,
> but not between multiple CPUs here.
>
>        Arnd
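(Sketch of the direction Arnd suggests: rather than duplicating
__BUILD_MEMORY_SINGLE, an architecture can override the barrier hooks
that include/asm-generic/io.h wraps around the __raw accessors, so the
generated writeb/w/l() get the ordering in front of the store. The
dbar(0) below is only a placeholder for whatever lighter-than-wmb()
instruction orders prior memory accesses against an MMIO write on this
platform; this is not LoongArch's actual io.h.)

/* arch/xxx/include/asm/io.h -- illustrative only */

/* Runs before the MMIO store in the generic writeb/w/l/q(); it only
 * has to order prior memory accesses against the device (for DMA),
 * not CPU against CPU. */
#define __io_bw()       dbar(0)

/* asm-generic/io.h only defines __io_bw() if the arch didn't, so this
 * include picks up the override and builds the accessors around it. */
#include <asm-generic/io.h>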