On Fri, Jul 23, 2021 at 12:41 PM Huacai Chen <chenhuacai@xxxxxxxxx> wrote: > On Tue, Jul 6, 2021 at 6:22 PM Arnd Bergmann <arnd@xxxxxxxx> wrote: > > On Tue, Jul 6, 2021 at 6:18 AM Huacai Chen <chenhuacai@xxxxxxxxxxx> wrote: > > > +static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old, > > > + unsigned long new, unsigned int size) > > > +{ > > > + switch (size) { > > > + case 1: > > > + case 2: > > > + return __cmpxchg_small(ptr, old, new, size); > > > > Same here. > > 16bit cmpxchg is used by qspinlock. Yes, you suggest we should not use > qspinlock, but our test results show that ticket spinlock is even > worse... So, we want to keep cmpxchg_small() and qspinlock. Can you explain in detail how that cmpxchg loop provides the necessary forward progress guarantees to make qspinlock reliable? As Peter keeps pointing out, usually this is not actually the case, so faster-but-broken is not a replacement for a simple working version. If you already have the ticket spinlock implementation, maybe start out by using that for the initial submission, and then provide an extra patch to convert it to qspinlock as a follow-up that can be debated in more detail regarding correctness and performance. > > > +static inline void __iomem *ioremap_prot(phys_addr_t offset, unsigned long size, > > > + unsigned long prot_val) > > > +{ > > > + /* This only works for !HIGHMEM currently */ > > > > Do you support highmem? I would expect new architectures to no longer > > implement that. Just use a 64-bit kernel on systems with lots of ram. > > Emmm, 64-bit kernel doesn't need highmem. Yes, that was my point: 32-bit user space is fine if you care a lot about size constraints of DDR2 or older memory. For any system that has enough memory to require highmem, it is better to just pick a 64-bit kernel to start with, if the CPU allows it. > > > +#define ioremap(offset, size) \ > > > + ioremap_prot((offset), (size), _CACHE_SUC) > > > +#define ioremap_uc ioremap > > > > Remove ioremap_uc(), it should never be called here. > It is used by lib/devres.c. There is a default implementation in include/asm-generic/io.h that just returns NULL here. > > > +#define __BUILD_MEMORY_SINGLE(pfx, bwlq, type) \ > > > + \ > > > +static inline void pfx##write##bwlq(type val, \ > > > + volatile void __iomem *mem) \ > > > +{ \ > > > > Please don't add another copy of these macros. Use the version from > > include/asm-generic, or modify it as needed if it doesn't quite work. > > On Loongson platform, we should put a wmb() before MMIO write. The > generic readw()/readl()/outw()/outl() have wmb(), but the __raw > versions don't have. I want to know what is the design goal of the > __raw version, are they supposed to be used in scenarios that the > ordering needn't be cared? The __raw versions are mainly meant for accessing memory from platform specific driver code. They don't provide any particular endianness or ordering guarantees and generally cannot be used in portable drivers. Note that a full wmb() should not be needed, you only have to serialize between prior memory accesses and a DMA triggered by the device, but not between multiple CPUs here. Arnd