Re: [PATCH 12/19] LoongArch: Add misc common routines

Arnd Bergmann <arnd@xxxxxxxx> · Fri, 23 Jul 2021 13:43:30 +0200

On Fri, Jul 23, 2021 at 12:41 PM Huacai Chen <chenhuacai@xxxxxxxxx> wrote:
> On Tue, Jul 6, 2021 at 6:22 PM Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > On Tue, Jul 6, 2021 at 6:18 AM Huacai Chen <chenhuacai@xxxxxxxxxxx> wrote:
> > > +static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
> > > +                                     unsigned long new, unsigned int size)
> > > +{
> > > +       switch (size) {
> > > +       case 1:
> > > +       case 2:
> > > +               return __cmpxchg_small(ptr, old, new, size);
> >
> > Same here.
>
> 16bit cmpxchg is used by qspinlock. Yes, you suggest we should not use
> qspinlock, but our test results show that ticket spinlock is even
> worse... So, we want to keep cmpxchg_small() and qspinlock.

Can you explain in detail how that cmpxchg loop provides the necessary
forward progress guarantees to make qspinlock reliable?

As Peter keeps pointing out, usually this is not actually the case, so
faster-but-broken is not a replacement for a simple working version.

If you already have the ticket spinlock implementation, maybe start out
by using that for the initial submission, and then provide an extra
patch to convert it to qspinlock as a follow-up that can be debated
in more detail regarding correctness and performance.

> > > +static inline void __iomem *ioremap_prot(phys_addr_t offset, unsigned long size,
> > > +       unsigned long prot_val)
> > > +{
> > > +       /* This only works for !HIGHMEM currently */
> >
> > Do you support highmem? I would expect new architectures to no longer
> > implement that. Just use a 64-bit kernel on systems with lots of ram.
>
> Emmm, 64-bit kernel doesn't need highmem.

Yes, that was my point: 32-bit user space is fine if you care a lot about
size constraints of DDR2 or older memory. For any system that has
enough memory to require highmem, it is better to just pick a 64-bit
kernel to start with, if the CPU allows it.

> > > +#define ioremap(offset, size)                                  \
> > > +       ioremap_prot((offset), (size), _CACHE_SUC)
> > > +#define ioremap_uc ioremap
> >
> > Remove ioremap_uc(), it should never be called here.
> It is used by lib/devres.c.

There is a default implementation in include/asm-generic/io.h
that just returns NULL here.

> > > +#define __BUILD_MEMORY_SINGLE(pfx, bwlq, type)                         \
> > > +                                                                       \
> > > +static inline void pfx##write##bwlq(type val,                          \
> > > +                                   volatile void __iomem *mem)         \
> > > +{                                                                      \
> >
> > Please don't add another copy of these macros. Use the version from
> > include/asm-generic, or modify it as needed if it doesn't quite work.
>
> On Loongson platform, we should put a wmb() before MMIO write. The
> generic readw()/readl()/outw()/outl() have wmb(), but the __raw
> versions don't have. I want to know what is the design goal of the
> __raw version, are they supposed to be used in scenarios that the
> ordering needn't be cared?

The __raw versions are mainly meant for accessing memory from
platform specific driver code. They don't provide any particular
endianness or ordering guarantees and generally cannot be used
in portable drivers.

Note that a full wmb() should not be needed, you only have to serialize
between prior memory accesses and a DMA triggered by the device,
but not between multiple CPUs here.

      Arnd