Re: [PATCH 01/10] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN

Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> · Wed, 6 Apr 2022 21:09:14 +0900

On Wed, Apr 06, 2022 at 09:29:19AM +0200, Arnd Bergmann wrote:
> On Wed, Apr 6, 2022 at 1:59 AM Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> wrote:
> >
> > On Tue, Apr 05, 2022 at 02:57:49PM +0100, Catalin Marinas wrote:
> > > In preparation for supporting a dynamic kmalloc() minimum alignment,
> > > allow architectures to define ARCH_KMALLOC_MINALIGN independently of
> > > ARCH_DMA_MINALIGN. In addition, always define ARCH_DMA_MINALIGN even if
> > > an architecture does not override it.
> > >
> >
> > [ +Cc slab maintainer/reviewers ]
> >
> > I get why you want to set minimum alignment of kmalloc() dynamically.
> > That's because cache line size can be different and we cannot statically
> > know that, right?
> >
> > But I don't get why you are trying to decouple ARCH_KMALLOC_MINALIGN
> > from ARCH_DMA_MINALIGN. kmalloc'ed buffer is always supposed to be DMA-safe.
> >
> > I'm afraid this series may break some archs/drivers.
> >
> > in Documentation/dma-api-howto.rst:
> > > 2) ARCH_DMA_MINALIGN
> > >
> > >   Architectures must ensure that kmalloc'ed buffer is
> > >   DMA-safe. Drivers and subsystems depend on it. If an architecture
> > >   isn't fully DMA-coherent (i.e. hardware doesn't ensure that data in
> > >   the CPU cache is identical to data in main memory),
> > >   ARCH_DMA_MINALIGN must be set so that the memory allocator
> > >   makes sure that kmalloc'ed buffer doesn't share a cache line with
> > >   the others. See arch/arm/include/asm/cache.h as an example.
> > >
> > >   Note that ARCH_DMA_MINALIGN is about DMA memory alignment
> > >   constraints. You don't need to worry about the architecture data
> > >   alignment constraints (e.g. the alignment constraints about 64-bit
> > >   objects).
> >
> > If I'm missing something, please let me know :)
> 
> It helps in two ways:
> 
> - you can start with a relatively large hardcoded ARCH_DMA_MINALIGN
>   of 128 or 256 bytes, depending on what the largest possible line size
>   is for any machine you want to support, and then drop that down to
>   32 or 64 bytes based on runtime detection. This should always be safe,
>   and it means a very sizable chunk of wasted memory can be recovered.
>

I agree this part.

> - On systems that are fully cache coherent, there is no need to align
>   kmallloc() allocations for DMA safety at all, on these, we can drop the
>   size even below the cache line. This does not apply on most of the
>   cheaper embedded or mobile SoCs, but it helps a lot on the machines
>   you'd find in a data center.

Now I get the point. Thank you for explanation!
Going to review this series soon.

> 
>         Arnd

-- 
Thanks,
Hyeonggon