Re: [PATCH v2 2/2] treewide: Add the __GFP_PACKED flag to several non-DMA kmalloc() allocations

Isaac Manjarres <isaacmanjarres@xxxxxxxxxx> · Tue, 1 Nov 2022 12:10:51 -0700

On Tue, Nov 01, 2022 at 06:39:40PM +0100, Christoph Hellwig wrote:
> On Tue, Nov 01, 2022 at 05:32:14PM +0000, Catalin Marinas wrote:
> > There's also the case of low-end phones with all RAM below 4GB and arm64
> > doesn't allocate the swiotlb. Not sure those vendors would go with a
> > recent kernel anyway.
> > 
> > So the need for swiotlb now changes from 32-bit DMA to any DMA
> > (non-coherent but we can't tell upfront when booting, devices may be
> > initialised pretty late).
Not only low-end phones, but there are other form-factors that can fall
into this category and are also memory constrained (e.g. wearable
devices), so the memory headroom impact from enabling SWIOTLB might be
non-negligible for all of these devices. I also think it's feasible for
those devices to use recent kernels.
> 
> Yes.  The other option would be to use the dma coherent pool for the
> bouncing, which must be present on non-coherent systems anyway.  But
> it would require us to write a new set of bounce buffering routines.
I think in addition to having to write new bounce buffering routines,
this approach still suffers the same problem as SWIOTLB, which is that
the memory for SWIOTLB and/or the dma coherent pool is not reclaimable,
even when it is not used.

There's not enough context in the DMA mapping routines to know if we need
an atomic allocation, so if we used kmalloc(), instead of SWIOTLB, to
dynamically allocate memory, it would always have to use GFP_ATOMIC. But
what about having a pool that has a small amount of memory and is
composed of several objects that can be used for small DMA transfers? If
the amount of memory in the pool starts falling below a certain
threshold, there can be a worker thread--so that we don't have to use
GFP_ATOMIC--that can add more memory to the pool?

--Isaac