Re: [PATCH 07/10] crypto: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN

Catalin Marinas <catalin.marinas@xxxxxxx> · Sun, 16 Oct 2022 22:37:48 +0100

On Fri, Oct 14, 2022 at 01:44:25PM -0700, Linus Torvalds wrote:
> On Fri, Oct 14, 2022 at 1:24 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > Agreed. Even allowing a 64-byte kmalloc cache on a system with a
> > 64-byte cacheline size saves quite a bit of memory.
> 
> Well, the *really* trivial thing to do is to just say "if the platform
> is DMA coherent, just allow any size kmalloc cache". And just
> consciously leave the broken garbage behind.

The problem is we don't have a reliable way to tell whether the platform
is DMA-coherent. The CPU IDs don't really say much and in most cases
it's a property of the interconnect/bus and device. We describe the DMA
coherency in DT or ACPI and the latter is somewhat better as it assumes
coherent by default. But for DT, not having a 'dma-coherent' property
means non-coherent DMA (or no DMA at all). We can't even tell whether
the device is going to do any DMA, arch_setup_dma_ops() is called even
for devices like the PMU. We could look into defining new properties
(e.g. "no-dma") and adjust the DTs but we may also have late probed
devices, long after the slab allocator was initialised. A big
'dma-coherent' property on the top node may work but most Android
systems won't benefit from this (your laptop may, I haven't checked).

I think the best bet is still either (a) bounce for small sizes or (b) a
new GFP_NODMA/PACKED/etc. flag for the hot small allocations. (a) is
somehow more universal but lots (most) Android devices are deployed with
no swiotlb buffer as the vendor knows the device needs and don't need
extra buffer waste. Not sure how reliable it would be to trigger another
slab allocation on the dma_map_*() calls for the bounce (it may need to
be GFP_ATOMIC). Option (b) looks more appealing on such systems, though
a lot more churn throughout the kernel.

-- 
Catalin