On Tue, Apr 12, 2022 at 10:47 PM Catalin Marinas
<catalin.marinas@xxxxxxx> wrote:
>
> I agree. There is also an implicit expectation that the DMA API works on
> kmalloc'ed buffers and that's what ARCH_DMA_MINALIGN is for (and the
> dynamic arch_kmalloc_minalign() in this series). But the key point is
> that the driver doesn't need to know the CPU cache topology, coherency,
> the DMA API and kmalloc() take care of these.

Honestly, I think it would probably be worth discussing the "kmalloc
DMA alignment" issues.

99.9% of kmalloc users don't want to do DMA.

And there's actually a fair amount of small kmalloc for random stuff.
Right now on my laptop, I have

    kmalloc-8 16907 18432 8 512 1 : ...

according to slabinfo, so almost 17 _thousand_ allocations of 8 bytes.

It's all kinds of sad if those allocations need to be 64 bytes in size
just because of some silly DMA alignment issue, when none of them want
it.

Yeah, yeah, wasting a megabyte of memory is "just a megabyte" these
days. Which is crazy. It's literally memory that could have been used
for something much more useful than just pure and utter waste.

I think we could and should just say "people who actually require DMA
accesses should say so at kmalloc time". We literally have GFP_DMA and
ZONE_DMA for various historical reasons, so we've been able to do that
before.

No, that historical GFP_DMA isn't what arm64 wants - it's the old crazy
"legacy 16MB DMA" thing that ISA DMA used to have.

But the basic issue was true then, and is true now - DMA allocations
are fairly special, and should not be that hard to just mark as such.

We could add a trivial wrapper function like

    static void *dma_kmalloc(size_t size, gfp_t flags)
    {
            return kmalloc(size | (ARCH_DMA_MINALIGN - 1), flags);
    }

which guarantees that the size passed to kmalloc() is big enough to
get a properly aligned allocation, without any risk of the size
overflowing to zero.

We could perhaps even have other special rules.
These could include really specific ones, like saying

 - allocations smaller than 32 bytes are not DMA coherent, because we
   pack them

which would allow those small allocations to not pointlessly waste
memory.

I dunno. But it's ridiculous that arm64 wastes so much memory when it's
approximately never needed.

               Linus