On Thu, Oct 13, 2022 at 11:58:22AM -0700, Saravana Kannan wrote:
> On Thu, Oct 13, 2022 at 9:57 AM Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> > > If so, would there be concerns that the memory savings we get back
> > > from reducing the memory footprint of kmalloc might be defeated by
> > > how much memory is needed for bounce buffering?
> >
> > It's not necessarily about the saved memory but also locality of the
> > small buffer allocations, less cache and TLB pressure.
>
> Part of the pushback we get when we try to move some of the Android
> ecosystem from 32-bit to 64-bit is the memory usage increase. So,
> while the main goal might not be memory savings, it'll be good to keep
> that in mind too. I'd definitely not want this patch series to make
> things worse. Ideally, it'd make things better. 10MB is considered a
> lot on some of the super low-specced devices.

Well, we can still add the option to skip allocating from the small
kmalloc caches if there's no swiotlb available.

> > I wonder whether swiotlb is actually the best option for bouncing
> > unaligned buffers. We could use something like mempool_alloc()
> > instead if we stick to small buffers rather than any (even large)
> > buffer that's not aligned to a cache line. Or just go for
> > kmem_cache_alloc() directly. A downside is that we may need
> > GFP_ATOMIC for such allocations, so higher risk of failure.
>
> Yeah, a temporary kmem_cache_alloc() to bounce buffers off of feels
> like a better idea than swiotlb. Especially for small allocations
> (say 8-byte allocations) that might have gone into the kmalloc-64
> cache if we hadn't dropped KMALLOC_MIN_ALIGN to 8.

I now remembered why this isn't trivial. On the dma_unmap_*() side, we
can't easily tell whether the buffer was bounced and therefore needs
freeing. The swiotlb solves this by checking whether the address is
within the pre-allocated swiotlb range. With kmem_caches, we would have
to dig into the slab internals, check whether the pointer is part of a
slab page and then work out which kmem_cache it came from (rough sketch
below). It looks too complicated and I'm rather tempted to just go for
swiotlb if available, or prevent the creation of the small kmalloc
caches if there is no swiotlb (TBH, even the initial series was an
improvement, dropping KMALLOC_MIN_ALIGN from 128 to 64).

--
Catalin
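[Illustrative sketch of the two unmap-side checks discussed above. This
is pseudo-kernel code, not a patch: bounce_cache and the helper names
are made up, is_swiotlb_buffer() is the real swiotlb range test, and
page->slab_cache reflects pre-struct-slab kernels.]

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/swiotlb.h>

static struct kmem_cache *bounce_cache;	/* hypothetical bounce pool */

/* swiotlb: a cheap range check against the pre-allocated pool */
static bool bounced_via_swiotlb(struct device *dev, phys_addr_t paddr)
{
	return is_swiotlb_buffer(dev, paddr);
}

/* kmem_cache: we would have to peek at slab internals instead */
static bool bounced_via_kmem_cache(const void *addr)
{
	struct page *page;

	if (!virt_addr_valid(addr))
		return false;

	page = virt_to_head_page(addr);
	if (!PageSlab(page))		/* not a slab allocation at all */
		return false;

	/* which kmem_cache does this slab page belong to? */
	return page->slab_cache == bounce_cache;
}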