Hi, I called this 'v2' but it's a different incarnation of the previous attempt to reduce ARCH_KMALLOC_MINALIGN for arm64 (and please treat it as an RFC): https://lore.kernel.org/all/20220405135758.774016-1-catalin.marinas@xxxxxxx/ The long discussion in the thread above seemed to converge towards bouncing of the DMA buffers that came from small kmalloc() caches. However, there are a few problems with such approach: - We still need ARCH_KMALLOC_MINALIGN to match the smaller kmalloc() size but this breaks crypto without changing the structure alignments to the larger ARCH_DMA_MINALIGN (not agreed upon yet). - Not all systems, especially mobile devices, have a swiotlb buffer (most boot with 'noswiotlb'). Even of we allow a small bounce buffer, it's hard to come up with a size that would please the mobile vendors. - An alternative to the default bounce buffer is to use another kmalloc() for the bouncing, though it adds an overhead, potential allocation failures and DMA unmapping cannot be easily tracked back to the bounce kmalloc'ed buffer (no simple is_swiotlb_buffer()). - IOMMU needs bouncing as well when cache maintenance is required. This seems to get pretty complicated in the presence of scatterlists (though probably doable with enough dedication but it would add an overhead with the additional scatterlist scanning). - Custom dma_ops - not too many but they'd also need the bouncing logic. - Undecided on whether we should only bounce small sizes or any unaligned size. The latter would end up bouncing large-ish buffers (network) but it has the advantage that we don't need to change the crypto code if the ARCH_KMALLOC_MINALIGN becomes 8 (well, it will bounce all the DMA until the code is changed). So, with this patchset, I've given up on trying to sort out the bouncing and went for an explicit opt-in to smaller kmalloc() sizes via a newly introduced __GFP_PACKED flag which would return objects aligned to KMALLOC_PACKED_ALIGN (8 bytes with slub, 32 with slab). The second patch sprinkles the kernel with several gfp flag updates where some hot allocations seem to be on my hardware (the commit log has instructions on how to identify them). On a Cavium TX2, after boot, I get: kmalloc-128 24608 24608 128 32 1 : tunables 0 0 0 : slabdata 769 769 0 kmalloc-96 9408 9660 96 42 1 : tunables 0 0 0 : slabdata 230 230 0 kmalloc-64 9594 9728 64 64 1 : tunables 0 0 0 : slabdata 152 152 0 kmalloc-32 14052 14464 32 128 1 : tunables 0 0 0 : slabdata 113 113 0 kmalloc-16 26368 26368 16 256 1 : tunables 0 0 0 : slabdata 103 103 0 kmalloc-8 36352 36352 8 512 1 : tunables 0 0 0 : slabdata 71 71 0 It's still not ideal and I suspect there may be some early allocations even prior to ftrace identifying them but there is a significant improvement without breaking any of the existing cases. To me, the ideal approach would be __dma annotations on pointers aimed at DMA and using kernel tools like sparse to identify them. A dma_kmalloc() would return such pointers. However, things get muddier with scatterlists since functions like sg_set_page() don't have any such pointer information (can we mark the offset as __dma instead?). Comments/suggestions are welcome but, as per above, I'd like to stay away from bouncing if possible. Thanks. Catalin Marinas (2): mm: slab: Introduce __GFP_PACKED for smaller kmalloc() alignments treewide: Add the __GFP_PACKED flag to several non-DMA kmalloc() allocations drivers/usb/core/message.c | 3 ++- fs/binfmt_elf.c | 6 ++++-- fs/dcache.c | 3 ++- fs/ext4/dir.c | 4 ++-- fs/ext4/extents.c | 4 ++-- fs/file.c | 2 +- fs/kernfs/file.c | 8 ++++---- fs/nfs/dir.c | 7 ++++--- fs/nfs/inode.c | 2 +- fs/nfs/nfs4state.c | 2 +- fs/nfs/write.c | 3 ++- fs/notify/inotify/inotify_fsnotify.c | 3 ++- fs/proc/self.c | 2 +- fs/seq_file.c | 5 +++-- include/acpi/platform/aclinuxex.h | 6 ++++-- include/linux/gfp_types.h | 10 ++++++++-- include/linux/slab.h | 22 ++++++++++++++++++---- kernel/trace/ftrace.c | 2 +- kernel/trace/tracing_map.c | 2 +- lib/kasprintf.c | 2 +- lib/kobject.c | 4 ++-- lib/mpi/mpiutil.c | 2 +- mm/list_lru.c | 6 ++++-- mm/memcontrol.c | 4 ++-- mm/slab_common.c | 3 ++- mm/util.c | 6 +++--- mm/vmalloc.c | 6 ++++-- net/sunrpc/auth_unix.c | 2 +- net/sunrpc/xdr.c | 2 +- security/apparmor/lsm.c | 2 +- security/security.c | 4 ++-- security/tomoyo/realpath.c | 2 +- 32 files changed, 88 insertions(+), 53 deletions(-)