On 2/3/21 2:41 AM, Abel Wu wrote:
>> On Feb 2, 2021, at 6:11 PM, Christoph Lameter <cl@xxxxxxxxx> wrote:
>>
>> On Tue, 2 Feb 2021, Abel Wu wrote:
>>
>>> Since slab_alloc_node() is the only caller of __slab_alloc(), embed
>>> __slab_alloc() to its caller to save function call overhead. This
>>> will also expand the caller's code block size a bit, but hackbench
>>> tests on both host and guest didn't show a difference w/ or w/o
>>> this patch.
>>
>> slab_alloc_node is an always_inline function. It is intentional that only
>> the fast path was inlined and not the slow path.
>
> Oh I got it. Thanks for your excellent explanation.

BTW, there's a script in the Linux source to nicely see the effect of such
changes:

./scripts/bloat-o-meter slub.o.before mm/slub.o
add/remove: 0/1 grow/shrink: 9/0 up/down: 1660/-1130 (530)
Function                                     old     new   delta
__slab_alloc                                 127    1130   +1003
__kmalloc_track_caller                       877     965     +88
__kmalloc                                    878     966     +88
kmem_cache_alloc                             778     862     +84
__kmalloc_node_track_caller                  996    1080     +84
kmem_cache_alloc_node_trace                  813     896     +83
kmem_cache_alloc_node                        800     881     +81
kmem_cache_alloc_trace                       786     862     +76
__kmalloc_node                               998    1071     +73
___slab_alloc                               1130       -   -1130
Total: Before=57782, After=58312, chg +0.92%

And yeah, bloating all the entry points wouldn't be nice.

Thanks,
Vlastimil
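
[For readers outside the thread: the fast-path/slow-path split Christoph
refers to can be illustrated with a small userspace sketch. The names
below (alloc_fast(), alloc_slow(), one_slot_cache) are made up for
illustration and are not the actual mm/slub.c code; the point is only
that an always_inline fast path gets copied into every entry point while
the slow path remains a single out-of-line function, which is why
inlining the slow path grows all of the entry points in the
bloat-o-meter output above.]

/*
 * Toy illustration (not the real mm/slub.c code) of keeping the fast
 * path inline in every caller while sharing one out-of-line slow path.
 */
#include <stdlib.h>

static void *one_slot_cache;	/* hypothetical stand-in for the per-cpu freelist */

/* Slow path: deliberately out of line; only one copy exists in the object file. */
static __attribute__((noinline)) void *alloc_slow(size_t size)
{
	return malloc(size);
}

/* Fast path: duplicated into every caller by the compiler. */
static inline __attribute__((always_inline)) void *alloc_fast(size_t size)
{
	void *obj = one_slot_cache;

	if (obj) {			/* hot path: cached object available */
		one_slot_cache = NULL;
		return obj;
	}
	return alloc_slow(size);	/* cold path: a single out-of-line call */
}

int main(void)
{
	void *p = alloc_fast(64);

	free(p);
	return 0;
}

Building this with gcc -O2 and comparing symbol sizes (e.g. with nm or
bloat-o-meter on the before/after objects, as in the run quoted above)
shows the same effect in miniature: force-inlining the slow path would
add its body to every caller instead of a single shared copy.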