On 8/2/22 10:59, Hyeonggon Yoo wrote:
> On Mon, Aug 01, 2022 at 04:44:22PM +0200, Vlastimil Babka wrote:
>>
>
> Yeah, uninlining __kmalloc_large_node saves hundreds of bytes.
> And the diff below looks good to me.
>
> By the way, do you have opinions on inlining slab_alloc_node()?
> (Looks like a similar topic?)
>
> AFAIK slab_alloc_node() is inlined in:
> 	kmem_cache_alloc()
> 	kmem_cache_alloc_node()
> 	kmem_cache_alloc_lru()
> 	kmem_cache_alloc_trace()
> 	kmem_cache_alloc_node_trace()
> 	__kmem_cache_alloc_node()
>
> This is what I get after simply dropping __always_inline in slab_alloc_node:
>
> add/remove: 1/1 grow/shrink: 3/6 up/down: 1911/-5275 (-3364)
> Function                        old     new   delta
> slab_alloc_node                   -    1356   +1356
> sysfs_slab_alias                134     327    +193
> slab_memory_callback            528     717    +189
> __kmem_cache_create            1325    1498    +173
> __slab_alloc.constprop          135       -    -135
> kmem_cache_alloc_trace          909     196    -713
> kmem_cache_alloc                937     191    -746
> kmem_cache_alloc_node_trace    1020     200    -820
> __kmem_cache_alloc_node         862      19    -843
> kmem_cache_alloc_node          1046     189    -857
> kmem_cache_alloc_lru           1348     187   -1161
> Total: Before=32011183, After=32007819, chg -0.01%
>
> So ~3.28kB is the cost of eliminating function call overhead in the
> fastpath.
>
> This is a tradeoff between function call overhead and
> instruction cache usage...

We can investigate this afterwards, with proper measurements etc.
I think it's more sensitive than kmalloc_large_node.
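
For illustration, here is a minimal userspace sketch of that trade-off (not
kernel code: toy_alloc() and its wrappers are made-up stand-ins for
slab_alloc_node() and the kmem_cache_alloc*() entry points). Building it once
with -DFORCE_INLINE and once without, then comparing `size` output, shows the
same effect as the bloat-o-meter numbers above: forcing the shared fast path
inline removes a call from each wrapper, but duplicates its body in every
caller and so grows .text.

/* inline_tradeoff.c - toy illustration only, not mm/slub.c.
 *
 * Build and compare:
 *   gcc -O2 inline_tradeoff.c -o out_of_line && size out_of_line
 *   gcc -O2 -DFORCE_INLINE inline_tradeoff.c -o inlined && size inlined
 */
#include <stdio.h>
#include <stdlib.h>

#ifdef FORCE_INLINE
#define FASTPATH static inline __attribute__((always_inline))
#else
#define FASTPATH static __attribute__((noinline))
#endif

/* Stand-in for slab_alloc_node(): a shared fast path with a non-trivial body. */
FASTPATH void *toy_alloc(size_t size, int node)
{
	unsigned char *p = malloc(size);
	size_t i;

	if (p) {
		/* fake per-node initialization, just to give the body some weight */
		for (i = 0; i < size; i++)
			p[i] = (unsigned char)(i ^ node);
	}
	return p;
}

/* Thin wrappers, analogous to kmem_cache_alloc(), kmem_cache_alloc_node(),
 * kmem_cache_alloc_lru(), ...: each either calls or inlines the fast path.
 */
void *toy_alloc_a(size_t n) { return toy_alloc(n, 0); }
void *toy_alloc_b(size_t n) { return toy_alloc(n, 1); }
void *toy_alloc_c(size_t n) { return toy_alloc(n, 2); }

int main(void)
{
	void *p = toy_alloc_a(64);

	printf("got %p\n", p);
	free(p);
	free(toy_alloc_b(64));
	free(toy_alloc_c(64));
	return 0;
}

The exact numbers will differ by compiler and arch, but the direction matches
the table above: the out-of-line build has smaller total text at the price of
one extra call in each wrapper's fast path.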