On 8/2/22 10:59, Hyeonggon Yoo wrote:
> On Mon, Aug 01, 2022 at 04:44:22PM +0200, Vlastimil Babka wrote:
>>
>
> Yeah, uninlining __kmalloc_large_node saves hundreds of bytes.
> And the diff below looks good to me.
>
> By the way, do you have opinions on inlining slab_alloc_node()?
> (Looks like a similar topic?)
>
> AFAIK slab_alloc_node() is inlined in:
> 	kmem_cache_alloc()
> 	kmem_cache_alloc_node()
> 	kmem_cache_alloc_lru()
> 	kmem_cache_alloc_trace()
> 	kmem_cache_alloc_node_trace()
> 	__kmem_cache_alloc_node()
>
> This is what I get after simply dropping __always_inline in slab_alloc_node:
>
> add/remove: 1/1 grow/shrink: 3/6 up/down: 1911/-5275 (-3364)
> Function                        old     new   delta
> slab_alloc_node                   -    1356   +1356
> sysfs_slab_alias                134     327    +193
> slab_memory_callback            528     717    +189
> __kmem_cache_create            1325    1498    +173
> __slab_alloc.constprop          135       -    -135
> kmem_cache_alloc_trace          909     196    -713
> kmem_cache_alloc                937     191    -746
> kmem_cache_alloc_node_trace    1020     200    -820
> __kmem_cache_alloc_node         862      19    -843
> kmem_cache_alloc_node          1046     189    -857
> kmem_cache_alloc_lru           1348     187   -1161
> Total: Before=32011183, After=32007819, chg -0.01%
>
> So ~3.28kB is the cost of eliminating function call overhead in the
> fastpath.
>
> This is a tradeoff between function call overhead and
> instruction cache usage...

We can investigate this afterwards, with proper measurements etc.
I think it's more sensitive than kmalloc_large_node.
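
For illustration, here is a minimal userspace sketch of that trade-off (not
kernel code: toy_alloc() and its wrappers are made-up stand-ins for
slab_alloc_node() and the kmem_cache_alloc*() entry points). Building it once
with -DFORCE_INLINE and once without, then comparing `size` output, shows the
same effect as the bloat-o-meter numbers above: forcing the shared fast path
inline removes a call from each wrapper, but duplicates its body in every
caller and so grows .text.

/* inline_tradeoff.c - toy illustration only, not mm/slub.c.
 *
 * Build and compare:
 *   gcc -O2 inline_tradeoff.c -o out_of_line && size out_of_line
 *   gcc -O2 -DFORCE_INLINE inline_tradeoff.c -o inlined && size inlined
 */
#include <stdio.h>
#include <stdlib.h>

#ifdef FORCE_INLINE
#define FASTPATH static inline __attribute__((always_inline))
#else
#define FASTPATH static __attribute__((noinline))
#endif

/* Stand-in for slab_alloc_node(): a shared fast path with a non-trivial body. */
FASTPATH void *toy_alloc(size_t size, int node)
{
	unsigned char *p = malloc(size);
	size_t i;

	if (p) {
		/* fake per-node initialization, just to give the body some weight */
		for (i = 0; i < size; i++)
			p[i] = (unsigned char)(i ^ node);
	}
	return p;
}

/* Thin wrappers, analogous to kmem_cache_alloc(), kmem_cache_alloc_node(),
 * kmem_cache_alloc_lru(), ...: each either calls or inlines the fast path.
 */
void *toy_alloc_a(size_t n) { return toy_alloc(n, 0); }
void *toy_alloc_b(size_t n) { return toy_alloc(n, 1); }
void *toy_alloc_c(size_t n) { return toy_alloc(n, 2); }

int main(void)
{
	void *p = toy_alloc_a(64);

	printf("got %p\n", p);
	free(p);
	free(toy_alloc_b(64));
	free(toy_alloc_c(64));
	return 0;
}

The exact numbers will differ by compiler and arch, but the direction matches
the table above: the out-of-line build has smaller total text at the price of
one extra call in each wrapper's fast path.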