Re: [PATCH 2/3] alloc_tag: uninline code gated by mem_alloc_profiling_key in slab allocator

Vlastimil Babka <vbabka@xxxxxxx> · Wed, 29 Jan 2025 10:50:43 +0100

On 1/29/25 01:03, Steven Rostedt wrote:
> On Tue, 28 Jan 2025 15:43:13 -0800
> Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> 
>> > How slow is it to always do the call instead of inlining?  
>> 
>> Let's see... The additional overhead if we always call is:
>> 
>> Little core: 2.42%
>> Middle core: 1.23%
>> Big core: 0.66%
>> 
>> Not a huge deal because the overhead of memory profiling when enabled
>> is much higher. So, maybe for simplicity I should indeed always call?
> 
> That's what I was thinking, unless the other maintainers are OK with this
> special logic.

If it's acceptable, I would prefer to always call. But at the same time make
sure the static key test is really inlined, i.e. force inline
alloc_tagging_slab_alloc_hook() (see my other reply looking at the disassembly).

Or rather, just open-code the contents of alloc_tagging_slab_alloc_hook() and
alloc_tagging_slab_free_hook() (as they look after this patch) into the
callers. It's just two lines. The extra layer is just an unnecessary
distraction.

Then it's probably inevitable that the actual hook content after the static key
test should not be inlined, even with
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT, as the result would otherwise be
inlined into too many places. But since we remove one call layer anyway thanks
to the above, even without full inlining the resulting performance could
hopefully be fine (compared to the state before your series).
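
To illustrate the shape I mean, here is a rough userspace sketch, not the
actual patch. All names in it are hypothetical stand-ins: a plain bool with
__builtin_expect() plays the role of the static key test (in the kernel this
would be a runtime-patched static branch), and the counter exists only so the
slow path has an observable effect. The enabled test is open-coded in the
caller, and only the hook body is kept out of line:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the mem_alloc_profiling_key static key.
 * Assumption: a plain bool is used here only so the sketch builds in
 * userspace; it is not how a static key is implemented.
 */
static bool mem_profiling_enabled;

/* Hypothetical counter, present only to make the slow path observable. */
static unsigned long tagged_allocs;

/*
 * Slow path: deliberately not inlined, so its body is emitted once
 * instead of being duplicated into every allocation call site.
 */
__attribute__((noinline))
static void slab_alloc_hook_slow(void *obj, size_t size)
{
	(void)obj;
	(void)size;
	tagged_allocs++;
}

/*
 * Hypothetical caller with the key test open-coded: when profiling is
 * off, the only cost in the caller is the (unlikely) branch.
 */
static inline void *fake_slab_alloc(void *obj, size_t size)
{
	if (__builtin_expect(mem_profiling_enabled, 0))
		slab_alloc_hook_slow(obj, size);
	return obj;
}
```

With the flag off, fake_slab_alloc() never reaches the out-of-line function;
with it on, each allocation makes exactly one call. That is the trade-off
discussed above: one call when enabled, no extra wrapper layer either way.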

> -- Steve