On Thu, Feb 15, 2024 at 07:21:41PM -0500, Steven Rostedt wrote:
> On Thu, 15 Feb 2024 18:51:41 -0500
> Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
>
> > Most of that is data (505024), not text (68582, or 66k).
> >
>
> And the 4K extra would have been data too. "It's not that much" isn't
> an argument for being wasteful.
>
> > The data is mostly the alloc tags themselves (one per allocation
> > callsite, and you compiled the entire kernel), so that's expected.
> >
> > Of the text, a lot of that is going to be slowpath stuff - module load
> > and unload hooks, formatting and printing the output, other assorted
> > bits.
> >
> > Then there's allocating and deallocating obj extensions vectors - not
> > slowpath but not super fast path, not every allocation.
> >
> > The fastpath instruction count overhead is pretty small
> >  - actually doing the accounting - the core of slub.c, page_alloc.c,
> >    percpu.c
> >  - setting/restoring the alloc tag: this is overhead we add to every
> >    allocation callsite, so it's the most relevant - but it's just a few
> >    instructions.
> >
> > So that's the breakdown. Definitely not zero overhead, but that fixed
> > memory overhead (and additionally, the percpu counters) is the price we
> > pay for very low runtime CPU overhead.
>
> But where are the benchmarks that are not micro-benchmarks? How much
> overhead does this cause to those? Is it in the noise, or is it noticeable?

Microbenchmarks are how we magnify the effect of a change like this to
the most we'll ever see. Barring cache effects, it'll be in the noise.

Cache effects are a concern here because we're now touching task_struct
in the allocation fast path; that is where the
"compiled-in-but-turned-off" overhead comes from, because we can't add
static keys for that code without doubling the amount of icache
footprint, and I don't think that would be a great tradeoff.

So: if your code has fastpath allocations where the hot part of task_struct isn't in cache, then this will be noticeable overhead to you, otherwise it won't be.
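
To make the "setting/restoring the alloc tag" step concrete, here is a
rough userspace model of it. This is only a sketch, not the kernel code:
struct task, current_task, model_alloc() and alloc_with_tag() are all
made-up names for illustration, and a plain counter stands in for the
percpu counters. It shows where the write to the task structure happens
on every allocation, which is the access that can miss in cache.

```c
/* Userspace model only: every name here is hypothetical, not the kernel API. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One tag per allocation callsite: the fixed "data" cost discussed above. */
struct alloc_tag {
	const char *file;
	int line;
	size_t bytes;	/* stand-in for percpu counters */
	size_t calls;
};

/* Stand-in for task_struct: touched on every tagged allocation. */
struct task {
	struct alloc_tag *alloc_tag;
};

static struct task current_task;

/* The accounting step: charge the allocation to whatever tag is current. */
static void *model_alloc(size_t size)
{
	void *p = malloc(size);
	struct alloc_tag *tag = current_task.alloc_tag;

	if (p && tag) {
		tag->bytes += size;
		tag->calls++;
	}
	return p;
}

/* Save the old tag, set this callsite's tag, allocate, restore the old tag. */
static void *alloc_with_tag(struct alloc_tag *tag, size_t size)
{
	struct alloc_tag *old = current_task.alloc_tag;
	void *p;

	assert(tag != NULL);
	current_task.alloc_tag = tag;	/* write to the task structure */
	p = model_alloc(size);
	current_task.alloc_tag = old;	/* restore, so nesting works */
	return p;
}
```

The save/set/restore in alloc_with_tag() is only a few instructions, but
both stores go to the task structure, so the cost depends on whether that
cache line is already hot when the allocation happens.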