I made an effort to make the tracepoint code call functions instead of
having everything inlined. It actually brought down the size of the
kernel's text, but looking at the change logs, I never posted benchmarks.
Still, I'm fairly sure that making the scheduler's text section smaller
helped.

> > That would be in line with my understanding above. Does the arm64 compiler
> > not do it as well as x86 (could be maybe found out by disassembling) or the
> > Pixel6 cpu somehow caches these out of line blocks more aggressively and only
> > a function call stops it?
>
> I'll disassemble the code and will see what it looks like.

I think I asked you to do that too ;-)

> > >
> > > Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> >
> > Kinda sad that despite the static key we have to control a lot by the
> > CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT in addition.
>
> I agree. If there is a better way to fix this regression I'm open to
> changes. Let's wait for Steven to confirm my understanding before
> proceeding.

How slow is it to always do the call instead of inlining?

-- Steve