On Wed, May 4, 2022 at 2:01 PM Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> I could try to do a perf probe or whatever fancy new thing we do now on
> clear_user to get some numbers of how many times it gets called during
> the benchmark run. Or do you wanna know the callers too?

One of the non-performance reasons I like inlined memcpy is actually
that when you do a regular 'perf record' run, the cost of the memcpy
gets associated with the call-site.

Which is universally what I want for those things. I used to love our
inlined spinlocks for the same reason back when we did them.

Yeah, yeah, you can do it with callchain magic, but then you get it
all - and I really consider memcpy/memset to be a special case.
Normally I want the "oh, that leaf function is expensive", but not for
memcpy and memset (and not for spinlocks, but we'll never go back to
the old trivial spinlocks).

I don't tend to particularly care about "how many times has this been
called" kind of trace profiles. It's the actual expense in CPU cycles
I tend to care about.

That said, I cared deeply about those kinds of CPU profiles when I was
working with Al on the RCU path lookup code and looking for where the
problem spots were.

That was years ago. I haven't really done serious profiling work for a
while (which is just as well, because it's one of the things that went
backwards when I switched to the Zen 2 threadripper for my main
machine).

              Linus