Re: [patch 02/14] tmpfs: fix regressions from wider use of ZERO_PAGE

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Wed, 4 May 2022 14:09:52 -0700

On Wed, May 4, 2022 at 2:01 PM Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> I could try to do a perf probe or whatever fancy new thing we do now on
> clear_user to get some numbers of how many times it gets called during
> the benchmark run. Or do you wanna know the callers too?

One of the non-performance reasons I like inlined memcpy is actually
that when you do a regular 'perf record' run, the cost of the memcpy
gets associated with the call-site.

Which is universally what I want for those things. I used to love our
inlined spinlocks for the same reason back when we did them.

Yeah, yeah, you can do it with callchain magic, but then you get it
all - and I really consider memcpy/memset to be a special case.
Normally I want the "oh, that leaf function is expensive", but not for
memcpy and memset (and not for spinlocks, but we'll never go back to
the old trivial spinlocks).

I don't tend to particularly care about "how many times has this been
called" kind of trace profiles. It's the actual expense in CPU cycles
I tend to care about.

That said, I cared deeply about those kinds of CPU profiles when I was
working with Al on the RCU path lookup code and looking for where the
problem spots were.

That was years ago.

I haven't really done serious profiling work for a while (which is
just as well, because it's one of the things that went backwards when
I switched to the Zen 2 Threadripper for my main machine)

                  Linus