clear_user (was: [patch 02/14] tmpfs: fix regressions from wider use of ZERO_PAGE)

Borislav Petkov <bp@xxxxxxxxx> · Tue, 10 May 2022 11:31:28 +0200

Lemme fix that subject so that I can find it easier in my avalanche mbox...

On Wed, May 04, 2022 at 02:09:52PM -0700, Linus Torvalds wrote:
> I don't tend to particularly care about "how many times has this been
> called" kind of trace profiles. It's the actual expense in CPU cycles
> I tend to care about.

Yeah, but, I wanted to measure how much perf improvement that would
bring with the git test suite and wanted to know how often clear_user()
is called in conjunction with it. Because the benchmarks I ran would
show very small improvements and a PF benchmark would even show weird
things like slowdowns with higher core counts.

So for a test suite run of ~6 minutes, the function gets called just
under 700K times, all from padzero():

           <...>-2536    [006] .....   261.208801: padzero: to: 0x55b0663ed214, size: 3564, cycles: 21900
           <...>-2536    [006] .....   261.208819: padzero: to: 0x7f061adca078, size: 3976, cycles: 17160
           <...>-2537    [008] .....   261.211027: padzero: to: 0x5572d019e240, size: 3520, cycles: 23850
           <...>-2537    [008] .....   261.211049: padzero: to: 0x7f1288dc9078, size: 3976, cycles: 15900
	   ...

which is around 1% of the total time and is consistent with the
benchmark numbers.

So Mel gave me the idea to simply measure how fast the function becomes. I.e.:

                start = rdtsc_ordered();
                ret = __clear_user(to, n);
                end = rdtsc_ordered();

Computing the arithmetic mean of all the samples collected during the
test suite run (in cycles per call) then shows some improvement:

clear_user_original:
Amean: 9219.71 (Sum: 6340154910, samples: 687674)

fsrm:
Amean: 8030.63 (Sum: 5522277720, samples: 687652)

That's on Zen3.
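
For reference, the Amean lines above are simply Sum / samples. A minimal
user-space sketch of that arithmetic, in case someone wants to redo it
with their own numbers (the helper names amean() and reduction_pct() are
made up for illustration and are not part of any patch):

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic mean of the per-call cycle counts: sum / samples. */
static inline double amean(uint64_t sum, uint64_t samples)
{
	assert(samples > 0);	/* a mean over zero samples is undefined */
	return (double)sum / (double)samples;
}

/* Relative reduction of the mean, in percent, going from before to after. */
static inline double reduction_pct(double before, double after)
{
	assert(before > 0.0);
	return 100.0 * (before - after) / before;
}
```

Feeding it the two Sum/samples pairs from above, e.g.
reduction_pct(amean(6340154910ULL, 687674), amean(5522277720ULL, 687652)),
gives about 12.9% fewer cycles per call for the fsrm variant.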

I'll run this on Icelake now too.

> I haven't really done serious profiling work for a while (which is
> just as well, because it's one of the things that went backwards when
> I switch to the Zen 2 threadripper for my main machine)

Because the perf support there is not as advanced? Any pain points I can
forward?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette