Lemme fix that subject so that I can find it easier in my avalanche mbox...

On Wed, May 04, 2022 at 02:09:52PM -0700, Linus Torvalds wrote:
> I don't tend to particularly care about "how many times has this been
> called" kind of trace profiles. It's the actual expense in CPU cycles
> I tend to care about.

Yeah, but, I wanted to measure how much perf improvement that would
bring with the git test suite and wanted to know how often clear_user()
is called in conjunction with it. Because the benchmarks I ran would
show very small improvements and a PF benchmark would even show weird
things like slowdowns with higher core counts.

So for a ~6m running test suite, the function gets called under 700K
times, all from padzero:

  <...>-2536    [006] .....   261.208801: padzero: to: 0x55b0663ed214, size: 3564, cycles: 21900
  <...>-2536    [006] .....   261.208819: padzero: to: 0x7f061adca078, size: 3976, cycles: 17160
  <...>-2537    [008] .....   261.211027: padzero: to: 0x5572d019e240, size: 3520, cycles: 23850
  <...>-2537    [008] .....   261.211049: padzero: to: 0x7f1288dc9078, size: 3976, cycles: 15900
  ...

which is around 1%-ish of the total time and which is consistent with
the benchmark numbers.

So Mel gave me the idea to simply measure how fast the function becomes.
I.e.:

  start = rdtsc_ordered();
  ret = __clear_user(to, n);
  end = rdtsc_ordered();

Computing the mean average of all the samples collected during the test
suite run then shows some improvement:

  clear_user_original:
  Amean: 9219.71 (Sum: 6340154910, samples: 687674)

  fsrm:
  Amean: 8030.63 (Sum: 5522277720, samples: 687652)

That's on Zen3.

I'll run this on Icelake now too.

> I haven't really done serious profiling work for a while (which is
> just as well, because it's one of the things that went backwards when
> I switch to the Zen 2 threadripper for my main machine)

Because of the not as advanced perf support there? Any pain points I can
forward?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
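
For reference, a minimal sketch of how the Amean figures could be derived from the trace output. It assumes each sample is a line in the "padzero: to: ..., size: ..., cycles: ..." format shown in the trace excerpt; the helper names parse_cycles() and amean() are made up for illustration and this is not the script that was actually used.

```python
import re

# Matches the trace lines quoted in the mail, e.g.
#   padzero: to: 0x55b0663ed214, size: 3564, cycles: 21900
# The field layout is an assumption taken from those four sample lines.
TRACE_RE = re.compile(
    r"padzero: to: 0x[0-9a-fA-F]+, size: (\d+), cycles: (\d+)"
)

SAMPLE_TRACE = """\
<...>-2536 [006] ..... 261.208801: padzero: to: 0x55b0663ed214, size: 3564, cycles: 21900
<...>-2536 [006] ..... 261.208819: padzero: to: 0x7f061adca078, size: 3976, cycles: 17160
<...>-2537 [008] ..... 261.211027: padzero: to: 0x5572d019e240, size: 3520, cycles: 23850
<...>-2537 [008] ..... 261.211049: padzero: to: 0x7f1288dc9078, size: 3976, cycles: 15900
"""


def parse_cycles(lines):
    """Return the cycle count of every padzero sample found in lines."""
    cycles = []
    for line in lines:
        m = TRACE_RE.search(line)
        if m:  # silently skip lines that are not padzero samples
            cycles.append(int(m.group(2)))
    return cycles


def amean(samples):
    """Return (arithmetic mean, sum, number of samples)."""
    if not samples:
        raise ValueError("no samples collected")
    total = sum(samples)
    return total / len(samples), total, len(samples)


if __name__ == "__main__":
    mean, total, count = amean(parse_cycles(SAMPLE_TRACE.splitlines()))
    print(f"Amean: {mean:.2f} (Sum: {total}, samples: {count})")
```

On the numbers quoted above, 8030.63 against 9219.71 works out to roughly a 12.9% drop in mean cycles per call.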