Re: Schedule for Tuesday's FESCo Meeting (2023-01-03)

Kevin Kofler via devel <devel@xxxxxxxxxxxxxxxxxxxxxxx> · Sat, 07 Jan 2023 07:24:30 +0100

Richard W.M. Jones wrote:
> The problem is you're confusing general gains and gains in
> specific scenarios.

But the thing is that a gain in some specific scenario is a lot less useful
than a general gain. And the latter usually comes not from profiling, but
from improvements in toolchain optimizations. -fomit-frame-pointer was one
such improvement, and you have now successfully destroyed it for all Fedora
users.

> Perf + flamegraphs are such a useful tool that we managed to double
> performance (ie. ~ 100% gain) in one particular network server case
> that we investigated a few years ago.  This was by spotting that the
> kernel was writing to an MSR (hardware register) which was really
> slow, and as it wasn't necessary we just got rid of it.
> 
> For that one use case - an incredible performance gain!  Does this
> mean everyone sees their machines double in speed?  Of course not.

And that is why that improvement is much less impressive than it sounds at
first. Chances are it helps only a handful of users, in a handful of
situations, and even for those users, the overall improvement is not going
to be 100%, because they will also be running software other than the one
you profiled and optimized.
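
To put rough numbers on that last point, here is a minimal sketch using
Amdahl's law. The function name and the time shares are hypothetical,
chosen only for illustration; the 2x factor is the "double performance"
figure quoted above.

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: speedup of the whole workload when only `fraction`
    of its runtime is made `local_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Hypothetical shares of a user's machine time spent in the optimized server.
for share in (0.05, 0.25, 1.0):
    gain_percent = (overall_speedup(share, 2.0) - 1.0) * 100.0
    print(f"share of time {share:.2f} -> overall gain {gain_percent:.1f}%")
```

Under these assumed shares, the overall gain is roughly 2.6%, 14.3% and
100%: the full doubling only shows up for a machine that does nothing but
run the optimized code path.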

> Will we be able to say that "Fedora got N% faster" in two years?
> Not at all - it depends entirely what you use Fedora for.

This makes the claims made by the change proponents entirely unrealistic
and impossible to ever verify. We are hitting the end users with an overall
performance penalty in exchange for potential performance improvements that
are impossible not only to predict, but even to quantify after the fact.
In other words, the claim that the latter will more than compensate for the
former is completely unsubstantiated.

> The overhead is also a real thing.  There's a few percent overhead
> everywhere for enabling frame pointers because every stack frame entry
> and exit involves a couple of extra instructions.

Exactly.

> Anyway I'd really urge you to play with these tools before judging
> this proposal: https://www.brendangregg.com/flamegraphs.html

KCachegrind, using Valgrind with the Callgrind or Cachegrind tool, gives me
more information than that even without frame pointers, and it is actually
reliable because it dynamically instruments the code and traces every single
instruction instead of just taking random samples and hoping it did not miss
anything important. It is also much more reproducible because it uses a
mathematical model for the CPU cycles instead of a wallclock time sample
that depends not only on your particular CPU, but also on things such as
background tasks, thermal throttling, etc. Yes, it is slower (by up to a
factor of ~50), but only for the developer doing the profiling, and as
explained above, the reported cycle counts do not depend on the wallclock
time anyway.
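
The difference between counting everything and taking samples can be
illustrated with a toy model. This is only a sketch under loud assumptions:
the workload table, the function names and the cost units are all made up,
and the code does not reproduce how Callgrind or perf work internally. It
only contrasts a deterministic count with a random sample of the same
timeline.

```python
import random
from collections import Counter

# Hypothetical program: (function name, cost units per call, number of calls).
WORKLOAD = [
    ("parse", 5, 1000),
    ("compute", 50, 1000),
    ("rare_slow_path", 400, 1),
]


def exact_profile(workload):
    """Instrumentation-style profile: every cost unit is counted, so the
    result is identical on every run."""
    return {name: cost * calls for name, cost, calls in workload}


def sampled_profile(workload, n_samples, seed):
    """Sampling-style profile: pick random points on the timeline, with
    each function weighted by its share of the total cost."""
    names = [name for name, _, _ in workload]
    weights = [cost * calls for _, cost, calls in workload]
    rng = random.Random(seed)  # the seed stands in for run-to-run variation
    return Counter(rng.choices(names, weights=weights, k=n_samples))


print("exact:", exact_profile(WORKLOAD))
for seed in range(3):
    print(f"sampled (seed {seed}):", dict(sampled_profile(WORKLOAD, 100, seed)))
```

In this model rare_slow_path accounts for 400 of 55400 cost units, so a run
with 100 samples has a substantial chance of never landing on it, while the
exact count always reports it.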

        Kevin Kofler