Richard W.M. Jones wrote:
> The problem is you're confusing general gains and gains in
> specific scenarios.

But the thing is that a gain in some specific scenario is a lot less
useful than a general gain. And the latter is usually not had through
profiling, but through improvements in toolchain optimizations.
-fomit-frame-pointer was one such improvement that you have now
successfully destroyed for all Fedora users.

> Perf + flamegraphs are such a useful tool that we managed to double
> performance (ie. ~ 100% gain) in one particular network server case
> that we investigated a few years ago. This was by spotting that the
> kernel was writing to an MSR (hardware register) which was really
> slow, and as it wasn't necessary we just got rid of it.
>
> For that one use case - an incredible performance gain! Does this
> mean everyone sees their machines double in speed? Of course not.

And that is why that improvement is much less impressive than it sounds
at first. Chances are it helps only a handful of users, in a handful of
situations, and even for those users, the overall improvement is not
going to be 100% because they will also be using other software than
the one you profiled and optimized.

> Will we be able to say that "Fedora got N% faster" in two years?
> Not at all - it depends entirely what you use Fedora for.

Hence this makes the claims made by the change proponents entirely
unrealistic and impossible to ever verify. We are hitting the end users
with an overall performance penalty in exchange for potential
performance improvements that are impossible not only to predict, but
even to quantify after the fact, i.e., the claim that the latter will
more than compensate for the former is completely unsubstantiated.

> The overhead is also a real thing. There's a few percent overhead
> everywhere for enabling frame pointers because every stack frame entry
> and exit involves a couple of extra instructions.

Exactly.
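
To put rough numbers on the trade-off above, here is a minimal sketch
using Amdahl's law. Every input is an illustrative assumption, not a
measurement: the function name net_speedup, the 2% figure standing in
for "a few percent" of frame pointer overhead, and the shares of a
user's runtime spent in the one optimized program are all made up for
the example.

```python
def net_speedup(fraction_optimized, local_speedup, frame_pointer_overhead):
    """Overall speedup relative to a baseline built without frame pointers.

    fraction_optimized: share of the user's total runtime spent in the
        software that was profiled and optimized (0.0 to 1.0).
    local_speedup: speedup of that software alone (2.0 = twice as fast).
    frame_pointer_overhead: relative slowdown paid by all code
        (0.02 = 2% slower); an assumed value, not a measured one.
    """
    # Amdahl's law: only the optimized share of the runtime shrinks.
    new_time = (1.0 - fraction_optimized) + fraction_optimized / local_speedup
    # The frame pointer overhead applies to everything that still runs.
    new_time *= 1.0 + frame_pointer_overhead
    return 1.0 / new_time


if __name__ == "__main__":
    # Hypothetical users spending 0%, 10%, 50% and 100% of their time
    # in the one program that got the 2x improvement.
    for share in (0.0, 0.1, 0.5, 1.0):
        print(f"share={share:.0%}  net speedup={net_speedup(share, 2.0, 0.02):.3f}")
```

Under these assumptions, a user who never runs the optimized program
ends up with a net speedup below 1.0, i.e. a slowdown, and only a user
who runs nothing else gets close to the advertised factor of two.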
> Anyway I'd really urge you to play with these tools before judging
> this proposal: https://www.brendangregg.com/flamegraphs.html

KCachegrind, using Valgrind with the Callgrind or Cachegrind tool, gives
me more information than that even without frame pointers, and it is
actually reliable because it dynamically instruments the code and
traces every single instruction instead of just taking random samples
and hoping it did not miss anything important. It is also much more
reproducible because it uses a mathematical model for the CPU cycles
instead of a wallclock time sample that depends not only on your
particular CPU, but also on things such as background tasks, thermal
throttling, etc. Yes, it is slower (up to a factor of ~50), but only
for the developer doing the profiling, and as explained above, the
reported cycle counts do not depend on the wallclock time anyway.

        Kevin Kofler