Hi,

On June 16, 2022 8:53:59 PM UTC, Ben Cotton <bcotton@xxxxxxxxxx> wrote:
>https://fedoraproject.org/wiki/Changes/fno-omit-frame-pointer
>
>This document represents a proposed Change. As part of the Changes
>process, proposals are publicly announced in order to receive
>community feedback. This proposal will only be implemented if approved
>by the Fedora Engineering Steering Committee.
>
>== Summary ==
>
>Fedora will add -fno-omit-frame-pointer to the default C/C++
>compilation flags, which will improve the effectiveness of profiling
>and debugging tools.
>
>== Owner ==
>* Name: [[User:daandemeyer| Daan De Meyer]], [[User:Dcavalca| Davide
>Cavalca]], [[ Andrii Nakryiko]]
>* Email: daandemeyer@xxxxxx, dcavalca@xxxxxx, andriin@xxxxxx
>
>
>== Detailed Description ==
>
>Credits to Mirek Klimos, whose internal note on stacktrace unwinding
>formed the basis for this change proposal (myreggg@xxxxxxxxx).
>
>Any performance or efficiency work relies on accurate profiling data.
>Sampling profilers probe the target program's call stack at regular
>intervals and store the stack traces. If we collect enough of them, we
>can closely approximate the real cost of a library or function with
>minimal runtime overhead.
>
>Stack traces capture what’s running on a thread. A trace should start
>with clone - if the thread was created via the clone syscall - or with
>_start - if it’s the main thread of the process. The last function in
>the stack trace is the code that the CPU is currently executing. If a
>stack starts with [unknown] or any other symbol, it means it's not
>complete.
>
>=== Unwinding ===
>
>How does the profiler get the list of function names? There are two parts to it:
>
># Unwinding the stack - getting a list of virtual addresses pointing
>to the executable code
># Symbolization - translating virtual addresses into human-readable
>information, like function name, inlined functions at the address, or
>file name and line number.
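The two steps listed above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not how perf or the kernel implement unwinding: it assumes frame pointers are present and an x86_64-style layout in which each frame stores the caller's saved frame pointer at [fp] and the return address at [fp + 8]. The memory contents, addresses, symbol table and function names (MEMORY, SYMBOLS, unwind, symbolize) are all made up for the example.

```python
import bisect

# Hypothetical simulated stack memory: address -> 8-byte word.
# Assumed layout per frame: [fp] = caller's saved frame pointer,
# [fp + 8] = return address into the caller.
MEMORY = {
    0x7000: 0x7020, 0x7008: 0x401150,  # frame of leaf(), called from middle()
    0x7020: 0x7040, 0x7028: 0x401250,  # frame of middle(), called from main()
    0x7040: 0x0,    0x7048: 0x401010,  # frame of main(), called from _start
}

# Hypothetical symbol table: (start address, name), sorted by address.
SYMBOLS = [
    (0x401000, "_start"),
    (0x401100, "middle"),
    (0x401200, "main"),
    (0x401300, "leaf"),
]


def unwind(instruction_pointer, frame_pointer, memory, max_depth=64):
    """Step 1: collect code addresses by following saved frame pointers."""
    addresses = [instruction_pointer]
    while frame_pointer != 0 and len(addresses) < max_depth:
        if frame_pointer not in memory or frame_pointer + 8 not in memory:
            break  # broken chain: the trace stays incomplete
        addresses.append(memory[frame_pointer + 8])  # return address
        frame_pointer = memory[frame_pointer]        # caller's frame
    return addresses


def symbolize(address, symbols=SYMBOLS):
    """Step 2: map an address to the nearest preceding symbol."""
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    return symbols[index][1] if index >= 0 else "[unknown]"


# The thread is currently executing inside leaf() with RBP = 0x7000.
addresses = unwind(0x401310, 0x7000, MEMORY)
names = [symbolize(address) for address in reversed(addresses)]
print(" -> ".join(names))  # prints "_start -> main -> middle -> leaf"
```

The unwinding step never looks at symbol names; it only needs the saved frame pointer chain, which is what makes it cheap enough to run on every sample.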
>
>Unwinding is what we're interested in for the purpose of this
>proposal. The important things are:
>
>* Data on the stack is split into frames, each frame belonging to one function.
>* Right before each function call, the return address is put on the
>stack. This is the instruction address in the caller to which we will
>eventually return - and that's what we care about.
>* One register, called the "frame pointer" or "base pointer" register
>(RBP), is traditionally used to point to the beginning of the current
>frame. Every function should back up RBP onto the stack and set it
>properly at the very beginning.
>
>The “frame pointer” part is achieved by adding push %rbp, mov
>%rsp,%rbp to the beginning of every function and by adding pop %rbp
>before returning. Using this knowledge, stack unwinding boils down to
>traversing a linked list:
>
>https://i.imgur.com/P6pFdPD.png

As you specifically use x86_64 assembly as an example here: have you looked at the impact this will have on other architectures like arm or riscv?

Cheers,
Dan

>
>=== Where’s the catch? ===
>
>The frame pointer register is not necessary to run a compiled binary.
>It makes it easy to unwind the stack, and some debugging tools rely on
>frame pointers, but the compiler knows how much data it put on the
>stack, so it can generate code that doesn't need RBP. Not using
>the frame pointer register can make a program more efficient:
>
>* We don’t need to back up the value of the register onto the stack,
>which saves 3 instructions per function.
>* We can treat RBP as a general-purpose register and use it for
>something else.
>
>Whether the compiler sets up the frame pointer or not is controlled by
>the -fomit-frame-pointer flag and the default is "omit", meaning we
>can’t use this method of stack unwinding by default.
>
>To make it possible to rely on the frame pointer being available,
>we'll add -fno-omit-frame-pointer to the default C/C++ compilation
>flags. This will instruct the compiler to make sure the frame pointer
>is always available. This will in turn allow profiling tools to
>provide accurate performance data which can drive performance
>improvements in core libraries and executables.
>
>== Feedback ==
>
>=== Potential performance impact ===
>
>* Meta builds all its libraries and executables with
>-fno-omit-frame-pointer by default. Internal benchmarks did not show
>a significant impact on performance when omitting the frame pointer for
>two of our most performance-intensive applications.
>* Firefox recently landed a change to preserve the frame pointer in
>all jitted code
>(https://bugzilla.mozilla.org/show_bug.cgi?id=1426134). No significant
>decrease in performance was observed.
>* Kernel 4.8 frame pointer benchmarks by SUSE showed 5%-10%
>regressions in some benchmarks
>(https://lore.kernel.org/all/20170602104048.jkkzssljsompjdwy@xxxxxxx/T/#u)
>
>Should individual libraries or executables notice a significant
>performance degradation caused by including the frame pointer
>everywhere, these packages can opt out on an individual basis as
>described in https://docs.fedoraproject.org/en-US/packaging-guidelines/#_compiler_flags.
>
>=== Alternatives to frame pointers ===
>
>There are a few alternative ways to unwind stacks instead of using the
>frame pointer:
>
>* [https://dwarfstd.org DWARF] data - The compiler can emit extra
>information that allows us to find the beginning of the frame without
>the frame pointer, which means we can walk the stack exactly as
>before. The problem is that we need to unwind the stack in kernel
>space, which isn't implemented in the kernel. Given that the kernel
>implemented its own format (ORC) instead of using DWARF, it's
>unlikely that we'll see a DWARF unwinder in the kernel any time soon.
>The perf tool allows you to use the DWARF data with
>--call-graph=dwarf, but this means that it copies the full stack on
>every event and unwinds in user space. This has very high overhead.
>* [https://www.kernel.org/doc/html/v5.3/x86/orc-unwinder.html ORC]
>(undwarf) - problems with unwinding in the kernel led to the creation
>of another format with the same purpose as DWARF, just much simpler.
>This can only be used to unwind kernel stack traces; it doesn't help us
>with userspace stacks. More information on ORC can be found
>[https://lwn.net/Articles/728339 here].
>* [https://lwn.net/Articles/680985 LBR] - New Intel CPUs have a
>feature that gives you source and target addresses for the last 16 (or
>32, in newer CPUs) branches with no overhead. It can be configured to
>record only function calls and to be used as a stack, which means it
>can be used to get the stack trace. Sadly, you only get the last X
>calls, and not the full stack trace, so the data can be very
>incomplete. On top of that, many Fedora users might still be using
>CPUs without LBR support, which means we wouldn't be able to assume
>working profilers on a Fedora system by default.
>
>To summarize, if we want complete stacks with reasonably low overhead
>(which we do, there's no other way to get accurate profiling data from
>running services), frame pointers are currently the best option.
>
>== Benefit to Fedora ==
>
>Implementing this change will provide profiling tools with easy access
>to stacktraces of installed libraries and executables, which will lead
>to more accurate profiling data in general. This in turn can be used
>to implement optimizations to core libraries and executables which
>will improve the overall performance of Fedora itself and the wider
>Linux ecosystem.
>
>Various debugging tools can also make use of the frame pointer to
>access the current stacktrace, although tools like gdb can already do
>this to some degree via embedded DWARF debugging info.
>
>== Scope ==
>* Proposal owners: Put up a PR to change the rpm macros to build
>packages with -fno-omit-frame-pointer by default.
>
>* Other developers: Review and merge the PR implementing the Change.
>
>* Release engineering: [https://pagure.io/releng/issues #Releng issue
>number]. A mass rebuild is required.
>
>* Policies and guidelines: N/A (not needed for this Change)
>
>* Trademark approval: N/A (not needed for this Change)
>
>* Alignment with Objectives: N/A
>
>== Upgrade/compatibility impact ==
>
>This should not impact upgrades in any way.
>
>== How To Test ==
>
># Build the package with the updated rpm macros
># Profile the binary with `perf record -g <binary>`
># Inspect the perf data with `perf report -g 'graph,0.5,caller'`
># When expanding hot functions in the perf report, perf should show
>the full call graph of the hot function (at least for all functions
>that are part of the binary compiled with -fno-omit-frame-pointer)
>
>== User Experience ==
>
>Fedora users will be more likely to have a streamlined experience when
>trying to debug/profile system executables/libraries. Tools such as
>perf will work out of the box instead of requiring users to provide
>extra options (e.g. --call-graph=dwarf/LBR) or requiring users to
>recompile all relevant packages with -fno-omit-frame-pointer.
>
>== Dependencies ==
>
>The rpm macros for Fedora need to be adjusted to include
>-fno-omit-frame-pointer in the default C/C++ compilation flags.
>
>== Contingency Plan ==
>
>* Contingency mechanism: The new version can be released without every
>package being rebuilt with -fno-omit-frame-pointer. Profiling will only
>work perfectly once all packages have been rebuilt, but there will be
>no regression in behavior if not all packages have been rebuilt by the
>time of the release. If the Change is found to introduce unacceptable
>regressions, the PR implementing it can be reverted and affected
>packages can be rebuilt.
>* Contingency deadline: Final freeze
>* Blocks release? No
>
>== Documentation ==
>
>* Original proposal for in-kernel DWARF unwinder (rejected):
>https://lkml.org/lkml/2017/5/5/571
>
>== Release Notes ==
>
>Packages are now compiled with frame pointers included by default.
>This will enable a variety of profiling and debugging tools to show
>more information out of the box.