* Ben Cotton:

> === Where's the catch? ===
>
> The frame pointer register is not necessary to run a compiled binary.
> It makes it easy to unwind the stack, and some debugging tools rely on
> frame pointers, but the compiler knows how much data it put on the
> stack, so it can generate code that doesn't need the RBP. Not using
> the frame pointer register can make a program more efficient:
>
> * We don't need to back up the value of the register onto the stack,
>   which saves 3 instructions per function.
> * We can treat the RBP as a general-purpose register and use it for
>   something else.
>
> Whether the compiler sets the frame pointer or not is controlled by
> the -fomit-frame-pointer flag, and the default is "omit", meaning we
> can't use this method of stack unwinding by default.
>
> To make it possible to rely on the frame pointer being available,
> we'll add -fno-omit-frame-pointer to the default C/C++ compilation
> flags. This will instruct the compiler to make sure the frame pointer
> is always available. This will in turn allow profiling tools to
> provide accurate performance data, which can drive performance
> improvements in core libraries and executables.

I think this paints an incomplete picture. Many programs spend a
noticeable fraction of their time in the glibc string functions
(particularly memcpy and memset, maybe also memmove, strcpy, strlen,
memcmp, and strcmp). These string functions are implemented in
hand-tuned assembler and do not set up a frame pointer. I assume this
means that a backchain-based unwinder will pick up %rbp in these
functions and use it to find the caller's frame and the address of its
caller, which is *not* the caller of the string function, but the next
caller after that (a minimal sketch of such a walk appears below).
Profiles generated this way will therefore lack the immediate callers
of the string functions, which I expect will be rather confusing.
Given how often string functions show up in profiles, I think this is
hardly acceptable.

I do not want to maintain a fork of glibc which adds frame pointers to
the string functions, because there are so many variants of them
(which makes decent test coverage hard to achieve), and the upstream
change rate in this area is pretty high. The risk of semantic (not
textual) merge conflicts is also high, because we might not notice if
an early return instruction is introduced.

I really dislike this proposal and want to record my objection.
Instead, I recommend using better profilers (or at least profilers
less political about DWARF) and CPUs with matching hardware support.

DWARF-based unwinding does not have to be extremely slow. There is a
widespread belief that it has to be that way because of some magic
DWARF properties. It's perhaps not as fast as it could be, but
repeating this claim like a mantra merely dissuades people from
looking at performance improvements. For example, we made a few simple
changes in glibc 2.35 and GCC 12 to make in-process unwinding
efficient with many shared objects and in multi-threaded processes. I
do wonder whether we could have arrived there many, many years ago if
it weren't for the "DWARF is slow" meme. (And now that that is done,
there are other straightforward implementation issues in the libgcc
in-process unwinder that could be improved.)
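To put the quoted cost in concrete terms, here is a trivial function
together with the prologue and epilogue one typically gets from GCC on
x86-64 with and without the flag (inspect with "gcc -O2 -S"). The
exact instruction sequence varies by compiler version and options, so
treat it as illustrative only:

/* frame_pointer_cost.c - illustrative only; exact code generation
   varies by GCC version, target, and options. */
int add(int a, int b)
{
    return a + b;
}

/*
 * gcc -O2 -fno-omit-frame-pointer (frame pointer kept):
 *
 *     push   %rbp               save the caller's frame pointer
 *     mov    %rsp,%rbp          establish this function's frame
 *     lea    (%rdi,%rsi,1),%eax
 *     pop    %rbp               restore the caller's frame pointer
 *     ret
 *
 * gcc -O2 -fomit-frame-pointer (the current default):
 *
 *     lea    (%rdi,%rsi,1),%eax
 *     ret
 *
 * The three extra instructions are the per-function cost the proposal
 * refers to, and %rbp is no longer free as a scratch register.
 */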
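And here is a minimal sketch of the backchain walk I described,
assuming every frame on the stack was built with
-fno-omit-frame-pointer and uses the conventional
"push %rbp; mov %rsp,%rbp" prologue. The struct layout and names are
mine, not taken from any real profiler or unwinder; the point is that
a function which skips that prologue, like the assembler string
routines, leaves %rbp pointing at its caller's frame, so that caller
silently drops out of the trace:

/* backchain_sketch.c - minimal sketch of %rbp-based stack walking on
   x86-64.  Build with e.g. "gcc -O2 -fno-omit-frame-pointer
   backchain_sketch.c".  Names and layout are illustrative only. */
#include <stdio.h>

struct frame {
    struct frame *caller_rbp;   /* saved caller %rbp, stored at 0(%rbp) */
    void *return_address;       /* return address, stored at 8(%rbp)    */
};

__attribute__((noinline)) static void backtrace_via_rbp(void)
{
    struct frame *fp = __builtin_frame_address(0);

    /* Follow the chain of saved frame pointers.  Stop at a null frame
       pointer (the ABI marker for the outermost frame) or after a
       fixed depth, because frames from code built without frame
       pointers make the chain unreliable. */
    for (int depth = 0; fp != NULL && depth < 64; depth++) {
        printf("#%d  return address %p\n", depth, fp->return_address);
        /* A function that never ran "push %rbp; mov %rsp,%rbp" - as
           the hand-written glibc string routines do not - is simply
           absent here: %rbp still points at its caller's frame, so
           the walk resumes one level too high. */
        fp = fp->caller_rbp;
    }
}

__attribute__((noinline)) static int leaf(void)
{
    backtrace_via_rbp();
    return 1;                /* not a tail call, so this frame stays live */
}

__attribute__((noinline)) static int middle(void)
{
    return leaf() + 1;       /* the +1 keeps this from being a tail call */
}

int main(void)
{
    int unused = middle();   /* two real frames above backtrace_via_rbp() */
    (void)unused;
    return 0;
}

A sampling profiler does essentially this walk from the interrupted
context, so the missing frame affects every sample taken inside memcpy
and friends.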
Thanks,
Florian