* Ben Cotton:

> === Where's the catch? ===
>
> The frame pointer register is not necessary to run a compiled binary.
> It makes it easy to unwind the stack, and some debugging tools rely on
> frame pointers, but the compiler knows how much data it put on the
> stack, so it can generate code that doesn't need the RBP. Not using
> the frame pointer register can make a program more efficient:
>
> * We don't need to back up the value of the register onto the stack,
>   which saves 3 instructions per function.
> * We can treat the RBP as a general-purpose register and use it for
>   something else.
>
> Whether the compiler sets the frame pointer or not is controlled by
> the -fomit-frame-pointer flag, and the default is "omit", meaning we
> can't use this method of stack unwinding by default.
>
> To make it possible to rely on the frame pointer being available,
> we'll add -fno-omit-frame-pointer to the default C/C++ compilation
> flags. This will instruct the compiler to make sure the frame pointer
> is always available. This will in turn allow profiling tools to
> provide accurate performance data, which can drive performance
> improvements in core libraries and executables.

I think this paints an incomplete picture. Many programs spend a
noticeable fraction of their time in the glibc string functions
(particularly memcpy and memset, maybe also memmove, strcpy, strlen,
memcmp, and strcmp). These string functions are implemented in
hand-tuned assembler and do not set up a frame pointer. I assume this
means that a backchain-based unwinder will pick up %rbp in these
functions and use it to find the caller's frame and the address of its
caller, which is *not* the caller of the string function, but the next
caller after that (a minimal sketch of such a walk appears below).
Profiles generated this way will therefore lack the immediate callers
of the string functions, which I expect will be rather confusing.
Given how often string functions show up in profiles, I think this is
hardly acceptable.

I do not want to maintain a fork of glibc which adds frame pointers to
the string functions, because there are so many variants of them
(which makes decent test coverage hard to achieve), and the upstream
change rate in this area is pretty high. The risk of semantic (not
textual) merge conflicts is also high, because we might not notice if
an early return instruction is introduced.

I really dislike this proposal and want to record my objection.
Instead, I recommend using better profilers (or at least profilers
less political about DWARF) and CPUs with matching hardware support.

DWARF-based unwinding does not have to be extremely slow. There is a
widespread belief that it has to be that way because of some magic
DWARF properties. It's perhaps not as fast as it could be, but
repeating this claim like a mantra merely dissuades people from
looking at performance improvements. For example, we made a few simple
changes in glibc 2.35 and GCC 12 to make in-process unwinding
efficient with many shared objects and in multi-threaded processes. I
do wonder whether we could have arrived there many, many years ago if
it weren't for the "DWARF is slow" meme. (And now that that is done,
there are other straightforward implementation issues in the libgcc
in-process unwinder that could be improved.)
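To put the quoted cost in concrete terms, here is a trivial function
together with the prologue and epilogue one typically gets from GCC on
x86-64 with and without the flag (inspect with "gcc -O2 -S"). The
exact instruction sequence varies by compiler version and options, so
treat it as illustrative only:

/* frame_pointer_cost.c - illustrative only; exact code generation
   varies by GCC version, target, and options. */
int add(int a, int b)
{
    return a + b;
}

/*
 * gcc -O2 -fno-omit-frame-pointer (frame pointer kept):
 *
 *     push   %rbp               save the caller's frame pointer
 *     mov    %rsp,%rbp          establish this function's frame
 *     lea    (%rdi,%rsi,1),%eax
 *     pop    %rbp               restore the caller's frame pointer
 *     ret
 *
 * gcc -O2 -fomit-frame-pointer (the current default):
 *
 *     lea    (%rdi,%rsi,1),%eax
 *     ret
 *
 * The three extra instructions are the per-function cost the proposal
 * refers to, and %rbp is no longer free as a scratch register.
 */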
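And here is a minimal sketch of the backchain walk I described,
assuming every frame on the stack was built with
-fno-omit-frame-pointer and uses the conventional
"push %rbp; mov %rsp,%rbp" prologue. The struct layout and names are
mine, not taken from any real profiler or unwinder; the point is that
a function which skips that prologue, like the assembler string
routines, leaves %rbp pointing at its caller's frame, so that caller
silently drops out of the trace:

/* backchain_sketch.c - minimal sketch of %rbp-based stack walking on
   x86-64.  Build with e.g. "gcc -O2 -fno-omit-frame-pointer
   backchain_sketch.c".  Names and layout are illustrative only. */
#include <stdio.h>

struct frame {
    struct frame *caller_rbp;   /* saved caller %rbp, stored at 0(%rbp) */
    void *return_address;       /* return address, stored at 8(%rbp)    */
};

__attribute__((noinline)) static void backtrace_via_rbp(void)
{
    struct frame *fp = __builtin_frame_address(0);

    /* Follow the chain of saved frame pointers.  Stop at a null frame
       pointer (the ABI marker for the outermost frame) or after a
       fixed depth, because frames from code built without frame
       pointers make the chain unreliable. */
    for (int depth = 0; fp != NULL && depth < 64; depth++) {
        printf("#%d  return address %p\n", depth, fp->return_address);
        /* A function that never ran "push %rbp; mov %rsp,%rbp" - as
           the hand-written glibc string routines do not - is simply
           absent here: %rbp still points at its caller's frame, so
           the walk resumes one level too high. */
        fp = fp->caller_rbp;
    }
}

__attribute__((noinline)) static int leaf(void)
{
    backtrace_via_rbp();
    return 1;                /* not a tail call, so this frame stays live */
}

__attribute__((noinline)) static int middle(void)
{
    return leaf() + 1;       /* the +1 keeps this from being a tail call */
}

int main(void)
{
    int unused = middle();   /* two real frames above backtrace_via_rbp() */
    (void)unused;
    return 0;
}

A sampling profiler does essentially this walk from the interrupted
context, so the missing frame affects every sample taken inside memcpy
and friends.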
Thanks,
Florian