On Thu, Oct 29, 2020 at 12:10:52AM +0100, Ard Biesheuvel wrote:
> On Wed, 28 Oct 2020 at 23:59, Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Wed, Oct 28, 2020 at 11:15:04PM +0100, Ard Biesheuvel wrote:
> > > On Wed, 28 Oct 2020 at 22:39, Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > >
> > > > On Wed, Oct 28, 2020 at 06:15:05PM +0100, Ard Biesheuvel wrote:
> > > > > Commit 3193c0836 ("bpf: Disable GCC -fgcse optimization for
> > > > > ___bpf_prog_run()") introduced a __no_fgcse macro that expands to a
> > > > > function scope __attribute__((optimize("-fno-gcse"))), to disable a
> > > > > GCC specific optimization that was causing trouble on x86 builds, and
> > > > > was not expected to have any positive effect in the first place.
> > > > >
> > > > > However, as the GCC manual documents, __attribute__((optimize))
> > > > > is not for production use, and results in all other optimization
> > > > > options to be forgotten for the function in question. This can
> > > > > cause all kinds of trouble, but in one particular reported case,
> > > > > it causes -fno-asynchronous-unwind-tables to be disregarded,
> > > > > resulting in .eh_frame info to be emitted for the function.
> > > > >
> > > > > This reverts commit 3193c0836, and instead, it disables the -fgcse
> > > > > optimization for the entire source file, but only when building for
> > > > > X86 using GCC with CONFIG_BPF_JIT_ALWAYS_ON disabled. Note that the
> > > > > original commit states that CONFIG_RETPOLINE=n triggers the issue,
> > > > > whereas CONFIG_RETPOLINE=y performs better without the optimization,
> > > > > so it is kept disabled in both cases.
> > > > >
> > > > > Fixes: 3193c0836 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()")
> > > > > Link: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=GFHpzoj_hCoBQ@xxxxxxxxxxxxxx/
> > > > > Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > > > > ---
> > > > >  include/linux/compiler-gcc.h   | 2 --
> > > > >  include/linux/compiler_types.h | 4 ----
> > > > >  kernel/bpf/Makefile            | 6 +++++-
> > > > >  kernel/bpf/core.c              | 2 +-
> > > > >  4 files changed, 6 insertions(+), 8 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> > > > > index d1e3c6896b71..5deb37024574 100644
> > > > > --- a/include/linux/compiler-gcc.h
> > > > > +++ b/include/linux/compiler-gcc.h
> > > > > @@ -175,5 +175,3 @@
> > > > >  #else
> > > > >  #define __diag_GCC_8(s)
> > > > >  #endif
> > > > > -
> > > > > -#define __no_fgcse __attribute__((optimize("-fno-gcse")))
> > > > See my reply in the other thread.
> > > > I prefer
> > > > -#define __no_fgcse __attribute__((optimize("-fno-gcse")))
> > > > +#define __no_fgcse __attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))
> > > >
> > > > Potentially with -fno-asynchronous-unwind-tables.
> > > >
> > >
> > > So how would that work? arm64 has the following:
> > >
> > > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables
> > >
> > > ifeq ($(CONFIG_SHADOW_CALL_STACK), y)
> > > KBUILD_CFLAGS += -ffixed-x18
> > > endif
> > >
> > > and it adds -fpatchable-function-entry=2 for compilers that support
> > > it, but only when CONFIG_FTRACE is enabled.
> >
> > I think you're assuming that GCC drops all flags when it sees __attribute__((optimize)).
> > That's not the case.
> >
>
> So which flags does it drop, and which doesn't it drop? Is that
> documented somewhere? Is that the same for all versions of GCC?
>
> > > Also, as Nick pointed out, -fno-gcse does not work on Clang.
> >
> > yes and what's the point?
> > #define __no_fgcse is GCC only.
> > clang doesn't need this workaround.
> >
>
> Ah ok, that's at least something.
>
> > > Every architecture will have a different set of requirements here. And
> > > there is no way of knowing which -f options are disregarded when you
> > > use the function attribute.
> > >
> > > So how on earth are you going to #define __no_fgcse correctly for
> > > every configuration imaginable?
> > >
> > > > __attribute__((optimize(""))) is not as broken as you're claiming it to be.
> > > > It has quirky gcc internal logic, but it's still widely used
> > > > in many software projects.
> > >
> > > So it's fine because it is only a little bit broken? I'm sorry, but
> > > that makes no sense whatsoever.
> > >
> > > If you insist on sticking with this broken construct, can you please
> > > make it GCC/x86-only at least?
> >
> > I'm totally fine with making
> > #define __no_fgcse __attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))
> > to be gcc+x86 only.
> > I'd like to get rid of it, but objtool is not smart enough to understand
> > generated asm without it.
>
> I'll defer to the x86 folks to make the final call here, but I would
> be perfectly happy doing
>
> index d1e3c6896b71..68ddb91fbcc6 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -176,4 +176,6 @@
>  #define __diag_GCC_8(s)
>  #endif
>
> +#ifdef CONFIG_X86
>  #define __no_fgcse __attribute__((optimize("-fno-gcse")))
> +#endif

If you're going to submit this patch could you please add
,-fno-omit-frame-pointer
to the above as well?

> and end the conversation here, because I honestly cannot wrap my head
> around the fact that you are willing to work around an x86 specific
> objtool shortcoming by arbitrarily disabling some GCC optimization for
> all architectures, using a construct that may or may not affect other
> compiler settings in unpredictable ways, where the compiler is being
> used to compile a BPF language runtime for executing BPF programs
> inside the kernel.
>
> What on earth could go wrong?

Frankly I'm more worried that -Os will generate incorrect code.
All compilers have bugs. Kernel has bugs. What can go wrong?
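
For anyone following along outside the kernel tree, here is a rough sketch of what a GCC-and-x86-only guard around the attribute could look like. DEMO_NO_FGCSE, demo_run() and the toy instruction set are hypothetical names made up for illustration; only the attribute string is taken from the discussion above. It assumes a compiler that defines the usual __GNUC__, __clang__, __x86_64__ and __i386__ predefined macros, and it is not the kernel's actual definition.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's __no_fgcse: apply the function
 * attribute only when building with GCC (not Clang) for x86, and expand
 * to nothing everywhere else.
 */
#if defined(__GNUC__) && !defined(__clang__) && \
    (defined(__x86_64__) || defined(__i386__))
#define DEMO_NO_FGCSE \
	__attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))
#else
#define DEMO_NO_FGCSE
#endif

/* Toy instruction set, invented for this example only. */
enum demo_op { DEMO_ADD, DEMO_MUL, DEMO_EXIT };

struct demo_insn {
	enum demo_op op;
	long imm;
};

/*
 * Tiny interpreter loop standing in for an interpreter function such as
 * ___bpf_prog_run(). It runs until it sees DEMO_EXIT or an unknown opcode.
 */
DEMO_NO_FGCSE
long demo_run(const struct demo_insn *insn, long acc)
{
	assert(insn != NULL);

	for (;; insn++) {
		switch (insn->op) {
		case DEMO_ADD:
			acc += insn->imm;
			break;
		case DEMO_MUL:
			acc *= insn->imm;
			break;
		case DEMO_EXIT:
		default:
			return acc;
		}
	}
}
```

The guard only narrows where the attribute is applied. It does not answer the question raised above of which other command-line flags GCC keeps for the annotated function. Disabling the optimization for the whole source file from the build system, as the original patch proposes, avoids the attribute altogether.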