On Sat, Jul 6, 2019 at 10:52 PM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote: > > On Sat, Jul 06, 2019 at 08:32:06PM -0500, Josh Poimboeuf wrote: > > On Sat, Jul 06, 2019 at 10:29:42PM +0200, Ingo Molnar wrote: > > > Hm, I get this new build warning on x86-64 defconfig-ish kernels plus > > > these enabled: > > > > > > CONFIG_BPF=y > > > CONFIG_BPF_JIT=y > > > > > > kernel/bpf/core.o: warning: objtool: ___bpf_prog_run()+0x8da: sibling call from callable instruction with modified stack frame > > > > I assume you have CONFIG_RETPOLINE disabled? For some reason that > > causes GCC to add 166 indirect jumps to that function, which is giving > > objtool trouble. Looking into it. > > Alexei, do you have any objections to setting -fno-gcse for > ___bpf_prog_run()? Either for the function or the file? Doing so seems > to be recommended by the GCC manual for computed gotos. It would also > "fix" one of the issues. More details below. > > Details: > > With CONFIG_RETPOLINE=n, there are a couple of GCC optimizations in > ___bpf_prog_run() which objtool is having trouble with. > > 1) > > The function has: > > select_insn: > goto *jumptable[insn->code]; > > And then a bunch of "goto select_insn" statements. > > GCC is basically replacing > > select_insn: > jmp *jumptable(,%rax,8) > ... > ALU64_ADD_X: > ... > jmp select_insn > ALU_ADD_X: > ... > jmp select_insn > > with > > select_insn: > jmp *jumptable(, %rax, 8) > ... > ALU64_ADD_X: > ... > jmp *jumptable(, %rax, 8) > ALU_ADD_X: > ... > jmp *jumptable(, %rax, 8) > > > It does that 166 times. > > For some reason, it doesn't do the optimization with retpolines > enabled. > > Objtool has never seen multiple indirect jump sites which use the same > jump table. This is relatively trivial to fix (I already have a > working patch). > > 2) > > After doing the first optimization, GCC then does another one which is > a little trickier. It replaces: > > select_insn: > jmp *jumptable(, %rax, 8) > ... > ALU64_ADD_X: > ... > jmp *jumptable(, %rax, 8) > ALU_ADD_X: > ... > jmp *jumptable(, %rax, 8) > > with > > select_insn: > mov jumptable, %r12 > jmp *(%r12, %rax, 8) > ... > ALU64_ADD_X: > ... > jmp *(%r12, %rax, 8) > ALU_ADD_X: > ... > jmp *(%r12, %rax, 8) > > The problem is that it only moves the jumptable address into %r12 > once, for the entire function, then it goes through multiple recursive > indirect jumps which rely on that %r12 value. But objtool isn't yet > smart enough to be able to track the value across multiple recursive > indirect jumps through the jump table. > > After some digging I found that the quick and easy fix is to disable > -fgcse. In fact, this seems to be recommended by the GCC manual, for > code like this: > > -fgcse > Perform a global common subexpression elimination pass. This > pass also performs global constant and copy propagation. > > Note: When compiling a program using computed gotos, a GCC > extension, you may get better run-time performance if you > disable the global common subexpression elimination pass by > adding -fno-gcse to the command line. > > Enabled at levels -O2, -O3, -Os. > > This code indeed relies extensively on computed gotos. I don't know > *why* disabling this optimization would improve performance. In fact > I really don't see how it could make much of a difference either way. > > Anyway, using -fno-gcse makes optimization #2 go away and makes > objtool happy, with only a fix for #1 needed. > > If -fno-gcse isn't an option, we might be able to fix objtool by using > the "first_jump_src" thing which Peter added, improving it such that > it also takes table jumps into account. Sorry for delay. I'm mostly offgrid until next week. As far as -fno-gcse.. I don't mind as long as it doesn't hurt performance. Which I suspect it will :( All these indirect gotos are there for performance. Single indirect goto and a bunch of jmp select_insn are way slower, since there is only one instruction for cpu branch predictor to work with. When every insn is followed by "jmp *jumptable" there is more room for cpu to speculate. It's been long time, but when I wrote it the difference between all indirect goto vs single indirect goto was almost 2x.
![]() |