On Mon, Jul 08, 2019 at 05:53:59PM -0500, Josh Poimboeuf wrote:
> On Mon, Jul 08, 2019 at 03:49:33PM -0700, Alexei Starovoitov wrote:
> > > > Sorry for the delay. I'm mostly off-grid until next week.
> > > > As far as -fno-gcse.. I don't mind as long as it doesn't hurt performance.
> > > > Which I suspect it will :(
> > > > All these indirect gotos are there for performance.
> > > > Single indirect goto and a bunch of jmp select_insn
> > > > are way slower, since there is only one instruction
> > > > for cpu branch predictor to work with.
> > > > When every insn is followed by "jmp *jumptable"
> > > > there is more room for cpu to speculate.
> > > > It's been a long time, but when I wrote it the difference
> > > > between all indirect goto vs single indirect goto was almost 2x.
> > >
> > > Just to clarify, -fno-gcse doesn't get rid of any of the indirect jumps.
> > > It still has 166 indirect jumps. It just gets rid of the second
> > > optimization, where the jumptable address is placed in a register.
> >
> > what about other functions in core.c?
> > Maybe it's easier to teach objtool to recognize that pattern?
>
> The GCC man page actually recommends using -fno-gcse for computed goto
> code, for better performance. So if that's actually true, then it would
> be win-win because objtool wouldn't need a change for it.
>
> Otherwise I can teach objtool to recognize the new pattern.
>
> > > If you have a benchmark which is relatively easy to use, I could try to
> > > run some tests.
> >
> > modprobe test_bpf
> > selftests/bpf/test_progs
> > both print runtime.
> > Some of test_progs have high run-to-run variations though.
>
> Thanks, I'll give it a shot.

I modprobed test_bpf with JIT disabled.

Before (default flags): 2.493018s
After (-fno-gcse):      2.523572s

So it looks like it's either no change, or slightly slower (about 1.2%).

I'll just teach objtool to recognize the optimization.

--
Josh
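For readers unfamiliar with the two dispatch shapes Alexei contrasts, here is a minimal sketch using GCC's "labels as values" extension (`&&label`, `goto *ptr`), which Clang also supports. It is a hypothetical toy interpreter, not the BPF interpreter in kernel/bpf/core.c: the opcode set, `struct insn`, `run_central()` and `run_threaded()` are all invented for illustration. `run_central()` has a single indirect jump at `select_insn` that every handler jumps back to; `run_threaded()` ends every handler with its own `goto *table[...]`, so each opcode gets its own indirect branch for the predictor. Note that the compiler is free to duplicate or merge these jumps, so the source shape alone does not fix the emitted code; only measurement (e.g. the test_bpf run above) settles performance.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy instruction set (not the BPF ISA). */
enum op { OP_LDI, OP_ADD, OP_MUL, OP_EXIT, OP_MAX };

struct insn {
	enum op op;
	int64_t imm;
};

/* Shape 1: one shared dispatch point; every handler jumps back to it. */
int64_t run_central(const struct insn *insn)
{
	static void *const table[OP_MAX] = {
		[OP_LDI] = &&do_ldi, [OP_ADD] = &&do_add,
		[OP_MUL] = &&do_mul, [OP_EXIT] = &&do_exit,
	};
	int64_t acc = 0;

select_insn:
	assert(insn->op < OP_MAX);	/* debug guard against bad opcodes */
	goto *table[insn->op];
do_ldi:
	acc = insn->imm;
	insn++;
	goto select_insn;
do_add:
	acc += insn->imm;
	insn++;
	goto select_insn;
do_mul:
	acc *= insn->imm;
	insn++;
	goto select_insn;
do_exit:
	return acc;
}

/* Shape 2: each handler advances and dispatches on its own. */
#define NEXT() goto *table[(++insn)->op]

int64_t run_threaded(const struct insn *insn)
{
	static void *const table[OP_MAX] = {
		[OP_LDI] = &&do_ldi, [OP_ADD] = &&do_add,
		[OP_MUL] = &&do_mul, [OP_EXIT] = &&do_exit,
	};
	int64_t acc = 0;

	goto *table[insn->op];		/* initial dispatch */
do_ldi:
	acc = insn->imm;
	NEXT();
do_add:
	acc += insn->imm;
	NEXT();
do_mul:
	acc *= insn->imm;
	NEXT();
do_exit:
	return acc;
}

#undef NEXT
```

Both functions compute the same result; for example, the program `{OP_LDI, 6}, {OP_ADD, 1}, {OP_MUL, 6}, {OP_EXIT, 0}` evaluates (6 + 1) * 6 and returns 42 from either variant.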