On Wed, Apr 29, 2020 at 07:10:52PM -0700, Alexei Starovoitov wrote: > > For example: > > > > #define GOTO ({ goto *jumptable[insn->code]; }) > > > > and then replace all 'goto select_insn' with 'GOTO;' > > > > The problem is that with RETPOLINE=y, the function text size grows from > > 5k to 7k, because for each of the 160+ retpoline JMPs, GCC (stupidly) > > reloads the jump table register into a scratch register. > > that would be a tiny change, right? > I'd rather go with that and gate it with 'ifdef CONFIG_FRAME_POINTER' > Like: > #ifndef CONFIG_FRAME_POINTER > #define CONT ({ insn++; goto select_insn; }) > #define CONT_JMP ({ insn++; goto select_insn; }) > #else > #define CONT ({ insn++; goto *jumptable[insn->code]; }) > #define CONT_JMP ({ insn++; goto *jumptable[insn->code]; }) > #endif > > The reason this CONT and CONT_JMP macros are there because a combination > of different gcc versions together with different cpus make branch predictor > behave 'unpredictably'. > > I've played with CONT and CONT_JMP either both doing direct goto or > indirect goto and observed quite different performance characteristics > from the interpreter. > What you see right now was the best tune I could get from a set of cpus > I had to play with and compilers. If I did the same tuning today the outcome > could have been different. > So I think it's totally fine to use above code. I think some cpus may actually > see performance gains with certain versions of gcc. > The retpoline text increase is unfortunate but when retpoline is on > other security knobs should be on too. In particular CONFIG_BPF_JIT_ALWAYS_ON > should be on as well. Which will remove interpreter from .text completely. This would actually be contingent on RETPOLINE, not FRAME_POINTER. (FRAME_POINTER was the other issue with the "optimize" attribute, which we're reverting so it'll no longer be a problem.) So if you're not concerned about the retpoline text growth, it could be as simple as: #define CONT ({ insn++; goto *jumptable[insn->code]; }) #define CONT_JMP ({ insn++; goto *jumptable[insn->code]; }) Or, if you wanted to avoid the text growth, it could be: #ifdef CONFIG_RETPOLINE /* * Avoid a 40% increase in function text size by getting GCC to generate a * single retpoline jump instead of 160+. */ #define CONT ({ insn++; goto select_insn; }) #define CONT_JMP ({ insn++; goto select_insn; }) select_insn: goto *jumptable[insn->code]; #else /* !CONFIG_RETPOLINE */ #define CONT ({ insn++; goto *jumptable[insn->code]; }) #define CONT_JMP ({ insn++; goto *jumptable[insn->code]; }) #endif /* CONFIG_RETPOLINE */ But since this is legacy path, I think the first one is much nicer. Also, JMP_TAIL_CALL has a "goto select_insn", is it ok to convert that to CONT? -- Josh