On Wed, Apr 29, 2020 at 10:53:15PM -0500, Josh Poimboeuf wrote: > On Wed, Apr 29, 2020 at 07:10:52PM -0700, Alexei Starovoitov wrote: > > > For example: > > > > > > #define GOTO ({ goto *jumptable[insn->code]; }) > > > > > > and then replace all 'goto select_insn' with 'GOTO;' > > > > > > The problem is that with RETPOLINE=y, the function text size grows from > > > 5k to 7k, because for each of the 160+ retpoline JMPs, GCC (stupidly) > > > reloads the jump table register into a scratch register. > > > > that would be a tiny change, right? > > I'd rather go with that and gate it with 'ifdef CONFIG_FRAME_POINTER' > > Like: > > #ifndef CONFIG_FRAME_POINTER > > #define CONT ({ insn++; goto select_insn; }) > > #define CONT_JMP ({ insn++; goto select_insn; }) > > #else > > #define CONT ({ insn++; goto *jumptable[insn->code]; }) > > #define CONT_JMP ({ insn++; goto *jumptable[insn->code]; }) > > #endif > > > > The reason this CONT and CONT_JMP macros are there because a combination > > of different gcc versions together with different cpus make branch predictor > > behave 'unpredictably'. > > > > I've played with CONT and CONT_JMP either both doing direct goto or > > indirect goto and observed quite different performance characteristics > > from the interpreter. > > What you see right now was the best tune I could get from a set of cpus > > I had to play with and compilers. If I did the same tuning today the outcome > > could have been different. > > So I think it's totally fine to use above code. I think some cpus may actually > > see performance gains with certain versions of gcc. > > The retpoline text increase is unfortunate but when retpoline is on > > other security knobs should be on too. In particular CONFIG_BPF_JIT_ALWAYS_ON > > should be on as well. Which will remove interpreter from .text completely. > > This would actually be contingent on RETPOLINE, not FRAME_POINTER. > > (FRAME_POINTER was the other issue with the "optimize" attribute, which > we're reverting so it'll no longer be a problem.) > > So if you're not concerned about the retpoline text growth, it could be > as simple as: > > #define CONT ({ insn++; goto *jumptable[insn->code]; }) > #define CONT_JMP ({ insn++; goto *jumptable[insn->code]; }) > > > Or, if you wanted to avoid the text growth, it could be: > > #ifdef CONFIG_RETPOLINE I'm a bit lost. So objtool is fine with the asm when retpoline is on? Then pls do: #if defined(CONFIG_RETPOLINE) || !defined(CONFIG_X86) since there is no need to mess with other archs. > /* > * Avoid a 40% increase in function text size by getting GCC to generate a > * single retpoline jump instead of 160+. > */ > #define CONT ({ insn++; goto select_insn; }) > #define CONT_JMP ({ insn++; goto select_insn; }) > > select_insn: > goto *jumptable[insn->code]; > > #else /* !CONFIG_RETPOLINE */ > #define CONT ({ insn++; goto *jumptable[insn->code]; }) > #define CONT_JMP ({ insn++; goto *jumptable[insn->code]; }) > #endif /* CONFIG_RETPOLINE */ > > > But since this is legacy path, I think the first one is much nicer. > > > Also, JMP_TAIL_CALL has a "goto select_insn", is it ok to convert that > to CONT? yep