On Fri, Feb 2, 2018 at 6:59 AM, David Woodhouse <dwmw@xxxxxxxxxxxx> wrote: > With retpoline, tight loops of "call this function for every XXX" are > very much pessimised by taking a prediction miss *every* time. > > This one showed up very high in our early testing, and it only has five > things it'll ever call so make it take an 'op' enum instead of a > function pointer and let's see how that works out... Umm. May I suggest a different workaround? Honestly, if this is so performance-critical, the *real* fix is to actually just mark all those "slot_handle_*()" functions as "always_inline". Because that will *really* improve performance, by simply removing the indirection entirely - since then the functions involved will become static. You might get other code improvements too, because I suspect it will end up removing an extra level of function call due to those trivial wrapper functions. And there's a couple of "bool lock_flush_tlb" arguments that will simply become constant and generate much better code that way. And maybe you don't want to inline all of the slot_handle_*() functions, and it's only one or two of them that matter because they loop over a lot of entries, but honestly, most of those slot_handle_xyz() functions seem to have just a couple of call sites anyway. slot_handle_large_level() is probably already inlined by the compiler because it only has a single call-site. Will it make for bigger code? Yes. But probably not really all *that* much bigger, because of how it also will allow the compiler to simplify some things. An dif this really is so critical that those non-predicted calls were that noticeable, those other simplifications probably also matter. And then you get rid of all run-time conditionals, and all the indirect jumps entirely. Plus the patch will be smaller and simpler too. Hmm? Linus