This patch-set deals with an interesting yet stupid problem: code that
does not get inlined despite its simplicity. I find 5 classes of causes:

1. Inline assembly blocks in which code and data are added to
   alternative sections. The compiler is oblivious to the content of the
   blocks and assumes their cost in space and time is proportional to
   the number of perceived assembly "instructions", which it estimates
   by counting newlines and semicolons. Alternatives, paravirt and other
   mechanisms are affected.

2. Inline assembly with redundant newlines and semicolons. Similarly to
   (1), this code is considered "heavier" than it actually is.

3. Code with constant value optimizations. Quite a few parts of the
   kernel check whether a variable is constant (using
   __builtin_constant_p()) and perform heavy computations in that case.
   These computations are eventually optimized out, so they do not land
   in the binary. However, the cost of these computations is still
   attributed to the calling function, which might prevent inlining of
   the calling function. ilog2() is an example of such a case.

4. Code that is marked with the "cold" attribute, including all the
   __init functions. Some may consider this the desired behavior.

5. Code that is marked with a different optimization level. This
   affects, for example, vmx_vcpu_run(), inducing overheads of up to 10%
   on exit.

This patch-set deals with some instances of the first 3 classes.

For (1) we insert an assembly macro, and call it from the inline
assembly block. As a result, the compiler sees a single "instruction"
and assigns a more appropriate cost to the code.

For (2) the solution is trivial: just remove the newlines.

(3) is somewhat tricky. The proposed solution is to use
__builtin_choose_expr() to check whether a variable is actually constant
instead of using an if-condition or the C ternary operator.
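
To illustrate the idea behind (3), here is a minimal user-space sketch,
assuming the GCC/Clang builtins. The names runtime_log2(), CONST_LOG2(),
LOG2_TERNARY() and LOG2_CHOOSE() are hypothetical and are not taken from
the patches; the constant path is deliberately short and only handles
values 1..255.

```c
#include <limits.h>

/* Hypothetical cheap runtime path (undefined for n == 0, like clz). */
static inline int runtime_log2(unsigned int n)
{
	return (int)(sizeof(n) * CHAR_BIT - 1) - __builtin_clz(n);
}

/*
 * Hypothetical "heavy" constant path: a long conditional chain that
 * folds to a single value for literals. Only valid for 1..255.
 */
#define CONST_LOG2(n)			\
	((n) & (1u << 7) ? 7 :		\
	 (n) & (1u << 6) ? 6 :		\
	 (n) & (1u << 5) ? 5 :		\
	 (n) & (1u << 4) ? 4 :		\
	 (n) & (1u << 3) ? 3 :		\
	 (n) & (1u << 2) ? 2 :		\
	 (n) & (1u << 1) ? 1 : 0)

/*
 * Ternary form: both arms stay in the expression until the optimizer
 * removes one, so the chain can still be counted against the caller
 * when inlining is decided.
 */
#define LOG2_TERNARY(n) \
	(__builtin_constant_p(n) ? CONST_LOG2(n) : runtime_log2(n))

/*
 * choose_expr form: the selection is made while the expression is
 * parsed, so only the chosen arm remains in the caller.
 */
#define LOG2_CHOOSE(n) \
	__builtin_choose_expr(__builtin_constant_p(n), \
			      CONST_LOG2(n), runtime_log2(n))

/* Literal argument: the constant arm is selected. */
int log2_of_literal(void)
{
	return LOG2_CHOOSE(64);
}

/* Variable argument: the runtime arm is selected at parse time. */
int log2_of_variable(unsigned int v)
{
	return LOG2_CHOOSE(v);
}

/* Same computation through the ternary form, for comparison. */
int log2_ternary(unsigned int v)
{
	return LOG2_TERNARY(v);
}
```

Both forms return the same values; the difference is only in when the
compiler discards the unused arm, and therefore in what the inliner sees
when it estimates the size of the calling function.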
__builtin_choose_expr() is evaluated earlier in the compilation, so it
allows the compiler to associate the right cost with the variable case
before the inlining decisions take place. So far so good.

Still, there is a drawback. Since __builtin_choose_expr() is evaluated
earlier, it can fail to recognize constants that an if-condition would
recognize correctly. As a result, this patch-set only applies it to the
simplest cases.

Overall this patch-set slightly increases the kernel size (my build was
done using localmodconfig + localyesconfig, for the record):

    text     data     bss      dec     hex filename
18126699 10066728 2936832 31130259 1db0293 ./vmlinux before
18149210 10064048 2936832 31150090 1db500a ./vmlinux after (+0.06%)

The patch-set eliminates some of the static text symbols:

Before: 40033
After:  39632 (-1%)

There is a measurable effect on performance in some cases. A loop of
MADV_DONTNEED/page-fault shows a 2% performance improvement with this
patch-set.

Some inline comments or self-explaining C macros might still be needed.

[1] https://lkml.org/lkml/2018/5/5/159

Cc: Alok Kataria <akataria@xxxxxxxxxx>
Cc: Christopher Li <sparse@xxxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Jan Beulich <JBeulich@xxxxxxxx>
Cc: Jonathan Corbet <corbet@xxxxxxx>
Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
Cc: Juergen Gross <jgross@xxxxxxxx>
Cc: Kees Cook <keescook@xxxxxxxxxxxx>
Cc: linux-sparse@xxxxxxxxxxxxxxx
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
Cc: x86@xxxxxxxxxx

Nadav Amit (8):
  x86: objtool: use asm macro for better compiler decisions
  x86: bug: prevent gcc distortions
  x86: alternative: macrofy locks for better inlining
  x86: prevent inline distortion by paravirt ops
  x86: refcount: prevent gcc distortions
  x86: removing unneeded new-lines
  ilog2: preventing compiler distortion due to big condition
  bitops: prevent compiler inline decision distortion

 arch/x86/include/asm/alternative.h    | 28 ++++++++++----
 arch/x86/include/asm/asm.h            |  4 +-
 arch/x86/include/asm/bitops.h         |  8 ++--
 arch/x86/include/asm/bug.h            | 48 ++++++++++++++---------
 arch/x86/include/asm/cmpxchg.h        | 10 ++---
 arch/x86/include/asm/paravirt_types.h | 53 +++++++++++++++-----------
 arch/x86/include/asm/refcount.h       | 55 ++++++++++++++++-----------
 arch/x86/include/asm/special_insns.h  | 12 +++---
 include/linux/compiler.h              | 29 ++++++++++----
 include/linux/log2.h                  | 11 +++---
 10 files changed, 156 insertions(+), 102 deletions(-)

-- 
2.17.0