Hi people,

this is an attempt to see whether gcc's heuristic for estimating the cost
of inline asm statements can be improved, for better inlining.

AFAIU, the problem arises when one ends up using a lot of inline asm
statements in the kernel: due to the inline asm cost estimation heuristic,
which counts lines, I think, as for example in this macro:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/include/asm/cpufeature.h#n162

the resulting code ends up not inlining the functions which use this
macro. I.e., you see a CALL <function> instead of its body getting inlined
directly.

Even though it should be, because the actual instructions are only a couple
in most cases and all those other directives end up in another section
anyway.

The issue is explained in greater detail in the forwarded mail below.

Now, Richard suggested doing something like:

 1) inline asm ("...")
 2) asm ("..." : : : : <size-expr>)
 3) asm ("...") __attribute__((asm_size(<size-expr>)));

with which the user can tell gcc what the size of that inline asm statement
is and thus allow for more precise cost estimation and, in the end, better
inlining.

And FWIW 3) looks pretty straightforward to me because attributes are
pretty common anyway.

But I'm sure there are other options and I'm sure people will have
better/different ideas so feel free to chime in.

Thx.

On Wed, Oct 03, 2018 at 02:30:50PM -0700, Nadav Amit wrote:
> This patch-set deals with an interesting yet stupid problem: kernel code
> that does not get inlined despite its simplicity. There are several
> causes for this behavior: "cold" attribute on __init, different function
> optimization levels; conditional constant computations based on
> __builtin_constant_p(); and finally large inline assembly blocks.
>
> This patch-set deals with the inline assembly problem. I separated these
> patches from the others (that were sent in the RFC) for easier
> inclusion.
> I also separated the removal of unnecessary new-lines which
> would be sent separately.
>
> The problem with inline assembly is that inline assembly is often used
> by the kernel for things that are other than code - for example,
> assembly directives and data. GCC however is oblivious to the content of
> the blocks and assumes their cost in space and time is proportional to
> the number of the perceived assembly "instructions", according to the
> number of newlines and semicolons. Alternatives, paravirt and other
> mechanisms are affected, causing code not to be inlined, and degrading
> compilation quality in general.
>
> The solution that this patch-set carries for this problem is to create
> an assembly macro, and then call it from the inline assembly block. As
> a result, the compiler sees a single "instruction" and assigns the more
> appropriate cost to the code.
>
> To avoid uglification of the code, as many noted, the macros are first
> precompiled into an assembly file, which is later assembled together
> with the C files. This also enables to avoid duplicate implementation
> that was set before for the asm and C code. This can be seen in the
> exception table changes.
>
> Overall this patch-set slightly increases the kernel size (my build was
> done using my Ubuntu 18.04 config + localyesconfig for the record):
>
>    text     data      bss      dec      hex    filename
> 18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before
> 18163608 10227348 2957312 31348268 1de562c ./vmlinux after (+0.1%)
>
> The number of static functions in the image is reduced by 379, but
> actually inlining is even better, which does not always show in these
> numbers: a function may be inlined causing the calling function not to
> be inlined.
>
> I ran some limited number of benchmarks, and in general the performance
> impact is not very notable. You can still see >10 cycles shaved off some
> syscalls that manipulate page-tables (e.g., mprotect()), in which
> paravirt caused many functions not to be inlined. In addition this
> patch-set can prevent issues such as [1], and improves code readability
> and maintainability.
>
> Update: Rasmus recently caused me (inadvertently) to become paranoid
> about the dependencies. To clarify: if any of the headers changes, any c
> file which uses macros that are included in macros.S would be fine as
> long as it includes the header as well (as it should). Adding an
> assertion to check this is done might become slightly ugly, and nobody
> else is concerned about it. Another minor issue is that changes of
> macros.S would not trigger a global rebuild, but that is pretty similar
> to changes of the Makefile that do not trigger a rebuild.
>
> [1] https://patchwork.kernel.org/patch/10450037/
>
> v8->v9: * Restoring the '-pipe' parameter (Rasmus)
>         * Adding Kees's tested-by tag (Kees)
>
> v7->v8: * Add acks (Masahiro, Max)
>         * Rebase on 4.19 (Ingo)
>
> v6->v7: * Fix context switch tracking (Ingo)
>         * Fix xtensa build error (Ingo)
>         * Rebase on 4.18-rc8
>
> v5->v6: * Removing more code from jump-labels (PeterZ)
>         * Fix build issue on i386 (0-day, PeterZ)
>
> v4->v5: * Makefile fixes (Masahiro, Sam)
>
> v3->v4: * Changed naming of macros in 2 patches (PeterZ)
>         * Minor cleanup of the paravirt patch
>
> v2->v3: * Several build issues resolved (0-day)
>         * Wrong comments fix (Josh)
>         * Change asm vs C order in refcount (Kees)
>
> v1->v2: * Compiling the macros into a separate .s file, improving
>           readability (Linus)
>         * Improving assembly formatting, applying most of the comments
>           according to my judgment (Jan)
>         * Adding exception-table, cpufeature and jump-labels
>         * Removing new-line cleanup; to be submitted separately
>
> Cc: Masahiro Yamada <yamada.masahiro@xxxxxxxxxxxxx>
> Cc: Sam Ravnborg <sam@xxxxxxxxxxxx>
> Cc: Alok Kataria <akataria@xxxxxxxxxx>
> Cc: Christopher Li <sparse@xxxxxxxxxxx>
> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Jan Beulich <JBeulich@xxxxxxxx>
> Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> Cc: Juergen Gross <jgross@xxxxxxxx>
> Cc: Kate Stewart <kstewart@xxxxxxxxxxxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: linux-sparse@xxxxxxxxxxxxxxx
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Philippe Ombredanne <pombredanne@xxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: x86@xxxxxxxxxx
> Cc: Chris Zankel <chris@xxxxxxxxxx>
> Cc: Max Filippov <jcmvbkbc@xxxxxxxxx>
> Cc: linux-xtensa@xxxxxxxxxxxxxxxx
>
> Nadav Amit (10):
>   xtensa: defining LINKER_SCRIPT for the linker script
>   Makefile: Prepare for using macros for inline asm
>   x86: objtool: use asm macro for better compiler decisions
>   x86: refcount: prevent gcc distortions
>   x86: alternatives: macrofy locks for better inlining
>   x86: bug: prevent gcc distortions
>   x86: prevent inline distortion by paravirt ops
>   x86: extable: use macros instead of inline assembly
>   x86: cpufeature: use macros instead of inline assembly
>   x86: jump-labels: use macros instead of inline assembly
>
>  Makefile                               |  9 ++-
>  arch/x86/Makefile                      |  7 ++
>  arch/x86/entry/calling.h               |  2 +-
>  arch/x86/include/asm/alternative-asm.h | 20 ++++--
>  arch/x86/include/asm/alternative.h     | 11 +--
>  arch/x86/include/asm/asm.h             | 61 +++++++---------
>  arch/x86/include/asm/bug.h             | 98 +++++++++++++++-----------
>  arch/x86/include/asm/cpufeature.h      | 82 ++++++++++++---------
>  arch/x86/include/asm/jump_label.h      | 77 ++++++++------------
>  arch/x86/include/asm/paravirt_types.h  | 56 +++++++--------
>  arch/x86/include/asm/refcount.h        | 74 +++++++++++--------
>  arch/x86/kernel/macros.S               | 16 +++++
>  arch/xtensa/kernel/Makefile            |  4 +-
>  include/asm-generic/bug.h              |  8 +--
>  include/linux/compiler.h               | 56 +++++++++++----
>  scripts/Kbuild.include                 |  4 +-
>  scripts/mod/Makefile                   |  2 +
>  17 files changed, 331 insertions(+), 256 deletions(-)
>  create mode 100644 arch/x86/kernel/macros.S
>
> --
> 2.17.1
>

--
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
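
For illustration only, here is a rough sketch of the counting heuristic as
the quoted mail describes it: one "instruction" per newline or semicolon in
the asm template. The helper est_asm_insns() and the template example_tmpl
are hypothetical names made up for this sketch; this is an approximation of
the described behavior, not gcc's actual code, and the template is only
loosely modeled on the alternatives-style macros.

```c
#include <stddef.h>

/*
 * Approximate the estimate described in the quoted mail: a non-empty
 * template counts as one "instruction", plus one more for every newline
 * or semicolon it contains. Hypothetical helper, not gcc's implementation.
 */
size_t est_asm_insns(const char *tmpl)
{
	size_t count;

	if (!tmpl || !*tmpl)
		return 0;

	count = 1;
	for (; *tmpl; tmpl++)
		if (*tmpl == '\n' || *tmpl == ';')
			count++;

	return count;
}

/*
 * Made-up template: a single real instruction (the jmp), while the
 * .long directives are emitted into a different section and the rest
 * are labels and section switches.
 */
const char example_tmpl[] =
	"1: jmp 6f\n"
	".pushsection .altinstructions,\"a\"\n"
	" .long 1b - .\n"
	" .long 0\n"
	".popsection\n"
	"6:";
```

With this counting, example_tmpl is estimated at 6 "instructions" even
though only the jmp lands in the function's text, which is the kind of
overestimate that would make an inliner give up on the caller.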