On 30/09/2021 02.18, Nick Desaulniers wrote:
> On Wed, Sep 29, 2021 at 4:28 PM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
> Though for the defconfig case...somehow the cost is more than with the
> sanitizers...
>
> arch/x86/mm/amdtopology.c:157:7: remark: '__nodes_weight' not inlined
> into 'amd_numa_init' because too costly to inline (cost=930,
> threshold=45) [-Rpass-missed=inline]
>         if (!nodes_weight(numa_nodes_parsed))
>              ^
>
> Looking at the output of `make LLVM=1 -j72
> arch/x86/mm/amdtopology.ll`, @__nodes_weight is just some inline asm
> (.altinstructions). I wonder if I need to teach the cost model about
> `asm inline`...

Remind me, does clang understand 'asm inline("foo")'?

Regardless, it seems that the

  asm (ALTERNATIVE("call __sw_hweight32", ...
  asm (ALTERNATIVE("call __sw_hweight64", ...

in arch/x86/include/asm/arch_hweight.h could/should be made asm_inline
at least for gcc's sake.

Somewhat related: I really think we should remove __cold from the
definition of __init: It hurts boot time (on a simple board with quite
reproducible boot timing I measured 1-3% some time ago), and it is
likely at least partially responsible for the never-ending tsunami of
functions-that-obviously-should-have-been-inlined(TM) but were not
because the caller is being optimized for size.

Whatever small cost in extra .text is reclaimed after init - and those
who are concerned about the size of the kernel image itself probably
build with CONFIG_OPTIMIZE_FOR_SIZE=y, and I see no change in such an
image whether __init includes __cold or not.

Rasmus
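
For illustration only, here is a minimal standalone sketch of what "make it asm_inline" means, outside the kernel tree. The names asm_inline_sketch and hweight32_sketch are hypothetical, the compiler version gate is a conservative guess rather than something taken from this thread, and the empty asm body merely stands in for the ALTERNATIVE(...) statement (which cannot be built outside the kernel). The "inline" qualifier asks the compiler to treat the asm statement as minimal in size when estimating inlining cost, however long the template text is.

```c
/*
 * Assumption: the "inline" asm qualifier is available in GCC >= 9 and in
 * sufficiently recent clang. The clang cutoff below is deliberately
 * conservative and is not taken from the thread. Older compilers fall back
 * to a plain asm statement, similar in spirit to the kernel's asm_inline.
 */
#if (defined(__clang__) && __clang_major__ >= 13) || \
    (!defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 9)
#define asm_inline_sketch __asm__ __inline
#else
#define asm_inline_sketch __asm__
#endif

/*
 * Hypothetical stand-in for a tiny helper that wraps an asm statement.
 * The real arch_hweight.h code uses ALTERNATIVE("call __sw_hweight32", ...);
 * here the bit count comes from a compiler builtin and the asm body is empty.
 */
static inline unsigned int hweight32_sketch(unsigned int w)
{
	unsigned int res = (unsigned int)__builtin_popcount(w);

	/* Empty template: emits no instructions, only ties up 'res'. */
	asm_inline_sketch volatile("" : "+r"(res));
	return res;
}
```

A caller would use hweight32_sketch(mask) exactly as before; only the inliner's size estimate for the asm statement is meant to change.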