On Thu, Sep 30, 2021 at 9:19 AM Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote:
>
> On Wed, Sep 29, 2021 at 4:28 PM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, Sep 29, 2021 at 3:59 PM Nick Desaulniers
> > <ndesaulniers@xxxxxxxxxx> wrote:
> > >
> > > +static const struct secref_exception secref_allowlist[] = {
> > > +	{ .fromsym = "__first_node", .tosym = "numa_nodes_parsed" },
> > > +	{ .fromsym = "__next_node", .tosym = "numa_nodes_parsed" },
> > > +	{ .fromsym = "__nodes_weight", .tosym = "numa_nodes_parsed" },
> > > +	{ .fromsym = "early_get_smp_config", .tosym = "x86_init" },
> > > +	{ .fromsym = "test_bit", .tosym = "numa_nodes_parsed" },
> > > +};
>
> Thanks for your feedback. This has been a long-standing issue with no
> clear path forward; I was looking forward to your input.
>
> > This list is basically made-up and random.
>
> Definitely brittle. And it contains checks that are specific to
> basically one set of configs for one arch. It sucks to pay that cost
> for unaffected architectures.
>
> > Why did those functions not get inlined?
>
> $ make LLVM=1 -j72 allmodconfig
> $ make LLVM=1 -j72 arch/x86/mm/amdtopology.o KCFLAGS=-Rpass-missed=inline
> ...
> arch/x86/mm/amdtopology.c:110:7: remark: 'test_bit' not inlined into
> 'amd_numa_init' because too costly to inline (cost=115, threshold=45)
> [-Rpass-missed=inline]
>         if (node_isset(nodeid, numa_nodes_parsed)) {
>             ^
> arch/x86/mm/amdtopology.c:157:7: remark: '__nodes_weight' not inlined
> into 'amd_numa_init' because too costly to inline (cost=60,
> threshold=45) [-Rpass-missed=inline]
>         if (!nodes_weight(numa_nodes_parsed))
>             ^
> arch/x86/mm/amdtopology.c:171:2: remark: 'early_get_smp_config' not
> inlined into 'amd_numa_init' because too costly to inline (cost=85,
> threshold=45) [-Rpass-missed=inline]
>         early_get_smp_config();
>         ^
> arch/x86/mm/amdtopology.c:178:2: remark: '__first_node' not inlined
> into 'amd_numa_init' because too costly to inline (cost=70,
> threshold=45) [-Rpass-missed=inline]
>         for_each_node_mask(i, numa_nodes_parsed)
>         ^
> arch/x86/mm/amdtopology.c:178:2: remark: '__next_node' not inlined
> into 'amd_numa_init' because too costly to inline (cost=95,
> threshold=45) [-Rpass-missed=inline]
>
> ie. for allmodconfig, the sanitizers add too much instrumentation to
> the callees that they become too large to be considered profitable to
> inline by the cost model. Note that LLVM's inliner works bottom up,
> not top down.
>
> Though for the defconfig case...somehow the cost is more than with the
> sanitizers...
>
> arch/x86/mm/amdtopology.c:157:7: remark: '__nodes_weight' not inlined
> into 'amd_numa_init' because too costly to inline (cost=930,
> threshold=45) [-Rpass-missed=inline]
>         if (!nodes_weight(numa_nodes_parsed))
>             ^
>
> Looking at the output of `make LLVM=1 -j72
> arch/x86/mm/amdtopology.ll`, @__nodes_weight is just some inline asm
> (.altinstructions). I wonder if I need to teach the cost model about
> `asm inline`...
>
> For the allmodconfig build it looks like `__nodes_weight` calls
> `__bitmap_weight` and the code coverage runtime hooks.
>
> > Wouldn't it be better to make
> > them always-inline?
>
> Perhaps, see what that might look like:
> https://github.com/ClangBuiltLinux/linux/issues/1302#issuecomment-807260475
> Does that look better?
>
> > Or, like in at least the early_get_smp_config() case, just make it be
> > marked __init, so that if it doesn't get inlined it gets the right
> > section?
>
> In the case of early_get_smp_config(), that's what Boris suggested:
> https://lore.kernel.org/lkml/20210225114533.GA380@xxxxxxx/

__init works, particularly for early_get_smp_config().

For static inline helpers that are called from both __init and
non-__init functions, maybe __ref will work. In my understanding, the
.ref.text section is not freed, but modpost bypasses the section
mismatch checks for it.

I am not sure which is the better approach for generic cases:
__always_inline, __ref, or something else.

I am not a big fan of this patch, at least...
(The reason was already stated by Linus.)

--
Best Regards
Masahiro Yamada