Re: [PATCH] kallsyms: let kallsyms_on_each_match_symbol match symbols exactly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 6/23/23 10:43 AM, Nick Desaulniers wrote:
On Thu, Jun 22, 2023 at 9:11 AM Yonghong Song <yhs@xxxxxxxx> wrote:



On 6/22/23 6:35 AM, Petr Mladek wrote:
Hi,

I have added people mentioned in commits which modified
cleanup_symbol_name() in kallsyms.c.

I think that stripping ".*" suffix does not work for static
variables defined locally from symbol does always work,
see below.



On Wed 2023-06-21 15:34:27, Yonghong Song wrote:
On 6/21/23 12:18 PM, Song Liu wrote:
On Jun 21, 2023, at 1:52 AM, Petr Mladek <pmladek@xxxxxxxx> wrote:
On Tue 2023-06-20 22:36:14, Song Liu wrote:
On Jun 19, 2023, at 4:32 AM, Petr Mladek <pmladek@xxxxxxxx> wrote:
On Sun 2023-06-18 22:05:19, Song Liu wrote:
On Sun, Jun 18, 2023 at 8:32 PM Leizhen (ThunderTown)
<thunder.leizhen@xxxxxxxxxx> wrote:
I am not quite following the direction here. Do we need more
work for this patch?

Good question. I primary tried to add more details so that
we better understand the problem.

Honestly, I do not know the answer. I am neither familiar with
kpatch nor with clang.

This is pretty complicated.

1. clang with LTO does not use the suffix to eliminated duplicated
kallsyms, so old_sympos is still needed. Here is an example:

# grep init_once /proc/kallsyms
ffffffff8120ba80 t init_once.llvm.14172910296636650566
ffffffff8120ba90 t inode_init_once
ffffffff813ea5d0 t bpf_user_rnd_init_once
ffffffff813fd5b0 t init_once.llvm.17912494158778303782
ffffffff813ffbf0 t init_once
ffffffff813ffc60 t init_once
ffffffff813ffc70 t init_once
ffffffff813ffcd0 t init_once
ffffffff813ffce0 t init_once

There are two "init_once" with suffix, but there are also ones
without them.

This is correct. At LTO mode, when a static function/variable
is promoted to the global. The '.llvm.<hash>' is added to the
static function/variable name to form a global function name.
The '<hash>' here is computed based on IR of the compiled file
before LTO. So if one file is not modified, then <hash>
won't change after additional code change in other files.

OK, so the ".llvm.<hash>" suffix is added when a symbol is promoted
from static to global. Right?

Right at lest for llvm >= 15.
There are an alternative format '.llvm.<file_path>' suffix with
a more involved compilation process.

https://github.com/llvm/llvm-project/blob/main/llvm/test/ThinLTO/X86/promote-local-name.ll

But for kernel lto build, yes, only '.llvm.<hash>'.


2. kpatch-build does "Building original source", "Building patched
source", and then do binary diff of the two. From our experiments,
the suffix doesn't change between the two builds. However, we need
to match the build environment (path of kernel source, etc.) to
make sure suffix from kpatch matches the kernel.

The goal here is to generate the same IR (hence <hash>) if
file content is not changed. This way, <hash> value will
change only for those changed files.

Hmm, how does kpatch match the fixed functions? They are modified
so that they should get different IR (hash). Or do I miss
anything, please?

If the static function are promoted to global function and the file
containing static function changed, then that modified static
function will appear to be a *new* function. Live change should
be able to do it, right?


3. The goal of this patch is NOT to resolve different suffix by
llvm (.llvm.[0-9]+). Instead, we are trying fix issues like:

#  grep bpf_verifier_vlog /proc/kallsyms
ffffffff81549f60 t bpf_verifier_vlog
ffffffff8268b430 d bpf_verifier_vlog._entry
ffffffff8282a958 d bpf_verifier_vlog._entry_ptr
ffffffff82e12a1f d bpf_verifier_vlog.__already_done

And <function>.<symbol> notation seems to be used for static symbols
defined inside a function.

That is correct.


I guess that it is used when the symbols stay statics. It would
probably get additional ".llvm.<hash>" when it got promoted
from static to global. But this probably never happens.

I have not see a case like this yet.


Do I get it correctly?

yes, that is correct.


It means that we have two different types of name changes:

    1. .llvm.<hash> suffix

       If we remove this suffix then we will not longer distinguish
       symbols which stayed static and which were promoted to global
       ones.

       The result should be basically the same as without LTO.
       Some symbols might have duplicated name. But most symbols
       would have an unique one.


    2. <function>.<symbol> name

       In this case, <symbol> is _not_ suffix. It is actually
       the original symbol name.

       The extra thing is the <function>. prefix.

       These static variables seem to have special handling even
       with gcc without LTO. gcc adds an extra id instead,
       for example:

       $> nm vmlinux | grep " _entry_ptr" | head
       ffffffff82a2e800 d _entry_ptr.100135
       ffffffff82a2e7f8 d _entry_ptr.100178
       ffffffff82a32e70 d _entry_ptr.100798
       ffffffff82a1e240 d _entry_ptr.10342
       ffffffff82a33930 d _entry_ptr.104764
       ffffffff82a339c8 d _entry_ptr.104830
       ffffffff82a33928 d _entry_ptr.104871
       ffffffff82a33920 d _entry_ptr.104877
       ffffffff82a33918 d _entry_ptr.104893
       ffffffff82a339c0 d _entry_ptr.104905

       $> nm vmlinux | grep panic_console_dropped
       ffffffff853618e0 b panic_console_dropped.54158

IIRC, yes, these 'id' might change if source code changed.



Effect from the tracers POV?

    1. .llvm.<hash> suffix

       The names without the .llvm.<hash> suffix are the same as without
       LTO. This is probably why commit 8b8e6b5d3b013b0b ("kallsyms: strip
       ThinLTO hashes from static functions") worked. The tracers probably
       wanted to access only the symbols with uniqueue names


    2. <function>.<symbol> name

       The name without the .<symbol> suffix is the same as the function
       name. The result are duplicated function names.

       I do not understand why this was not a problem for tracers.
       Note that this is pretty common. _entry and _entry_ptr are
       added into any function calling printk().

       It seems to be working only by chance. Maybe, the tracers always
       take the first matched symbol. And the function name, without
       any suffix, is always the first one in the sorted list.

Note this only happens in LTO mode. Maybe lto kernel is not used
wide enough to discover this issue?

Likely.




Effect from livepatching POV:

    1. .llvm.<hash> suffix

        Comparing the full symbol name looks fragile to me because
        the <hash> might change.

        IMHO, it would be better to compare the names without
        the .llvm.<hash> suffix even for livepatches.


     2. <function>.<symbol> name

        The removal of <.symbol> suffix is a bad idea. The livepatch
        code is not able to distinguish the symbol of the <function>
        and static variables defined in this function.

        IMHO, it would be better to compare the full
         <function>.<symbol> name.


Result:

IMHO, cleanup_symbol_name() should remove only .llwn.* suffix.
And it should be used for both tracers and livepatching.

Does this makes any sense?

Song, does this fix the problem?

I only checked llvm15 and llvm17, not sure what kind of
suffix'es used for early llvm (>= llvm11).
Nick, could you help answer this question? What kind
of suffix are used for lto when promoting a local symbol
to a global one, considering all versions of llvm >= 11
since llvm 11 is the minimum supported version for kernel build.

For ToT for this case, the call chain backtrace looks like:

ModuleSummaryIndex::getGlobalNameForLocal
FunctionImportGlobalProcessing::getPromotedName
FunctionImportGlobalProcessing::processGlobalForThinLTO

This has been the case since release/11.x.

LLVM uses different mangling schemes for different places. For
example, function specialization (that occurs when sinking a constant
into a function) may rename a function from foo to something like
foo.42 where a dot followed by a monotonically increasing counter is
used. Numbers before may be missing from other symbols (where's .41?)
if they are DCE'd (perhaps because they were inlined and have no more
callers).  That is done by:

ValueSymbolTable::makeUniqueName which is eventually called by
FunctionSpecializer::createSpecialization.

That is the cause of common warnings from modpost.

There are likely numerous other special manglings done through llvm.
The above two are slightly more generic, but other passes may do
something more ad-hoc.

Thanks for the information. I checked FunctionSpecializer::createSpecialization and this also happens
without LTO. But here, we are discussing at LTO mode.
See https://github.com/torvalds/linux/blob/master/kernel/kallsyms.c#L166-L188

Let us say, without LTO, through function specializer,
two functions are generated:
   foo
   foo.58
and live patching for 'foo' can be done correctly
since live patching is able to match 'foo' precisely.

Now, the kernel is built with LTO, and two functions
are still generated
   foo
   foo.58
With current code, live patching searching a 'foo'
function and both function 'foo' and 'foo.58' will
be returned. If live patching intends to search
'foo.58' and actually there will be a mismatch.

There are also another case we discovered above,
   foo
   foo._entry (static variable inside foo)
   foo._entry_ptr (static variable inside foo)

These three symbols will be generated without LTO
and it works fine for tracing and live patching.

But at LTO mode, searching 'foo' will get
three symbols, and at least live patching
won't work any more.

Do you really have a use case that
foo.* (excluding foo.llvm.*) cause a problem?
Could you share it?




Best Regards,
Petr






[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux Kernel]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux