BTF, pahole and static functions (was Re: [PATCH] libbpf: resolve kernel function name optimization for kprobe)

Alan Maguire <alan.maguire@xxxxxxxxxx> · Fri, 20 Jan 2023 13:20:50 +0000

On 18/01/2023 18:15, Yonghong Song wrote:
> 
> 
> On 1/13/23 12:00 AM, Yonghong Song wrote:
>>
>>
>> On 1/12/23 1:07 PM, Alexei Starovoitov wrote:
>>> On Thu, Jan 12, 2023 at 2:20 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>>>>
>>>> On 12/01/2023 07:23, Yonghong Song wrote:
>>>>>
>>>>>
>>>>> On 1/9/23 7:11 PM, Menglong Dong wrote:
>>>>>> On Tue, Jan 10, 2023 at 4:29 AM Yonghong Song <yhs@xxxxxxxx> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 1/9/23 1:42 AM, menglong8.dong@xxxxxxxxx wrote:
>>>>>>>> From: Menglong Dong <imagedong@xxxxxxxxxxx>
>>>>>>>>
>>>>>>>> The function name in kernel may be changed by the compiler. For example,
>>>>>>>> the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'.
>>>>>>>>
>>>>>>>> This kind optimization can happen in any kernel function. Therefor, we
>>>>>>>> should conside this case.
>>>>>>>>
>>>>>>>> If we failed to attach kprobe with a '-ENOENT', then we can lookup the
>>>>>>>> kallsyms and check if there is a similar function end with '.xxx', and
>>>>>>>> retry.
>>>>>>>
>>>>>>> This might produce incorrect result, so this approach won't work
>>>>>>> for all .isra.0 cases. When a function name is changed from
>>>>>>> <func> to <func>.isra.<num>, it is possible that compiler may have
>>>>>>> make some changes to the arguments, e.g., removing one argument,
>>>>>>> chaning a semantics of argument, etc. if bpf program still
>>>>>>> uses the original function signature, the bpf program may
>>>>>>> produce unexpected result.
>>>>>>
>>>>>> Oops, I wasn't aware of this part. Can we make this function disabled
>>>>>> by default and offer an option to users to enable it? Such as:
>>>>>>
>>>>>>       bpf_object_adapt_sym(struct bpf_object *obj)
>>>>>>
>>>>>> In my case, kernel function rename is common, and I have to
>>>>>> check all functions and do such adaptation before attaching
>>>>>> my kprobe programs, which makes me can't use auto-attach.
>>>>>>
>>>>>> What's more, I haven't seen the arguments change so far, and
>>>>>> maybe it's not a common case?
>>>>>
>>>>> I don't have statistics, but it happens. In general, if you
>>>>> want to attach to a function like <foo>, but it has a variant
>>>>> <foo>.isra.<num>, you probably should check assembly code
>>>>> to ensure the parameter semantics not changed, and then
>>>>> you can attach to kprobe function <foo>.isra.<num>, which
>>>>> I assume current libbpf infrastructure should support it.
>>>>> After you investigate all these <foo>.isra.<num> functions
>>>>> and confirm their argument semantics won't change, you
>>>>> could use kprobe multi to do attachment.
>>>>>
>>>>
>>>> I crunched some numbers on this, and discovered out of ~1600
>>>> .isra/.constprop functions, 76 had a missing argument. The patch series
>>>> at [1] is a rough attempt to get pahole to spot these, and add
>>>> BTF entries for each, where the BTF representation reflects
>>>> reality by skipping optimized-out arguments. So for a function
>>>> like
>>>>
>>>> static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg,
>>>>                                 const struct in6_addr *gw_addr, u32 tbid,
>>>>                                 int flags, struct fib6_result *res);
>>>>
>>>> Examining the BTF representation using pahole from [1], we see
>>>>
>>>> int ip6_nh_lookup_table.isra.0(struct net *net, struct fib6_config *cfg, struct in6_addr *gw_addr, u32 tbid, int flags);
>>>>
>>>> Comparing to the definition, we see the last parameter is missing,
>>>> i.e. the "struct fib6_result *" argument is missing. The calling pattern -
>>>> where the callers have a struct fib6_result on the stack and pass a pointer -
>>>> is reflected in late DWARF info which shows the argument is not actually
>>>> passed as a register, but can be expressed as an offset relative to the current
>>>> function stack (DW_OP_fbreg).
>>>>
>>>> This approach howvever introduces the problem that currently the kernel
>>>> doesn't  allow a "." in a function name. We can fix that, but any BTF encoding
>>>> that introduced optimized functions containing a  "." would have to be opt-in
>>>> via a pahole option, so we do not generate invalid vmlinux BTF for kernels
>>>> without that change.
>>>>
>>>> An alternative approach would be to simply encode .isra functions
>>>> in BTF without the .isra suffix (i.e. using "function_name" not
>>>> "function_name.isra"), only doing the BTF encoding if no arguments were
>>>> optimized out - i.e. if the function signature matches expectations.
>>>> The 76 functions with optimized-out parameters could simply be skipped.
>>>> To me that feels like the simpler approach - it avoids issues
>>>> with function name BTF encoding, and with that sort of model a
>>>> loose-matching kallsyms approach - like that described here - could be used
>>>> for kprobes and fentry/fexit. It also fits with the DWARF representation -
>>>> the .isra suffixes are not present in DWARF representations of the function,
>>>> only in the symbol table and kallsyms, so perhaps BTF should follow suit
>>>> and not add the suffixes. What do you think?
>>>
>>> Sounds like a great idea to me.
>>> Addresses this issue in a clean way.
>>
>> Yes, the second approach seems a reasonable approach. If the number
>> of parameters for the *actual* functions equals to the number
>> of parameters for the defined function (abstract_origin),
>> we can roughly assume the actual function signature matches
>> the prototype. Although it is theoretically possible that
>> compiler might change parameter types, e.g., from a
>> struct pointer (struct foo *p) to a int value (p->field1).
>> But this should be extremely rare and we need compiler emitting
>> additional dwarf data (might through btf_decl_tag) to discover
>> such cases.
>

Thanks! I've prototyped a solution at [1].

The key problem is pahole processing compilation units separately;
the issue is that for some functions, they have optimized out
parameters in some compilation units but not others (NF_HOOK()
does this for example).  It's a pain, especially as we want to 
preserve parallel BTF encoding as much as possible, so the
best solution I could come up with was to save information on
functions that had a suffix match in a global encoder binary
tree.  Then, when we are collecting threads, they can be safely
added prior to BTF merging, since at that point we have
recorded if they have optimized-out parameters in any compilation
unit.  There may be a better way to handle this, but I think
we are stuck with comparing binary-wide to see if the
parameters are consistent.  The code is (I think) careful
to limit this to cases where "."-suffixed functions are found.

In testing this however, I think there is a wider issue with
BTF encoding of static functions which may require a similar global
comparison mechanism.  More below...

> I checked with some compiler guys at Meta. They confirmed that clang
> might have the same optimization (eliminating some function parameters
> for static functions), but clang won't change function linkage name.
> So that means, clang won't do static function cloning and it will
> only remove function parameters if this can be applied to all call sites.
> 

Great, that simplifies things a lot.

> I checked the clang (clang16) build kernel with latest bpf-next
> and didn't find a single instance that a static function's parameter
> is removed.
> 

Excellent!

> Also, current libbpf kprobe API supports to pass a function address
> to kernel. So user space can always do their own /proc/kallsyms search
> and find func address for either regular function or function
> with .isa.<num>/.constprop.<num> suffixes.

Right; one way I've done this is to have a special "okprobe" section
name where we support matching with a "." suffix as well as with
the exact name. Might be worth adding that support to libbpf itself
if the above approach lands.

The bigger concern I have is in testing this I hit a problem which
at first looked like a parallel BTF encoding problem, but on
deeper analysis looks like a conceptual issue in how we handle
static functions.

To demonstrate, generate vmlinux BTF twice, once with a single thread
and once with 2 threads.  We observe a different number of functions
that end up in BTF for the exact same object:

$ LLVM_OBJCOPY=objcopy pahole -J -j1 --btf_gen_floats --lang_exclude=rust .tmp_vmlinux.btf
$ bpftool btf dump file .tmp_vmlinux.btf |grep "FUNC "|wc -l
57596

$ LLVM_OBJCOPY=objcopy pahole -J -j2 --btf_gen_floats --lang_exclude=rust .tmp_vmlinux.btf
$ bpftool btf dump file .tmp_vmlinux.btf |grep "FUNC "|wc -l
57714

So we see 118 more functions in the latter case. Why would this differ? If we sort
and strip out duplicates, we see that the cause is multiple copies of functions:

$ bpftool btf dump file .tmp_vmlinux.btf |grep "FUNC "|awk '{print $3}'|sort|uniq|wc -l
57596

Once clue is that each encoder maintains a function table of ELF symbols, and 
marks a symbol as generated if it has been added to BTF.  As the encoder
traverses CUs, it adds to the encoder symbol table.  In a single-threaded encoding,
we will see only one instance of a function in the final BTF because the first
match of the binary seach of the function list will be returned.  With multiple
BTF encoders however - each with its own internal representation of the symbol
table - there will be multiple instances of functions added to each individual
encoder's BTF fragment.  This is supposed to be okay because BTF deduplication
will handle merging these.

However, this does not take into account the fact that the same function name
may live in different CUs as a static function with a different function signature.
Could this explain the extra functions?

It seems to account for many of them. To take an example tcp_in_window()
has two representations:

122435] FUNC_PROTO '(anon)' ret_type_id=122409 vlen=7
        'ct' type_id=65074
        'dir' type_id=2170
        'index' type_id=6
        'skb' type_id=2012
        'dataoff' type_id=6
        'tcph' type_id=61910
        'hook_state' type_id=29004
[122436] FUNC 'tcp_in_window' type_id=122435 linkage=static

[66683] FUNC_PROTO '(anon)' ret_type_id=372 vlen=4
        'seq' type_id=23
        'end_seq' type_id=23
        's_win' type_id=23
        'e_win' type_id=23
[66684] FUNC 'tcp_in_window' type_id=66683 linkage=static

...one from nf_conntrack_proto_tcp.c, the other from tcp_minisocks.c.

This raises the conceptual problem - what do we do with such cases?
>From a code perspective, it's totally fine to have conflicting static
functions in different CUs, and one approach would be to retain multiple 
conflicting function signatures; however this is not really useful as
there is no mapping from BTF function signature to site.  As a result
we have no way of knowing which signature applies to which function site.
So perhaps the best approach is to eliminate such inconsistent static
function descriptions? The actual amount is small, ~100 functions.

Assuming that makes sense, the next question is how.  One approach
would be at BTF deduplication time, but we have seen cases in the
past where BTF did not fully deduplicate identical structures, and
in such cases, multiple copies of a function are present which
have multiple (but identical) parameter type descriptions. If this
happens, the danger in eliminating both is we might eliminate
critical kernel functions due to a type deduplication issue. One
way to avoid this would be a comparison of number of parameters
and (failing that) parameter names; such a comparison would not
be subject to issues with identical types leading to identical 
function definitions. Not 100% foolproof, but would work in
nearly all cases I suspect.

Another approach would be to re-use the mechanics introduced to
compare static functions to see if they have optimized-out
parameters to compare function signatures.  So static functions
paradoxically have to be stored in a global tree and compared
to weed out inconsistent duplicates.  Luckily, these comparisons
can be quite superficial; for the most part number of arguments
suffices.

What do you think is the best way forward to solve this problem?
The optimization-handling code gives us a way to deal with this
by postponing addition of such functions until we can guarantee
consistency, and I've roughly prototyped a patch on top of [1]
that works, but I think first we need to figure out what to do
with such inconsistent static functions before determining how we 
do it.  What do folks think?

Alan

[1] https://github.com/acmel/dwarves/compare/master...alan-maguire:dwarves:btf-optimized