Re: [RFC dwarves 0/6] Encoding function addresses using DECL_TAGs

Alan Maguire <alan.maguire@xxxxxxxxxx> · Fri, 19 May 2023 16:08:44 +0100

On 19/05/2023 10:44, Yafang Shao wrote:
> On Thu, May 18, 2023 at 12:18 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>>
>> As a means to continue the discussion in [1], which is
>> concerned with finding the best long-term solution to
>> having a BPF Type Format (BTF) representation of
>> functions that is usable for tracing of edge cases, this
>> proof-of-concept series is intended to explore one approach
>> to adding information to help make tracing more accurate.
>>
>> A key problem today is that there is no matching from function
>> description to the actual instances of a function.
>>
>> When that function only has one description, that is
>> not an issue, but if we have multiple inconsistent
>> static functions in different CUs such as
>>
>> From kernel/irq/irqdesc.c
>>
>>     static ssize_t wakeup_show(struct kobject *kobj,
>>                                struct kobj_attribute *attr, char *buf)
>>
>> ...and from drivers/base/power/sysfs.c
>>
>>     static ssize_t wakeup_show(struct device *dev, struct device_attribute *attr,
>>                                char *buf);
>>
>> ...this becomes a problem.  If I am attaching,
>> which do I want?  And even if I know which one
>> I want, which instance in kallsyms is which?
>>
> 
> As you described in the above example,  it is natural to attach a
> *function* defined in a specific *file_path*.  So why not encoding the
> file path instead ? What's the problem in it?
> 
> If we expose the addr and let the user select which address to attach,
> it will be a trouble to deploy a bpf program across multiple kernels.
> While the file path will have a lower chance to be conflict between
> different kernel versions. So I think it would be better if we use the
> file path instead and let the kernel find the address automatically.
> In the old days, when we wanted to deploy a kprobe kernel module or a
> systemtap script across multiple kernels, we had to use if-else in the
> code, which was really troublesome as it is not scalable. I don't
> think we want to do it the same way in the bpf program.
> 

I think it's important to distinguish between what we do
in libbpf/kernel to ensure safe attach and what sorts
of tracing capabilities a tracer supports. For example,
a tracer (or indeed libbpf) can take the set of different
instances of cpumask_weight() and because they all have the
same BTF description, can attach to all instances. What we
lose though by not having address-level accuracy is the
ability to do more precise tracing.

I also don't think matching addresses -> BTF ids stops us
having file information in the mix; a "struct btf_func"
could potentially contain a name offset to a string
specifying the file too. My concern though is that
relying on _just_ the file/line will not be enough
in some cases. There are some weird edge cases in the
kernel. I'll give an example where file/line isn't
enough to figure out the mapping.

Consider 'struct elf_note_info'; there are two copies
of this in the vmlinux BTF:

[16819] STRUCT 'elf_note_info' size=248 vlen=8
        'thread' type_id=16817 bits_offset=0
        'psinfo' type_id=16815 bits_offset=64
        'signote' type_id=16815 bits_offset=256
        'auxv' type_id=16815 bits_offset=448
        'files' type_id=16815 bits_offset=640
        'csigdata' type_id=6083 bits_offset=832
        'size' type_id=74 bits_offset=1856
        'thread_notes' type_id=21 bits_offset=1920

[16851] STRUCT 'elf_note_info' size=248 vlen=8
        'thread' type_id=16850 bits_offset=0
        'psinfo' type_id=16815 bits_offset=64
        'signote' type_id=16815 bits_offset=256
        'auxv' type_id=16815 bits_offset=448
        'files' type_id=16815 bits_offset=640
        'csigdata' type_id=6507 bits_offset=832
        'size' type_id=74 bits_offset=1856
        'thread_notes' type_id=21 bits_offset=1920

...and here's the structure itself:

struct elf_note_info {
 	struct elf_thread_core_info *thread;
 	struct memelfnote psinfo;
 	struct memelfnote signote;
 	struct memelfnote auxv;
 	struct memelfnote files;
 	user_siginfo_t csigdata;
 	size_t size;
 	int thread_notes;
};

The reason there are two copies is not a dedup failure;
in the first case user_siginfo_t is defined as

[6083] TYPEDEF 'siginfo_t' type_id=6082

while in the second case it gets defined as

[6507] TYPEDEF 'compat_siginfo_t' type_id=6506

The reason is include/linux/compat.h was included
in one place when fs/binfmt_elf.c was built and not
in another when fs/binfmt_elf.c was built the
second time.

Because the object was built into the kernel twice,
as well as having two different instances of BTF descriptions
for the same structure, we also have multiple different BTF
function descriptions for functions that use a
"struct elf_note_info *", and we need to figure out which
maps to which kallsyms instance of the functions like
this:

ffffffff81489380 t fill_thread_core_info
ffffffff8148ca90 t fill_thread_core_info

Now, which BTF description goes with which instance?
The file/line number is identical for each BTF description,
but the BTF is (correctly) different because the object has
been built into the kernel twice and is slightly different
in each case. The difference isn't significant in practice here,
but that doesn't mean it couldn't be in other cases. Imagine a
case where one instance of the structure contained a field and
the other didn't; we'd be interpreting the tracing data
incorrectly.

I realize it's a weird edge case, but that's just one
I found from digging a bit. My worry is if we go with
the file/line as a way of identifying the function,
these sorts of edge cases will pile up and we'll need
to go more fine-grained and use addresses in the end
anyway. Again, that doesn't stop us applying higher-level
semantics at a tracer level, but we need to build those on
solid foundations.

Alan

>> This series is a proof-of-concept that supports encoding
>> function addresses and associating them with BTF FUNC
>> descriptions using BTF declaration tags.
>>
>> More work would need to be done on the kernel side
>> to _use_ this representation, but hopefully having a
>> rough approach outlined will help make that more feasible.
>>
>> [1] https://lore.kernel.org/bpf/ZF61j8WJls25BYTl@krava/
>>
>> Alan Maguire (6):
>>   btf_encoder: record function address and if it is local
>>   dwarf_loader: store address in function low_pc if available
>>   dwarf_loader: transfer low_pc info from subtroutine to its abstract
>>     origin
>>   btf_encoder: add "addr=0x<addr>" function declaration tag if
>>     --btf_gen_func_addr specified
>>   btf_encoder: store ELF function representations sorted by name _and_
>>     address
>>   pahole: document --btf_gen_func_addr
>>
>>  btf_encoder.c      | 64 +++++++++++++++++++++++++++++++++++-----------
>>  btf_encoder.h      |  4 +--
>>  dwarf_loader.c     | 16 +++++++++---
>>  dwarves.h          |  3 +++
>>  man-pages/pahole.1 |  8 ++++++
>>  pahole.c           | 12 +++++++--
>>  6 files changed, 85 insertions(+), 22 deletions(-)
>>
>> --
>> 2.31.1
>>
> 
>