Re: [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/11/2024 17:07, Yonghong Song wrote:
> 
> 
> 
> On 11/12/24 8:56 AM, Alan Maguire wrote:
>> On 12/11/2024 01:51, Yonghong Song wrote:
>>>
>>>
>>> On 11/11/24 7:39 AM, Alan Maguire wrote:
>>>> On 08/11/2024 18:05, Yonghong Song wrote:
>>>>> Song Liu reported that a kernel func (perf_event_read()) cannot be
>>>>> traced
>>>>> in certain situations since the func is not in vmlinux bTF. This
>>>>> happens
>>>>> in kernels 6.4, 6.9 and 6.11 and the kernel is built with pahole 1.27.
>>>>>
>>>>> The perf_event_read() signature in kernel (kernel/events/core.c):
>>>>>      static int perf_event_read(struct perf_event *event, bool group)
>>>>>
>>>>> Adding '-V' to pahole command line, and the following error msg can
>>>>> be found:
>>>>>      skipping addition of 'perf_event_read'(perf_event_read) due to
>>>>> unexpected register used for parameter
>>>>>
>>>>> Eventually the error message is attributed to the setting
>>>>> (parm->unexpected_reg = 1) in parameter__new() function.
>>>>>
>>>>> The following is the dwarf representation for perf_event_read():
>>>>>       0x0334c034:   DW_TAG_subprogram
>>>>>                   DW_AT_low_pc    (0xffffffff812c6110)
>>>>>                   DW_AT_high_pc   (0xffffffff812c640a)
>>>>>                   DW_AT_frame_base        (DW_OP_reg7 RSP)
>>>>>                   DW_AT_GNU_all_call_sites        (true)
>>>>>                   DW_AT_name      ("perf_event_read")
>>>>>                   DW_AT_decl_file ("/rw/compile/kernel/events/core.c")
>>>>>                   DW_AT_decl_line (4641)
>>>>>                   DW_AT_prototyped        (true)
>>>>>                   DW_AT_type      (0x03324f6a "int")
>>>>>       0x0334c04e:     DW_TAG_formal_parameter
>>>>>                     DW_AT_location        (0x007de9fd:
>>>>>                        [0xffffffff812c6115, 0xffffffff812c6141):
>>>>> DW_OP_reg5 RDI
>>>>>                        [0xffffffff812c6141, 0xffffffff812c6323):
>>>>> DW_OP_reg14 R14
>>>>>                        [0xffffffff812c6323, 0xffffffff812c63fe):
>>>>> DW_OP_GNU_entry_value(DW_OP_reg5 RDI), DW_OP_stack_value
>>>>>                        [0xffffffff812c63fe, 0xffffffff812c6405):
>>>>> DW_OP_reg14 R14
>>>>>                        [0xffffffff812c6405, 0xffffffff812c640a):
>>>>> DW_OP_GNU_entry_value(DW_OP_reg5 RDI), DW_OP_stack_value)
>>>>>                     DW_AT_name    ("event")
>>>>>                     DW_AT_decl_file       ("/rw/compile/kernel/events/
>>>>> core.c")
>>>>>                     DW_AT_decl_line       (4641)
>>>>>                     DW_AT_type    (0x0333aac2 "perf_event *")
>>>>>       0x0334c05e:     DW_TAG_formal_parameter
>>>>>                     DW_AT_location        (0x007dea82:
>>>>>                        [0xffffffff812c6137, 0xffffffff812c63f2):
>>>>> DW_OP_reg12 R12
>>>>>                        [0xffffffff812c63f2, 0xffffffff812c63fe):
>>>>> DW_OP_GNU_entry_value(DW_OP_reg4 RSI), DW_OP_stack_value
>>>>>                        [0xffffffff812c63fe, 0xffffffff812c640a):
>>>>> DW_OP_reg12 R12)
>>>>>                     DW_AT_name    ("group")
>>>>>                     DW_AT_decl_file       ("/rw/compile/kernel/events/
>>>>> core.c")
>>>>>                     DW_AT_decl_line       (4641)
>>>>>                     DW_AT_type    (0x03327059 "bool")
>>>>>
>>>>> By inspecting the binary, the second argument ("bool group") is used
>>>>> in the function. The following are the disasm code:
>>>>>       ffffffff812c6110 <perf_event_read>:
>>>>>       ffffffff812c6110: 0f 1f 44 00 00        nopl    (%rax,%rax)
>>>>>       ffffffff812c6115: 55                    pushq   %rbp
>>>>>       ffffffff812c6116: 41 57                 pushq   %r15
>>>>>       ffffffff812c6118: 41 56                 pushq   %r14
>>>>>       ffffffff812c611a: 41 55                 pushq   %r13
>>>>>       ffffffff812c611c: 41 54                 pushq   %r12
>>>>>       ffffffff812c611e: 53                    pushq   %rbx
>>>>>       ffffffff812c611f: 48 83 ec 18           subq    $24, %rsp
>>>>>       ffffffff812c6123: 41 89 f4              movl    %esi, %r12d
>>>>>       <=========== NOTE that here '%esi' is used and moved to '%r12d'.
>>>>>       ffffffff812c6126: 49 89 fe              movq    %rdi, %r14
>>>>>       ffffffff812c6129: 65 48 8b 04 25 28 00 00 00    movq    %gs:40,
>>>>> %rax
>>>>>       ffffffff812c6132: 48 89 44 24 10        movq    %rax, 16(%rsp)
>>>>>       ffffffff812c6137: 8b af a8 00 00 00     movl    168(%rdi), %ebp
>>>>>       ffffffff812c613d: 85 ed                 testl   %ebp, %ebp
>>>>>       ffffffff812c613f: 75 3f                 jne
>>>>> 0xffffffff812c6180 <perf_event_read+0x70>
>>>>>       ffffffff812c6141: 66 2e 0f 1f 84 00 00 00 00 00 nopw    %cs:
>>>>> (%rax,%rax)
>>>>>       ffffffff812c614b: 0f 1f 44 00 00        nopl    (%rax,%rax)
>>>>>       ffffffff812c6150: 49 8b 9e 28 02 00 00  movq    552(%r14), %rbx
>>>>>       ffffffff812c6157: 48 89 df              movq    %rbx, %rdi
>>>>>       ffffffff812c615a: e8 c1 a0 d7 00        callq
>>>>> 0xffffffff82040220 <_raw_spin_lock_irqsave>
>>>>>       ffffffff812c615f: 49 89 c7              movq    %rax, %r15
>>>>>       ffffffff812c6162: 41 8b ae a8 00 00 00  movl    168(%r14), %ebp
>>>>>       ffffffff812c6169: 85 ed                 testl   %ebp, %ebp
>>>>>       ffffffff812c616b: 0f 84 9a 00 00 00     je
>>>>> 0xffffffff812c620b <perf_event_read+0xfb>
>>>>>       ffffffff812c6171: 48 89 df              movq    %rbx, %rdi
>>>>>       ffffffff812c6174: 4c 89 fe              movq    %r15, %rsi
>>>>>       <=========== NOTE: %rsi is overwritten
>>>>>       ......
>>>>>       ffffffff812c63f0: 41 5c                 popq    %r12
>>>>>       <============ POP r12
>>>>>       ffffffff812c63f2: 41 5d                 popq    %r13
>>>>>       ffffffff812c63f4: 41 5e                 popq    %r14
>>>>>       ffffffff812c63f6: 41 5f                 popq    %r15
>>>>>       ffffffff812c63f8: 5d                    popq    %rbp
>>>>>       ffffffff812c63f9: e9 e2 a8 d7 00        jmp
>>>>> 0xffffffff82040ce0 <__x86_return_thunk>
>>>>>       ffffffff812c63fe: 31 c0                 xorl    %eax, %eax
>>>>>       ffffffff812c6400: e9 be fe ff ff        jmp
>>>>> 0xffffffff812c62c3 <perf_event_read+0x1b3>
>>>>>
>>>>> It is not clear why dwarf didn't encode %rsi in locations. But
>>>>> DW_OP_GNU_entry_value(DW_OP_reg4 RSI) tells us that RSI is live at
>>>>> the entry of perf_event_read(). So this patch tries to search
>>>>> DW_OP_GNU_entry_value/DW_OP_entry_value location/expression so if
>>>>> the expected parameter register matchs the register in
>>>>> DW_OP_GNU_entry_value/DW_OP_entry_value, then the original parameter
>>>>> is not optimized.
>>>>>
>>>>> For one of internal 6.11 kernel, there are 62498 functions in BTF and
>>>>> perf_event_read() is not there. With this patch, there are 61552
>>>>> functions
>>>>> in BTF and perf_event_read() is included.
>>>>>
>>>> hi Yonghong,
>>>>
>>>> I'm confused by these numbers. I would have thought your changes would
>>>> have led to a net increase of functions encoded in vmlinux BTF since we
>>>> are now likely catching more cases where registers are expected. 
>>>> When I
>>>> ran your patches against an LLVM-built kernel, that's what I saw; 70
>>>> additional functions were recognized as having expected parameters, and
>>>> thus were encoded in BTF. In your case it looks like we lost nearly
>>>> 1000
>>>> functions. Any idea what's going on there? If you can share your
>>>> config,
>>>> LLVM version I can dig into this from my side too. Thanks!
>>> Attached is my config (based on one of meta internal configs). I tried
>>> with master branch with head:
>>>
>>> 7b6e5bfa2541380b478ea1532880210ea3e39e11 (HEAD -> master, origin/master,
>>> origin/HEAD) Merge branch 'refactor-lock-management'
>>> ae6e3a273f590a2b64f14a9fab3546c3a8f44ed4 bpf: Drop special callback
>>> reference handling
>>> f6b9a69a9e56b2083aca8a925fc1a28eb698e3ed bpf: Refactor active lock
>>> management
>>>
>>> I am using pahole v1.27.
>>>
>>> I am using an llvm built from upstream. The following is llvm-project
>>> head:
>>> beb12f92c71981670e07e47275efc6b5647011c1 (HEAD -> main) [RISCV] Add
>>> +optimized-nfN-segment-load-store (#114414)
>>> 6bad4514c938b3b48c0c719b8dd98b3906f2c290 [AArch64] Extend vector mull
>>> test coverage. NFC
>>> 915b910d800d7fab6a692294ff1d7075d8cba824 [libc] Fix typos in proxy type
>>> headers (#114717)
>>> 98ea1a81a28a6dd36941456c8ab4ce46f665f57a [IPO] Remove unused includes
>>> (NFC) (#114716)
>>>
>>> With the above setup, when to do
>>>
>>> pahole -JV --
>>> btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs vmlinux >& log.pahole
>>>
>>> You will find the below info in the log:
>>>    skipping addition of 'perf_event_read'(perf_event_read) due to
>>> unexpected register used for paramet
>>>
>>> In the dwarf:
>>>
>>> 0x02122746:   DW_TAG_subprogram
>>>                  DW_AT_low_pc    (0xffffffff81299740)
>>>                  DW_AT_high_pc   (0xffffffff812999f7)
>>>                  DW_AT_frame_base        (DW_OP_reg7 RSP)
>>>                  DW_AT_GNU_all_call_sites        (true)
>>>                  DW_AT_name      ("perf_event_read")
>>>                  DW_AT_decl_file ("/home/yhs/work/bpf-next/kernel/
>>> events/
>>> core.c")
>>>                  DW_AT_decl_line (4746)
>>>                  DW_AT_prototyped        (true)
>>>                  DW_AT_type      (0x020f95f5 "int")
>>>
>>> 0x02122760:     DW_TAG_formal_parameter
>>>                    DW_AT_location        (0x00769b72:
>>>                       [0xffffffff81299745, 0xffffffff81299764):
>>> DW_OP_reg5 RDI
>>>                       [0xffffffff81299764, 0xffffffff81299937):
>>> DW_OP_reg3 RBX
>>>                       [0xffffffff81299937, 0xffffffff812999f0):
>>> DW_OP_GNU_entry_value(DW_OP_reg5 RDI), DW_OP_stack_value
>>>                       [0xffffffff812999f0, 0xffffffff812999f7):
>>> DW_OP_reg3 RBX)
>>>                    DW_AT_name    ("event")
>>>                    DW_AT_decl_file       ("/home/yhs/work/bpf-next/
>>> kernel/events/core.c")
>>>                    DW_AT_decl_line       (4746)
>>>                    DW_AT_type    (0x0210f654 "perf_event *")
>>>                      0x02122770:     DW_TAG_formal_parameter
>>>                    DW_AT_location        (0x00769c61:
>>>                       [0xffffffff81299758, 0xffffffff81299926):
>>> DW_OP_reg6 RBP
>>>                       [0xffffffff81299926, 0xffffffff812999f0):
>>> DW_OP_GNU_entry_value(DW_OP_reg4 RSI), DW_OP_stack_value
>>>                       [0xffffffff812999f0, 0xffffffff812999f7):
>>> DW_OP_reg6 RBP)
>>>                    DW_AT_name    ("group")
>>>                    DW_AT_decl_file       ("/home/yhs/work/bpf-next/
>>> kernel/events/core.c")
>>>
>>> The above is slightly different from our production kernel where Song
>>> reported. But essence is the same.
>>> The second parameter needs to check DW_OP_GNU_entry_value(DW_OP_reg4
>>> RSI) to ensure the second
>>> argument is available.
>>>
>>> My patch is supposed to only make improvement. I am curiously why you
>>> get less functions encoded in BTF.
>>>
>> Thanks for the config etc! When I build bpf-next using master branch
>> llvm and this config, I see
>>
>> with baseline (master branch pahole): 62371 functions, no perf_event_read
>> your series on top of master branch pahole: 62433 functions,
>> perf_event_read present
>>
>> So that's consistent with what I've seen with other configs; more
>> functions are present in vmlinux BTF since we are now seeing more cases
>> where parameters are in fact consistent.  The part that confuses me
>> though is the numbers you initially reported above
>>
>> "for one of internal 6.11 kernel, there are 62498 functions in BTF and
>> perf_event_read() is not there. With this patch, there are 61552
>> functions in BTF and perf_event_read() is included."
>>
>> These numbers suggest you lost nearly 1000 functions when building
>> vmlinux BTF with pahole using this series. That's the part I don't
>> understand - we should just see a gain in numbers of functions in
>> vmlinux BTF, right? Did you mean 62552 functions rather than 61552
>> perhaps?
> 
> Sorry, really embarrassing. it is typo. Indeed it should be 62552 functions
> in BTF instead.
>

No problem, makes perfect sense now, thanks! I'm trying to reproduce the
core dumps Eduard saw now with this setup; I'll report back if I manage
to do so and see if locks as Jiri and Arnaldo suggested help. If so a v2
along the lines of Eduard's suggested change plus locking might be the
best approach, what do you think? Thanks!

Alan




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux