[Crash-utility] Re: [PATCH v6 0/5] Improve stack unwind on ppc64

On Fri, Feb 02, 2024 at 07:04:14AM +0000, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Hi Aditya,
> 
> Thank you for the work, it looks nicely done to me, except for the "info 
> threads" issue and the style of the gdb patch.
> 
> Hi Tao Liu,
> 
> I saw that you were interested in support for other architectures.  I'd 
> like to test/support this also on x86_64 at the same time as ppc64 if 
> possible.  Do you have any trial patch or plan?

Hi Kazu & Aditya,

Sorry for the delay. My trial patch is available at https://github.com/liutgnu/crash-dev. There are some known issues:

1) Stack unwinding fails on some vmcores:
$ ./crash /var/crash/127.0.0.1-2023-11-10-18\:27\:30/vmcore ~/vmlinux
      KERNEL: /root/vmlinux  [TAINTED]          
    DUMPFILE: /var/crash/127.0.0.1-2023-11-10-18:27:30/vmcore  [PARTIAL DUMP]
        CPUS: 1
        DATE: Fri Nov 10 18:27:26 CST 2023
      UPTIME: 00:10:49
LOAD AVERAGE: 0.07, 0.11, 0.08
       TASKS: 133
    NODENAME: ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com
     RELEASE: 5.14.0-39.el9.x86_64
     VERSION: #1 SMP PREEMPT Fri Dec 24 00:07:58 EST 2021
     MACHINE: x86_64  (2303 Mhz)
      MEMORY: 4 GB
       PANIC: "Oops: 0002 [#1] PREEMPT SMP NOPTI" (check log for details)
         PID: 22722
     COMMAND: "insmod"
        TASK: ffff973e88408000  [THREAD_INFO: ffff973e88408000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> set 1
>>>> sp:ffffa5ee40013d00 bp:ffff973efbc2c800 ip:ffffffffb726a1b3
>>>> sp:ffffa5ee40013d00 bp:ffff973efbc2c800 ip:ffffffffb726a1b3
crash> gdb bt
#0  0xffffffffb726a1b3 in context_switch (rf=0xffffa5ee40013d00, next=0xffff973e88408000, prev=0xffff973e801fb280, rq=0xffff973efbc2c800) at kernel/sched/core.c:4972
#1  __schedule (sched_mode=0) at kernel/sched/core.c:6253
#2  0x0004005300000004 in ?? ()
crash> bt
PID: 1        TASK: ffff973e801fb280  CPU: 0    COMMAND: "systemd"
 #0 [ffffa5ee40013d30] __schedule at ffffffffb726a1b3
 #1 [ffffa5ee40013d78] schedule at ffffffffb726a553
 #2 [ffffa5ee40013d90] schedule_hrtimeout_range_clock at ffffffffb726f294
 #3 [ffffa5ee40013e10] ep_poll at ffffffffb6bc00c4
 #4 [ffffa5ee40013eb0] do_epoll_wait at ffffffffb6bc01db
 #5 [ffffa5ee40013ee8] __x64_sys_epoll_wait at ffffffffb6bc09b0
 #6 [ffffa5ee40013f38] do_syscall_64 at ffffffffb725ea98
 #7 [ffffa5ee40013f50] entry_SYSCALL_64_after_hwframe at ffffffffb740007c

Stack unwinding failed for "gdb bt": it only unwound the "__schedule" function. What the failing cases have in common is that the rsp and rbp values read from the stack point to different stack frames:
>>>> sp:ffffa5ee40013d00 bp:ffff973efbc2c800 ip:ffffffffb726a1b3
crash> rd ffffa5ee40013d00 32
ffffa5ee40013d00:  ffff973ef8eef4a8 0000000000000000   ....>...........  r15, r14 (struct inactive_task_frame)
ffffa5ee40013d10:  ffff973e88408000 ffff973e801fb280   ..@.>.......>...  r13, r12
ffffa5ee40013d20:  ffff973e801fbd98 ffff973efbc2c800   ....>.......>...  bx, bp
ffffa5ee40013d30:  ffffffffb726a1b3 ffff973e00000001   ..&.........>...  ret_addr
ffffa5ee40013d40:  0004005300000004 f915b72f0f356b00   ....S....k5./...
ffffa5ee40013d50:  ffff973e801fb280 ffff973e801fb280   ....>.......>...
ffffa5ee40013d60:  ffff973e801fb280 ffff973ef8eef4e0   ....>.......>...
ffffa5ee40013d70:  0000000000000000 ffffffffb726a553   ........S.&.....
ffffa5ee40013d80:  ffff973ef8eef480 ffffa5ee40013e68   ....>...h>.@....
ffffa5ee40013d90:  ffffffffb726f294 ffffffffb6bbf092   ..&.............
ffffa5ee40013da0:  000055d6d4081de0 00000054b6b4c6fd   .....U......T...
ffffa5ee40013db0:  ffff973ef8eef4d0 ffffa5ee40013db8   ....>....=.@....
ffffa5ee40013dc0:  ffffa5ee40013db8 0000000000000000   .=.@............
ffffa5ee40013dd0:  ffff973e00000019 f915b72f0f356b00   ....>....k5./...
ffffa5ee40013de0:  f915b72f0f356b00 ffff973ef8eef480   .k5./.......>...
ffffa5ee40013df0:  ffffa5ee40013e68 ffff973e801fb280   h>.@........>...

And bp is modified as follows:

crash> dis schedule
0xffffffffb726a510 <schedule>:  nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffb726a515 <schedule+5>:        push   %rbp
0xffffffffb726a516 <schedule+6>:        mov    %gs:0x16f40,%rbp  <<< 
0xffffffffb726a51f <schedule+15>:       push   %rbx
0xffffffffb726a520 <schedule+16>:       mov    0x18(%rbp),%eax
0xffffffffb726a523 <schedule+19>:       test   %eax,%eax

I'm not sure why gdb cannot unwind the stack in this case. Apart from this case, x86_64 stack unwinding works fine in my tests.
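
For reference, the words at sp in the dump above can be read as a struct inactive_task_frame (r15, r14, r13, r12, bx, bp, ret_addr, as annotated). The following is only a minimal sketch of that arithmetic, not code from crash or from the trial patch: the helper names are hypothetical, and the 16 KiB naturally aligned stack size is an assumption about this kernel configuration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Switch frame layout on x86_64, matching the annotations in the dump. */
struct inactive_task_frame {
	uint64_t r15, r14, r13, r12, bx, bp, ret_addr;
};

/* Assumption: kernel stacks are 16 KiB and aligned to their size. */
#define STACK_SIZE 0x4000ULL

/* Lowest address of the stack that contains sp. */
static uint64_t stack_base(uint64_t sp)
{
	return sp & ~(STACK_SIZE - 1);
}

/* True if addr lies on the same kernel stack as sp. */
static bool on_same_stack(uint64_t sp, uint64_t addr)
{
	uint64_t base = stack_base(sp);

	return addr >= base && addr < base + STACK_SIZE;
}

/* Address of the saved return address inside the switch frame. */
static uint64_t ret_addr_slot(uint64_t sp)
{
	return sp + offsetof(struct inactive_task_frame, ret_addr);
}

/* Stack pointer once the whole switch frame has been popped. */
static uint64_t sp_after_switch_frame(uint64_t sp)
{
	return sp + sizeof(struct inactive_task_frame);
}
```

With sp = ffffa5ee40013d00, ret_addr_slot() gives ffffa5ee40013d30, which is the address "bt" prints for frame #0. on_same_stack() is false for the saved bp (ffff973efbc2c800), so under the stated assumption bp cannot be followed as a frame pointer on this stack; it holds the same value gdb shows for "rq".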

2) It may break the original ppc64 patch.

This x86_64 patch is based on the original v7 ppc64 stack unwinding patch, and it slightly modifies the original ppc64 patch code. I haven't fully tested the code on ppc64.

Thanks,
Tao Liu


> 
> Thanks,
> Kazu
> 
> On 2024/01/05 16:30, Aditya Gupta wrote:
> > The Problem:
> > ============
> > 
> > Currently crash is unable to show function arguments and local variables, as
> > gdb can do. And functionality for moving between frames ('up'/'down') is not
> > working in crash.
> > 
> > Crash has 'gdb passthroughs' for things gdb can do, but the gdb passthroughs
> > 'bt', 'frame', 'info locals', 'up', 'down' are not working either, due to
> > gdb not getting the register values from `crash_target::fetch_registers`,
> > which then uses `machdep->get_cpu_reg`, which is not implemented for PPC64
> > 
> > Proposed Solution:
> > ==================
> > 
> > Fix the gdb passthroughs by implementing "machdep->get_cpu_reg" for PPC64.
> > This way, "gdb mode in crash" will support this feature for both ELF and
> > kdump-compressed vmcore formats, while "gdb" would only have supported ELF
> > format
> > 
> > This way other features of 'gdb', such as seeing
> > backtraces/registers/variables/arguments/local variables, moving up and
> > down stack frames, can be used with any ppc64 vmcore, irrespective of
> > being ELF format or kdump-compressed format.
> > 
> > Note: This doesn't support live debugging on ppc64, since registers are not
> > available to be read
> > 
> > Implications on Architectures:
> > ====================================
> > 
> > No architecture other than PPC64 has been affected, other than in case of
> > 'frame' command
> > 
> > As mentioned in patch #2, since frame will not be prohibited, so it will print:
> > 
> > 	crash> frame
> > 	#0  <unavailable> in ?? ()
> > 
> > Instead of before prohibited message:
> > 
> > 	crash> frame
> > 	crash: prohibited gdb command: frame
> > 
> > Major change will be in 'gdb mode' on PPC64, that it will print the frames, and
> > local variables, instead of failing with errors showing no frame, or showing
> > that couldn't get PC, it will be able to give all this information.
> > 
> > Testing:
> > ========
> > 
> > Git tree with this patch series applied:
> > https://github.com/adi-g15-ibm/crash/tree/stack-unwind-v6
> > 
> > To test various gdb passthroughs:
> > 
> > 	(crash) set
> > 	(crash) set gdb on
> > 	gdb> thread
> > 	gdb> bt
> > 	gdb> info threads
> > 	gdb> info threads
> > 	gdb> info locals
> > 	gdb> info variables irq_rover_lock
> > 	gdb> info args
> > 	gdb> thread 2
> > 	gdb> set gdb off
> > 	(crash) set
> > 	(crash) set -c 6
> > 	(crash) gdb thread
> > 	(crash) bt
> > 	(crash) gdb bt
> > 	(crash) frame
> > 	(crash) up
> > 	(crash) down
> > 	(crash) info locals
> > 
> > Known Issues:
> > =============
> > 
> > 1. In gdb mode, 'bt' might fail to show backtrace in few vmcores collected
> >     from older kernels. This is a known issue due to register mismatch, and
> >     its fix has been merged upstream:
> > 
> >     This can also cause some 'invalid kernel virtual address' errors during gdb
> >     unwinding the stack registers
> > 
> > Commit: https://github.com/torvalds/linux/commit/b684c09f09e7a6af3794d4233ef785819e72db79
> > 
> > Fixing GDB passthroughs on other architectures
> > ==============================================
> > 
> > Much of the work for making gdb passthroughs like 'gdb bt', 'gdb
> > thread', 'gdb info locals' etc. has been done by the patches introducing
> > 'machdep->get_cpu_reg' and this series fixing some issues in that.
> > 
> > Other architectures should be able to fix these gdb functionalities by
> > simply implementing 'machdep->get_cpu_reg (cpu, regno, ...)'.
> > 
> > The reasoning behind that has been explained with a diagram in commit
> > description of patch #1
> > 
> > I will assist with my findings/observations fixing it on ppc64 whenever needed.
> > 
> > Changelog:
> > ==========
> > 
> > V6:
> > + changes in patch #5: fix bug introduced in v5 that caused initial gdb thread
> >    to be thread 1
> > 
> > V5:
> > + changes in patch #1: made ppc64_get_cpu_reg static, and remove unreachable
> >    code
> > + changes in patch #3: fixed typo 'ppc64_renum' instead of 'ppc64_regnum',
> >    remove unneeded if condition
> > + changes in patch #5: implement refresh regcache on per thread, instead of all
> >    threads at once
> > 
> > V4:
> > + fix segmentation fault in live debugging (change in patch #1)
> > + mention live debugging not supported in cover letter and patch #1
> > + fixed some checkpatch warnings (change in patch #5)
> > 
> > V3:
> > + default gdb thread will be the crashing thread, instead of being
> >    thread '0'
> > + synchronise crash cpu and gdb thread context
> > + fix bug in gdb_interface, that replaced gdb's output stream, losing
> >    output in some cases, such as info threads and extra output in info
> >    variables
> > + fix 'info threads'
> > 
> > RFC V2:
> >    - removed patch implementing 'frame', 'up', 'down' in crash
> >    - updated the cover letter by removing the mention of those commands other
> > 	than the respective gdb passthrough
> > 
> > Aditya Gupta (5):
> >    ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg
> >    remove 'frame' from prohibited commands list
> >    synchronise cpu context changes between crash/gdb
> >    fix gdb_interface: restore gdb's output streams at end of
> >      gdb_interface
> >    fix 'info threads' command
> > 
> >   crash_target.c  |  44 ++++++++++++++++
> >   defs.h          | 130 +++++++++++++++++++++++++++++++++++++++++++++++-
> >   gdb-10.2.patch  | 110 +++++++++++++++++++++++++++++++++++++++-
> >   gdb_interface.c |   2 +-
> >   kernel.c        |  47 +++++++++++++++--
> >   ppc64.c         |  95 +++++++++++++++++++++++++++++++++--
> >   task.c          |  14 ++++++
> >   tools.c         |   2 +-
> >   8 files changed, 434 insertions(+), 10 deletions(-)
> > 