Re: [RFC PATCH v2 0/4] Improve stack unwind on ppc64

Aditya Gupta <adityag@xxxxxxxxxxxxx> · Mon, 4 Sep 2023 17:08:37 +0530

Hi Lijiang,
Thanks for the reviews.

> > Implications on Architectures:
> > ====================================
> >
> > No architecture other than PPC64 has been affected, other than in case of
> > 'frame' command
> >
> >
> BTW: Can this feature be implemented on other architectures such as X86 64,
> etc? Have you investigated?
> 

Yes Lijiang. From my analysis, the key functionality is providing register
values to gdb, and that applies to any architecture.

Basically, gdb needs register values for 'frame' and 'bt'. It gets them from
the target, i.e. `crash_target::fetch_register`, which in turn calls
`machdep->get_cpu_reg` to fetch each register.
With the register values and the DWARF info, gdb can then print backtraces and
local variables.

I have not tried it on x86_64, but logically the same approach might help
there.
I have put a diagram in patch #2.
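
To make the flow above a bit more concrete, here is a minimal C sketch of the
idea: a target-side fetch routine that asks a per-architecture hook for one
register at a time. Everything named *_sketch, the register numbers, and the
values are made up for illustration. The hook signature is only assumed here
and may differ from the one in the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for crash's machine-dependent hook table. */
struct machdep_sketch {
    /* Returns 1 and fills 'value' when the register is known, else 0. */
    int (*get_cpu_reg)(int cpu, int regno, const char *name,
                       int size, void *value);
};

/* Illustrative register numbers, not taken from any real gdb description. */
enum { SKETCH_REG_SP = 1, SKETCH_REG_PC = 64 };

/* Pretend architecture hook: serves two placeholder register values. */
static int sketch_get_cpu_reg(int cpu, int regno, const char *name,
                              int size, void *value)
{
    unsigned long long v;

    (void)cpu;   /* a real hook would look up the saved regs of this CPU */
    (void)name;
    if (size != (int)sizeof(v))
        return 0;

    switch (regno) {
    case SKETCH_REG_SP: v = 0x1000ULL; break;   /* placeholder value */
    case SKETCH_REG_PC: v = 0x2000ULL; break;   /* placeholder value */
    default:            return 0;               /* register not provided */
    }
    memcpy(value, &v, sizeof(v));
    return 1;
}

static struct machdep_sketch machdep_impl = { sketch_get_cpu_reg };
static struct machdep_sketch *machdep = &machdep_impl;

/* Stand-in for the target-side fetch: forward one request to the hook. */
int fetch_register_sketch(int cpu, int regno, unsigned long long *out)
{
    assert(out != NULL);
    if (machdep->get_cpu_reg == NULL)
        return 0;   /* architecture does not provide registers */
    return machdep->get_cpu_reg(cpu, regno, NULL, (int)sizeof(*out), out);
}
```

An architecture that does not set the hook simply reports the register as
unavailable, so supporting another architecture would mostly mean providing
its own hook.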

> > Git tree with this patch series applied:
> > https://github.com/adi-g15-ibm/crash/tree/stack-unwind-rfc2
> >
> > To test gdb passthroughs:
> >
> >         crash> set gdb on
> >         gdb> thread 3 # or any other thread number to change context in gdb
> >         gdb> bt
> >         gdb> frame
> >         gdb> up
> >         gdb> down
> >         gdb> info locals
> >
> >
> I did a simple test as below(kernel commit: 99d99825fc07):
> 
> gdb> info threads
>   Id   Target Id         Frame
>   1    CPU 0             <unavailable> in ?? ()
>   2    CPU 1
> gdb> thread 2
> [Switching to thread 2 (CPU 1)]
> #0  0xc0000000002843f8 in crash_setup_regs (oldregs=<optimized out>,
> newregs=0xc00000003dbd7958) at ./arch/powerpc/include/asm/kexec.h:69
> 69                      ppc_save_regs(newregs);
> gdb> bt
> #0  0xc0000000002843f8 in crash_setup_regs (oldregs=<optimized out>,
> newregs=0xc00000003dbd7958) at ./arch/powerpc/include/asm/kexec.h:69
> #1  __crash_kexec (regs=<optimized out>) at kernel/kexec_core.c:1064
> #2  0xc00000000014e018 in panic (fmt=0xc000000001443d80 "sysrq triggered
> crash\n") at kernel/panic.c:359
> #3  0xc0000000009b8978 in sysrq_handle_crash (key=<optimized out>) at
> drivers/tty/sysrq.c:155
> #4  0xc0000000009b946c in __handle_sysrq (key=key@entry=99,
> check_mask=check_mask@entry=false) at drivers/tty/sysrq.c:602
> #5  0xc0000000009b9ce8 in write_sysrq_trigger (file=<optimized out>,
> buf=<optimized out>, count=2, ppos=<optimized out>) at
> drivers/tty/sysrq.c:1163
> #6  0xc0000000006919fc in pde_write (ppos=<optimized out>, count=<optimized
> out>, buf=<optimized out>, file=<optimized out>, pde=0xc00000000556fcc0) at
> fs/proc/inode.c:340
> #7  proc_reg_write (file=<optimized out>, buf=<optimized out>,
> count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:352
> #8  0xc0000000005b7cb8 in vfs_write (file=file@entry=0xc000000036fa5f00,
> buf=buf@entry=0x10027835560 <error: Cannot access memory at address
> 0x10027835560>, count=count@entry=2, pos=pos@entry=0xc00000003dbd7de0) at
> fs/read_write.c:582
> #9  0xc0000000005b83a4 in ksys_write (fd=<optimized out>, buf=0x10027835560
> <error: Cannot access memory at address 0x10027835560>, count=2) at
> fs/read_write.c:637
> #10 0xc000000000031454 in system_call_exception (regs=0xc00000003dbd7e80,
> r0=<optimized out>) at arch/powerpc/kernel/syscall.c:153
> #11 0xc00000000000cedc in system_call_vectored_common () at
> arch/powerpc/kernel/interrupt_64.S:198
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> gdb> frame 7
> #7  proc_reg_write (file=<optimized out>, buf=<optimized out>,
> count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:352
> 352                     rv = pde_write(pde, file, buf, count, ppos);
> gdb> info rv
> gdb: gdb request failed: info rv
> gdb>
> 
> Seems that the 'info locals' command is not working as expected. I haven't
> investigated the details.
> 

Sure, I will test this. Meanwhile, are you sure 'info rv' was the command you
ran? I don't think that is a valid gdb command ('info locals' is the one listed
above). Can you please check?

> 
> > Known Issues:
> > =============
> >
> > 1. In gdb mode, 'info threads' might hang for few seconds, and print only 2
> >    threads
> >
> 
> Hmm, it only prints 2 threads, and one of which is unavailable on my side.
> Can you try to dig into the details?
> 

Yes, the delay is due to gdb trying a lot of unwinders to unwind the frame in
each thread.
This happens only in gdb mode, and does not affect the default crash mode in
any way.
Even without this patch set, gdb mode did not recognise the actual threads, and
would just print n threads (n being the number of CPUs), since that is what was
added in crash_target_init.

I tried to fix this earlier, but did not succeed. The following is my
speculation: this might be caused by crash explicitly registering threads in
crash_target_init. Since not all of those threads will be alive, gdb tries to
unwind them and fails, so it goes through all the unwinders and still fails.
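
As a toy model of that speculation only (this is not gdb's real unwinder API,
and all names below are hypothetical), the cost can be sketched in C as a
chain of sniffers that is walked to the end whenever a thread has no usable
registers:

```c
#include <stddef.h>

/* Toy model: a sniffer returns 1 if it can unwind the frame, else 0. */
typedef int (*sniffer_fn_sketch)(int regs_available);

/* Each pretend unwinder can only work when registers are available. */
static int first_sniffer_sketch(int regs_available)  { return regs_available; }
static int second_sniffer_sketch(int regs_available) { return regs_available; }
static int third_sniffer_sketch(int regs_available)  { return regs_available; }

static const sniffer_fn_sketch unwinders_sketch[] = {
    first_sniffer_sketch,
    second_sniffer_sketch,
    third_sniffer_sketch,
};

/*
 * Try each unwinder in order and stop at the first one that accepts.
 * Returns how many sniffers were tried; *found says whether one accepted.
 */
int try_unwinders_sketch(int regs_available, int *found)
{
    size_t n = sizeof(unwinders_sketch) / sizeof(unwinders_sketch[0]);
    int attempts = 0;

    *found = 0;
    for (size_t i = 0; i < n; i++) {
        attempts++;
        if (unwinders_sketch[i](regs_available)) {
            *found = 1;
            break;
        }
    }
    return attempts;
}
```

In this model a thread with registers stops at the first sniffer, while a
thread without them pays for the whole chain and still ends up with no frame.
Repeated for every registered CPU thread, that would match the delay seen in
'info threads'.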

> 
> > 2. In gdb mode, 'bt' might fail to show backtrace in few vmcores collected
> >    from older kernels. This is a known issue due to register mismatch, and
> >    its fix has been merged upstream:
> >
> > Commit:
> > https://github.com/torvalds/linux/commit/b684c09f09e7a6af3794d4233ef785819e72db79
> >
> > TODO:
> > =====
> >
> > 1. Introduce automatic thread selection in gdb mode, to select the crashing
> >    thread in gdb, eliminating the need to manually run "thread <id>" after
> >    switching to gdb mode.

Just a note here: I will include this in the next version I send.

> >
> > Changelog:
> > ==========
> >
> > RFC V2:
> >   - removed patch implementing 'frame', 'up', 'down' in crash
> >   - updated the cover letter by removing the mention of those commands
> >     other than the respective gdb passthrough
> >
> >
> In addition, the get_dumpfile_regs() is not invoked in the [patch 1], I
> would suggest moving it into the [patch 2]. Just a glance, I haven't looked
> at the patchset carefully.

Yeah, you are correct. That can be merged into patch #2. I had kept it separate
on purpose, to introduce that logic on its own, since it is architecture
independent.

Thanks,
Aditya Gupta
