[Crash-utility] Re: [PATCH v6 0/5] Improve stack unwind on ppc64

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 21, 2024 at 02:08:45PM +0800, Tao Liu wrote:
> Hi Aditya,
> 
> On Tue, Feb 20, 2024 at 2:30 PM Aditya Gupta <adityag@xxxxxxxxxxxxx> wrote:
> >
> > Hi Tao Liu,
> >
> > >
> > > <...snip...>
> > >
> > > > For now it seems to work fine on ppc64, I will go through the code
> > > > changes once. By the way, can you post a rebased patch series for x86 ?
> > >
> > > I noticed there is some problem with the ppc64 change thread context
> > > support. Currently for x86_64, we use "set <pid>" cmd to switch the
> > > current task context to be the task <pid>'s, then we use cmd "gdb bt"
> > > to view the stack unwinding of task <pid>, by which we can view
> > > arbitrary task's stack unwinding.
> > >
> > > However for ppc64 code, there is no "get arbitrary tasks register"
> > > code. E.g. in ppc64_get_cpu_reg():
> > >
> > > task = get_active_task(cpu);
> > > tc = task_to_context(task);
> > >
> > > So no matter how we set any pid, the registers are always the active
> > > tasks' register, instead of <pid> tasks' register.
> >
> > Hmm. True, I will check that, true, gdb as of now will get active tasks
> > registers only, as this feature was meant for that, since getting gdb to
> > assist for the crashing process was the priority.
> >
> > Setting context to arbitrary pid ('set <pid>') and backtrace of any
> > arbitrary pid works on ppc64, so there should be a way to get those
> > registers, i will go through the code once to check this.
> >
> > But that arbitrary backtrace will be a very good plus.
> >
> I made some code change[1], and currently it seems to work for stack
> unwinding for arbitrary tasks.
> 
> The usage is as follows:
> crash> set <pid>
> or
> crash> set <task>
> crash> gdb bt
> which will view the specific task stack unwinding.

Thanks, I tested it, works okay.
Actually on ppc64, for any task, eg. systemd, switching to it shows the
swapper process's backtrace even now. As of now, I applied your patch
above my series, thanks for the fixes for nip/sp cases, and freeing up
the buffer even when returning early.

    crash> ps
	PID    PPID  CPU       TASK        ST  %MEM      VSZ      RSS  COMM
	...
	1       0  35  c000000003a54b80  IN   0.0   176768    18432  systemd
	...
	crash> set 1
    PID: 0
    COMMAND: "swapper/35"
       TASK: c000000003a80000  (1 of 64)  [THREAD_INFO: c000000003a80000]
        CPU: 35
      STATE: TASK_RUNNING (ACTIVE)
    crash> set c000000003a54b80
        PID: 0
    COMMAND: "swapper/35"
       TASK: c000000003a80000  (1 of 64)  [THREAD_INFO: c000000003a80000]
        CPU: 35
      STATE: TASK_RUNNING (ACTIVE)
    crash> bt
    PID: 0        TASK: c000000003a80000  CPU: 35   COMMAND: "swapper/35"
     R0:  0000000000000000    R1:  c000000003be3d30    R2:  c000000001782500
	 ...
     #0 [c000000003be3d30] _end at c000000003be3d70  (unreliable)
     #1 [c000000003be3d90] dedicated_cede_loop at c000000000edfb50
     #2 [c000000003be3de0] cpuidle_enter_state at c000000000edf244
     #3 [c000000003be3e80] cpuidle_enter at c000000000b61ad0
     #4 [c000000003be3ec0] call_cpuidle at c0000000001c5a84
     #5 [c000000003be3ee0] do_idle at c0000000001cd9d0
     #6 [c000000003be3f60] cpu_startup_entry at c0000000001cdd04
     #7 [c000000003be3f90] start_secondary at c000000000064a34
     #8 [c000000003be3fe0] start_secondary_prolog at c00000000000d158

Thanks,
Aditya Gupta

> 
> > >
> > > It would be simple to get this fixed, however the more challenging
> > > part, which is difficult for me, is how to get the inactive tasks
> > > registers, So we can do similar things as in
> > > x86_64.c:x86_64_get_stack_frame() of
> > > https://github.com/liutgnu/crash-dev?
> > >
> > > I see there is a structure as task_struct -> thread_struct -> struct
> > > pt_regs *regs. But sometimes the regs will give NULL value.
> >
> > True, the registers are NULL atleast in case of the swapper processes.
> 
> It seems to me that, in order to support stack unwinding, we don't
> need to get all registers, just nip & r1 (pc reg & stack pointer reg)
> are enough for gdb unwinding. Anyway, could you please give the trial
> patch a try, and see if it can work for you?
> 
> Thanks,
> Tao Liu
> 
> [1]: https://github.com/liutgnu/crash-dev/commit/ef1cec3400bd7619ed9d3c9c4b50a0a613f95b55
> 
> >
> > I will update on my findings.
> >
> > Thanks,
> > Aditya Gupta
> >
> > >
> > > Thanks,
> > > Tao Liu
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > >
> > > > Thanks,
> > > > Aditya Gupta
> > > >
> > > > >
> > > > > Thanks,
> > > > > Tao Liu
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Kazu
> > > > > >
> > > > > > On 2024/01/05 16:30, Aditya Gupta wrote:
> > > > > > > The Problem:
> > > > > > > ============
> > > > > > >
> > > > > > > Currently crash is unable to show function arguments and local variables, as
> > > > > > > gdb can do. And functionality for moving between frames ('up'/'down') is not
> > > > > > > working in crash.
> > > > > > >
> > > > > > > Crash has 'gdb passthroughs' for things gdb can do, but the gdb passthroughs
> > > > > > > 'bt', 'frame', 'info locals', 'up', 'down' are not working either, due to
> > > > > > > gdb not getting the register values from `crash_target::fetch_registers`,
> > > > > > > which then uses `machdep->get_cpu_reg`, which is not implemented for PPC64
> > > > > > >
> > > > > > > Proposed Solution:
> > > > > > > ==================
> > > > > > >
> > > > > > > Fix the gdb passthroughs by implementing "machdep->get_cpu_reg" for PPC64.
> > > > > > > This way, "gdb mode in crash" will support this feature for both ELF and
> > > > > > > kdump-compressed vmcore formats, while "gdb" would only have supported ELF
> > > > > > > format
> > > > > > >
> > > > > > > This way other features of 'gdb', such as seeing
> > > > > > > backtraces/registers/variables/arguments/local variables, moving up and
> > > > > > > down stack frames, can be used with any ppc64 vmcore, irrespective of
> > > > > > > being ELF format or kdump-compressed format.
> > > > > > >
> > > > > > > Note: This doesn't support live debugging on ppc64, since registers are not
> > > > > > > available to be read
> > > > > > >
> > > > > > > Implications on Architectures:
> > > > > > > ====================================
> > > > > > >
> > > > > > > No architecture other than PPC64 has been affected, other than in case of
> > > > > > > 'frame' command
> > > > > > >
> > > > > > > As mentioned in patch #2, since frame will not be prohibited, so it will print:
> > > > > > >
> > > > > > >   crash> frame
> > > > > > >   #0  <unavailable> in ?? ()
> > > > > > >
> > > > > > > Instead of before prohibited message:
> > > > > > >
> > > > > > >   crash> frame
> > > > > > >   crash: prohibited gdb command: frame
> > > > > > >
> > > > > > > Major change will be in 'gdb mode' on PPC64, that it will print the frames, and
> > > > > > > local variables, instead of failing with errors showing no frame, or showing
> > > > > > > that couldn't get PC, it will be able to give all this information.
> > > > > > >
> > > > > > > Testing:
> > > > > > > ========
> > > > > > >
> > > > > > > Git tree with this patch series applied:
> > > > > > > https://github.com/adi-g15-ibm/crash/tree/stack-unwind-v6
> > > > > > >
> > > > > > > To test various gdb passthroughs:
> > > > > > >
> > > > > > >   (crash) set
> > > > > > >   (crash) set gdb on
> > > > > > >   gdb> thread
> > > > > > >   gdb> bt
> > > > > > >   gdb> info threads
> > > > > > >   gdb> info threads
> > > > > > >   gdb> info locals
> > > > > > >   gdb> info variables irq_rover_lock
> > > > > > >   gdb> info args
> > > > > > >   gdb> thread 2
> > > > > > >   gdb> set gdb off
> > > > > > >   (crash) set
> > > > > > >   (crash) set -c 6
> > > > > > >   (crash) gdb thread
> > > > > > >   (crash) bt
> > > > > > >   (crash) gdb bt
> > > > > > >   (crash) frame
> > > > > > >   (crash) up
> > > > > > >   (crash) down
> > > > > > >   (crash) info locals
> > > > > > >
> > > > > > > Known Issues:
> > > > > > > =============
> > > > > > >
> > > > > > > 1. In gdb mode, 'bt' might fail to show backtrace in few vmcores collected
> > > > > > >     from older kernels. This is a known issue due to register mismatch, and
> > > > > > >     its fix has been merged upstream:
> > > > > > >
> > > > > > >     This can also cause some 'invalid kernel virtual address' errors during gdb
> > > > > > >     unwinding the stack registers
> > > > > > >
> > > > > > > Commit: https://github.com/torvalds/linux/commit/b684c09f09e7a6af3794d4233ef785819e72db79
> > > > > > >
> > > > > > > Fixing GDB passthroughs on other architectures
> > > > > > > ==============================================
> > > > > > >
> > > > > > > Much of the work for making gdb passthroughs like 'gdb bt', 'gdb
> > > > > > > thread', 'gdb info locals' etc. has been done by the patches introducing
> > > > > > > 'machdep->get_cpu_reg' and this series fixing some issues in that.
> > > > > > >
> > > > > > > Other architectures should be able to fix these gdb functionalities by
> > > > > > > simply implementing 'machdep->get_cpu_reg (cpu, regno, ...)'.
> > > > > > >
> > > > > > > The reasoning behind that has been explained with a diagram in commit
> > > > > > > description of patch #1
> > > > > > >
> > > > > > > I will assist with my findings/observations fixing it on ppc64 whenever needed.
> > > > > > >
> > > > > > > Changelog:
> > > > > > > ==========
> > > > > > >
> > > > > > > V6:
> > > > > > > + changes in patch #5: fix bug introduced in v5 that caused initial gdb thread
> > > > > > >    to be thread 1
> > > > > > >
> > > > > > > V5:
> > > > > > > + changes in patch #1: made ppc64_get_cpu_reg static, and remove unreachable
> > > > > > >    code
> > > > > > > + changes in patch #3: fixed typo 'ppc64_renum' instead of 'ppc64_regnum',
> > > > > > >    remove unneeded if condition
> > > > > > > + changes in patch #5: implement refresh regcache on per thread, instead of all
> > > > > > >    threads at once
> > > > > > >
> > > > > > > V4:
> > > > > > > + fix segmentation fault in live debugging (change in patch #1)
> > > > > > > + mention live debugging not supported in cover letter and patch #1
> > > > > > > + fixed some checkpatch warnings (change in patch #5)
> > > > > > >
> > > > > > > V3:
> > > > > > > + default gdb thread will be the crashing thread, instead of being
> > > > > > >    thread '0'
> > > > > > > + synchronise crash cpu and gdb thread context
> > > > > > > + fix bug in gdb_interface, that replaced gdb's output stream, losing
> > > > > > >    output in some cases, such as info threads and extra output in info
> > > > > > >    variables
> > > > > > > + fix 'info threads'
> > > > > > >
> > > > > > > RFC V2:
> > > > > > >    - removed patch implementing 'frame', 'up', 'down' in crash
> > > > > > >    - updated the cover letter by removing the mention of those commands other
> > > > > > >   than the respective gdb passthrough
> > > > > > >
> > > > > > > Aditya Gupta (5):
> > > > > > >    ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg
> > > > > > >    remove 'frame' from prohibited commands list
> > > > > > >    synchronise cpu context changes between crash/gdb
> > > > > > >    fix gdb_interface: restore gdb's output streams at end of
> > > > > > >      gdb_interface
> > > > > > >    fix 'info threads' command
> > > > > > >
> > > > > > >   crash_target.c  |  44 ++++++++++++++++
> > > > > > >   defs.h          | 130 +++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > > >   gdb-10.2.patch  | 110 +++++++++++++++++++++++++++++++++++++++-
> > > > > > >   gdb_interface.c |   2 +-
> > > > > > >   kernel.c        |  47 +++++++++++++++--
> > > > > > >   ppc64.c         |  95 +++++++++++++++++++++++++++++++++--
> > > > > > >   task.c          |  14 ++++++
> > > > > > >   tools.c         |   2 +-
> > > > > > >   8 files changed, 434 insertions(+), 10 deletions(-)
> > > > > > >
> > > > >
> > > >
> > >
> >
> 
--
Crash-utility mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxxxxxx
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki




[Index of Archives]     [Fedora Development]     [Fedora Desktop]     [Fedora SELinux]     [Yosemite News]     [KDE Users]     [Fedora Tools]

 

Powered by Linux