[Crash-utility] Re: [PATCH v4 07/16] Stop stack unwinding at non-kernel address

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Lianbo,

On 24/06/24 05:32PM, lijiang wrote:
> > <...snip...>
> >
> > Before:
> > crash> gdb bt
> >  #0  0xffffffff816a8f65 in context_switch ...
> >  #1  __schedule () ...
> >  #2  0xffffffff816a94e9 in schedule ...
> >  #3  0xffffffff816a86fd in schedule_hrtimeout_range_clock ...
> >  #4  0xffffffff816a8733 in schedule_hrtimeout_range ...
> >  #5  0xffffffff8124bb7e in ep_poll ...
> >  #6  0xffffffff8124d00d in SYSC_epoll_wait ...
> >  #7  SyS_epoll_wait ...
> >  #8  <signal handler called>
> >  #9  0x00007f0449407923 in ?? ()
> >  #10 0xffff880100000001 in ?? ()
> >  #11 0xffff880169b3c010 in ?? ()
> >  #12 0x0000000000000040 in irq_stack_union ()
> >  #13 0xffff880169b3c058 in ?? ()
> >  #14 0xffff880169b3c048 in ?? ()
> >  #15 0xffff880169b3c050 in ?? ()
> >  #16 0x0000000000000000 in ?? ()
> >
> > After:
> > crash> gdb bt
> >  #0  0xffffffff816a8f65 in context_switch ...
> >  #1  __schedule () ...
> >  #2  0xffffffff816a94e9 in schedule () ...
> >  #3  0xffffffff816a86fd in schedule_hrtimeout_range_clock ...
> >  #4  0xffffffff816a8733 in schedule_hrtimeout_range ...
> >  #5  0xffffffff8124bb7e in ep_poll ...
> >  #6  0xffffffff8124d00d in SYSC_epoll_wait ...
> >  #7  SyS_epoll_wait ...
> >  #8  <signal handler called>
> >  #9  0x00007f0449407923 in ?? ()
> >
> >
> It seems that there are still some non-kernel addresses that do not get
> filtered. Can you help double check?
>
> 
> For example:
> 
> crash> gdb bt
> #0  crash_setup_regs (newregs=0xffffb5bb4f197938, oldregs=0x0) at
> ./arch/x86/include/asm/kexec.h:114
> #1  0xffffffff8e61e32e in __crash_kexec (regs=regs@entry=0x0) at
> kernel/crash_core.c:122
> #2  0xffffffff8e51a64d in panic (fmt=fmt@entry=0xffffffff8fa51609 "sysrq
> triggered crash\n") at kernel/panic.c:366
> #3  0xffffffff8ec21f86 in sysrq_handle_crash (key=<optimized out>) at
> drivers/tty/sysrq.c:154
> #4  0xffffffff8ec22550 in __handle_sysrq (key=<optimized out>,
> check_mask=check_mask@entry=false) at drivers/tty/sysrq.c:612
> #5  0xffffffff8ec22bf5 in write_sysrq_trigger (file=<optimized out>,
> buf=<optimized out>, count=2, ppos=<optimized out>) at
> drivers/tty/sysrq.c:1183
> #6  0xffffffff8e935ae5 in pde_write (ppos=<optimized out>, count=<optimized
> out>, buf=<optimized out>, file=0xffffb5bb4f197938, pde=0xffff98338b78e0c0)
> at fs/proc/inode.c:334
> #7  proc_reg_write (file=0xffffb5bb4f197938, buf=0x0, count=1, ppos=0x0) at
> fs/proc/inode.c:346
> #8  0xffffffff8e88d382 in vfs_write (file=file@entry=0xffff98338b789200,
> buf=buf@entry=0x5614d58a22c0 <error: Cannot access memory at address
> 0x5614d58a22c0>, count=count@entry=2, pos=pos@entry=0xffffb5bb4f197b78) at
> fs/read_write.c:588
> #9  0xffffffff8e88d9ff in ksys_write (fd=<optimized out>,
> buf=0x5614d58a22c0 <error: Cannot access memory at address 0x5614d58a22c0>,
> count=2) at fs/read_write.c:643
> #10 0xffffffff8f124429 in do_syscall_x64 (nr=1, regs=0xffffb5bb4f197f58) at
> arch/x86/entry/common.c:52
> #11 do_syscall_64 (regs=0xffffb5bb4f197f58, nr=1) at
> arch/x86/entry/common.c:83
> #12 0xffffffff8f20012b in entry_SYSCALL_64 () at
> arch/x86/entry/entry_64.S:121
> #13 0x00007f9a147f69e0 in ?? ()
> 
> The frame #13 looks like a non-kernel address.

True. Though it seems to be okay for it to print the last frame with a
non-kernel address, as in this snippet from gdb:

    for (fi = trailing; fi && count--; fi = get_prev_frame (fi))
	...
	  print_frame_info (fp_opts, fi, 1, LOCATION, 1, 0);

Seems that frame #13, fi was not NULL.

Seeing Tao's change, it compares the current frame's NIP/PC to see if
it should return NULL (which I think is nice and works). Here the
'this_frame' would have been frame 12, (which would have called
`'get_prev_frame' to get the frame 13)

```
  frame_pc_p = get_frame_pc_if_available (this_frame, &frame_pc);
#ifdef CRASH_MERGE
  if (!is_kvaddr(frame_pc)) {
        return NULL;
  }
#endif
```

Tao's condition will hit when 'get_prev_frame(this_frame=frame#13)' will
be called to get the frame #14, which will return NULL and hence break
out of the loop.

This is based on what I recall and a quick look at the implementation,
please feel free to correct Lianbo/Tao.

Thanks,
Aditya Gupta


> 
> 
> Thanks
> Lianbo
> 
> 
> > Cc: Sourabh Jain <sourabhjain@xxxxxxxxxxxxx>
> > Cc: Hari Bathini <hbathini@xxxxxxxxxxxxx>
> > Cc: Mahesh J Salgaonkar <mahesh@xxxxxxxxxxxxx>
> > Cc: Naveen N. Rao <naveen.n.rao@xxxxxxxxxxxxxxxxxx>
> > Cc: Lianbo Jiang <lijiang@xxxxxxxxxx>
> > Cc: HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@xxxxxxx>
> > Cc: Tao Liu <ltao@xxxxxxxxxx>
> > Cc: Alexey Makhalov <alexey.makhalov@xxxxxxxxxxxx>
> > Signed-off-by: Tao Liu <ltao@xxxxxxxxxx>
> > ---
> >  defs.h          |  1 +
> >  gdb-10.2.patch  | 26 ++++++++++++++++++++++++++
> >  gdb_interface.c |  6 ++++++
> >  3 files changed, 33 insertions(+)
> >
> > diff --git a/defs.h b/defs.h
> > index 012ffdc..c0e6a29 100644
> > --- a/defs.h
> > +++ b/defs.h
> > @@ -7902,6 +7902,7 @@ extern unsigned char *gdb_prettyprint_arrays;
> >  extern unsigned int *gdb_repeat_count_threshold;
> >  extern unsigned char *gdb_stop_print_at_null;
> >  extern unsigned int *gdb_output_radix;
> > +int is_kvaddr(ulong);
> >
> >  /*
> >   *  gdb/top.c
> > diff --git a/gdb-10.2.patch b/gdb-10.2.patch
> > index 0bed96a..3ed40c0 100644
> > --- a/gdb-10.2.patch
> > +++ b/gdb-10.2.patch
> > @@ -16171,3 +16171,29 @@ exit 0
> >   }
> >
> >   /*
> > +--- gdb-10.2/gdb/frame.c.orig
> > ++++ gdb-10.2/gdb/frame.c
> > +@@ -2331,6 +2331,10 @@ inside_entry_func (frame_info *this_frame)
> > +    This function should not contain target-dependent tests, such as
> > +    checking whether the program-counter is zero.  */
> > +
> > ++#ifdef CRASH_MERGE
> > ++extern "C" int is_kvaddr(ulong);
> > ++#endif
> > ++
> > + struct frame_info *
> > + get_prev_frame (struct frame_info *this_frame)
> > + {
> > +@@ -2353,7 +2357,11 @@ get_prev_frame (struct frame_info *this_frame)
> > +     get_frame_id (this_frame);
> > +
> > +   frame_pc_p = get_frame_pc_if_available (this_frame, &frame_pc);
> > +-
> > ++#ifdef CRASH_MERGE
> > ++  if (!is_kvaddr(frame_pc)) {
> > ++        return NULL;
> > ++  }
> > ++#endif
> > +   /* tausq/2004-12-07: Dummy frames are skipped because it doesn't make
> > much
> > +      sense to stop unwinding at a dummy frame.  One place where a dummy
> > +      frame may have an address "inside_main_func" is on HPUX.  On HPUX,
> > the
> > diff --git a/gdb_interface.c b/gdb_interface.c
> > index b13d5fd..e76ecc6 100644
> > --- a/gdb_interface.c
> > +++ b/gdb_interface.c
> > @@ -947,6 +947,12 @@ gdb_lookup_module_symbol(ulong addr, ulong *offset)
> >         }
> >  }
> >
> > +int
> > +is_kvaddr(ulong addr)
> > +{
> > +       return IS_KVADDR(addr);
> > +}
> > +
> >  /*
> >   *  Used by gdb_interface() to catch gdb-related errors, if desired.
> >   */
> > --
> > 2.40.1
> >
--
Crash-utility mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxxxxxx
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki




[Index of Archives]     [Fedora Development]     [Fedora Desktop]     [Fedora SELinux]     [Yosemite News]     [KDE Users]     [Fedora Tools]

 

Powered by Linux