On Thu, Aug 24, 2006 at 09:15:34AM -0400, Dave Anderson wrote:
> Rachita Kothiyal wrote:
> >
> > Hi Dave
> >
> > I was trying to implement a better backtrace mechanism for crash using
> > dwarf info, and was trying to use the embedded gdb itself, since gdb
> > already uses dwarf information for unwinding the stack. I could get the
> > "gdb bt" command working in "crash" after making one minor bug fix in
> > gdb_interface.c (patch appended). Now one can get a cleaner backtrace,
> > particularly in the x86_64 case, using the "gdb bt" command.
> >
>
> Wow -- your definition of "cleaner" apparently is different than mine... ;-)
>

Looks are sometimes deceptive ;-).. Using gdb's stack unwinding code, the
unwanted stack frames (like frames #4, #6, #7 and #8) are avoided.

> > crash> bt
> > PID: 4146   TASK: ffff81022e848af0   CPU: 0   COMMAND: "insmod"
> >  #0 [ffff81021efadbf8] crash_kexec at ffffffff801521d1
> >  #1 [ffff81021efadc40] machine_kexec at ffffffff8011a739
> >  #2 [ffff81021efadc80] crash_kexec at ffffffff801521ed
> >  #3 [ffff81021efadd08] crash_kexec at ffffffff801521d1
> >  #4 [ffff81021efadd30] bust_spinlocks at ffffffff8011fd6d
> >  #5 [ffff81021efadd40] panic at ffffffff80131422
> >  #6 [ffff81021efadda0] cond_resched at ffffffff804176c3
> >  #7 [ffff81021efaddb0] wait_for_completion at ffffffff80417701
> >  #8 [ffff81021efade00] __down_read at ffffffff80418d07
> >  #9 [ffff81021efade30] fun2 at ffffffff80107017
> > #10 [ffff81021efade40] fun1 at ffffffff801311b6
> > #11 [ffff81021efade50] init_module at ffffffff8800200f
> > #12 [ffff81021efade60] sys_init_module at ffffffff8014c664
> > #13 [ffff81021efadf00] init_module at ffffffff88002068
> > #14 [ffff81021efadf80] system_call at ffffffff801096da
> >     RIP: 00002b2153382d4a  RSP: 00007fff57900a28  RFLAGS: 00010246
> >     RAX: 00000000000000af  RBX: ffffffff801096da  RCX: 0000000000000000
> >     RDX: 0000000000512010  RSI: 0000000000016d26  RDI: 00002b21531e5010
> >     RBP: 00007fff57900c58   R8: 00002b21534f46d0   R9: 00002b21531fbd36
> >     R10: 0000000000516040  R11: 0000000000000206  R12: 0000000000512010
> >     R13: 00007fff579015c5  R14: 0000000000000000  R15: 00002b21531e5010
> >     ORIG_RAX: 00000000000000af  CS: 0033  SS: 002b
> > crash> gdb bt 15
> > [Switching to thread 1 (process 4146)]#0  0xffffffff801521d1 in crash_kexec (regs=0x0) at kexec.h:64
> > 64      in kexec.h
> > #0  0xffffffff801521d1 in crash_kexec (regs=0x0) at kexec.h:64
> > #1  0xffffffff80131422 in panic (fmt=0xffffffff8044832c "Rachita triggering panic\n") at kernel/panic.c:87
> > #2  0xffffffff80107017 in fun2 (i=0) at init/main.c:608
> > #3  0xffffffff801311b6 in fun1 (j=Variable "j" is not available.
> > ) at kernel/panic.c:278
> > #4  0xffffffff8800200f in ?? ()
> > #5  0xffffc2000023d9d0 in ?? ()
> > #6  0xffffffff8014c664 in sys_init_module (umod=0xffff81022ef6c400, len=18446604445110683424,
> >     uargs=0xffff81022ef6c6e8 "\020304366.\002\201377377x304366.\002\201377377340304366.\002\201377377H305366.\002\201377377260305366.\002\201377377\030306366.\002\201377377\200306366.\002\201377377")
> >     at kernel/module.c:1911
> > #7  0xffffffff801096da in system_call () at bitops.h:230
> > #8  0x00002b2153382d4a in ?? ()
> > #9  0xffff81022e8516d0 in ?? ()
> > #10 0xffffffff8055c7c0 in migration_notifier ()
> > #11 0x0000000000000000 in ?? ()
> > #12 0x0000000000000001 in ?? ()
> > #13 0xffffffffffffffff in ?? ()
> > #14 0xffffffff8013ae2a in recalc_sigpending () at kernel/signal.c:227
> > (More stack frames follow...)
> > crash>
> >
> > ===============================================================================
> >
> > But as of now there are a few issues with "gdb bt":
> >
> > 1) Sometimes the number of stack frames displayed doesn't end for a long
> >    time, and the "q" command doesn't work as desired once the screen is
> >    full. The workaround is to give some limiting count like "gdb bt 10".
> >    I tried gdb ver 6.1 externally (outside crash) as well and see the
> >    same never-ending stack frames, whereas the latest gdb (ver 6.4)
> >    works fine.
> >    So I am just wondering if you are planning to upgrade the embedded
> >    gdb to ver 6.4?
> >
>
> Not really. That's a major undertaking with unpredictable results
> until it's attempted. Every time I do that, nightmares follow, so only
> if we get to the point where gdb-6.1 doesn't work at all, or cripples
> crash's use of it with a new vmlinux, should we even think of doing that.
>
> > 2) Unlike crash, gdb has no concept of tasks, so we can only see the
> >    backtraces of the tasks that were active at the time of the crash.
> >
> > Apart from "bt", this change also gets some other related commands
> > working, like "gdb info registers", "gdb info frame" and
> > "gdb info threads".
> >
>
> Well, right off the bat, I'm not too keen on passing the vmcore to gdb,
> because I don't know what the unseen ramifications of that would be.
> Even so, you can't just do an "argc++" in gdb_main_loop(), because
> that apparently presumes that crash is receiving *only* two arguments,
> in "vmlinux vmcore" order. That obviously cannot be presumed, as the
> possible combinations of crash command line options and their ordering
> are endless.

In any case, gdb_main_loop() currently is not passing the right "argc"
to gdb.

> Secondly, until I see something useful in the case where the kernel
> takes an in-kernel exception that in turn causes the crash, I'm
> unconvinced. What does the trace look like if you take an oops or
> BUG() while running in kernel mode? Does gdb step past that point?
> (i.e., to the part of the backtrace we'd actually want to see)
> Certainly we won't see a register dump at the exact point of the
> exception. Would it make the jump from the x86_64 interrupt stack
> (or any of the exception stacks) back to the process stack?

Rachita, could you please test the patch with such a dump also?

> Given that it only gives backtraces of the active tasks, we're
> still left with a half-baked implementation.
>
> And now, with the introduction of the new CONFIG_UNWIND_INFO
> and CONFIG_STACK_UNWIND configurations in the x86 and x86_64
> kernels, wouldn't it make more sense to utilize the approach taken by
> the crash-utility/ia64 unwind facility? Although the x86/x86_64
> implementation still appears to be a work in progress in the kernel,
> backporting that capability from the kernel to user space would seem
> to be more useful. That's what was done for ia64, and for that reason
> it's the only architecture where we get dependable backtraces for
> all tasks, active or not.

IIUC, this will involve writing dwarf support code for crash as a whole.

> Simple question -- and to be quite honest with you -- I don't
> understand why you wouldn't want to simply use gdb alone
> in this case?

Instead of writing stack unwinding code using dwarf info from scratch,
gdb's code was re-used.

Thanks
Maneesh

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility