Re: Crash faults when determining panic task

Dave Anderson <anderson@xxxxxxxxxx> · Wed, 28 Sep 2011 17:14:31 -0400 (EDT)

Hi Joe,

It's pretty clear it's due to this change in 5.1.5:

         - Implemented the capability of using the NT_PRSTATUS ELF note data
           that is saved in version 4 compressed kdump headers to determine the
           starting stack and instruction pointer hooks for x86 and x86_64
           backtraces when they cannot be determined in the traditional manners.
           (wang.chao@xxxxxxxxxxxxxx, wency@xxxxxxxxxxxxxx)

What happens if you run it like so:

  $ crash --no_elf_notes vmlinux vmcore
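
Either way, the n_descsz of 3438804992 in your second trace looks bogus, so some bounds check on the note before reading registers out of it would seem to be in order. A minimal sketch of such a check, purely for illustration: note_fits() and its avail parameter are hypothetical names and not crash code, and the 4-byte padding of the name and descriptor fields is the usual ELF note layout as I understand it:

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a note field size up to the next multiple of 4 bytes. */
static uint64_t align4(uint64_t n)
{
        return (n + 3) & ~(uint64_t)3;
}

/*
 * Hypothetical helper (not part of crash): return 1 if the ELF note
 * starting at 'note' fits entirely within 'avail' bytes, 0 otherwise.
 */
static int note_fits(const void *note, size_t avail)
{
        Elf64_Nhdr hdr;
        uint64_t need;

        assert(note != NULL);

        if (avail < sizeof(hdr))
                return 0;

        /* Copy the header out; the note pointer may be unaligned. */
        memcpy(&hdr, note, sizeof(hdr));

        /* 64-bit arithmetic, so the sum of the 32-bit sizes cannot wrap. */
        need = sizeof(hdr) + align4(hdr.n_namesz) + align4(hdr.n_descsz);

        return need <= avail;
}
```

With the header from your gdb output (n_namesz = 8, n_descsz = 3438804992) and any plausible buffer size, note_fits() returns 0, so a caller could skip the note instead of dereferencing into it.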

As for this message:

  WARNING: sparsemem: invalid section number: 137438888923

That should be outside the realm of Fujitsu's ELF notes patch.  Does this kernel
have some kind of Stratus VM modification?
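
For what it's worth, the arithmetic on that number is suggestive. A quick sketch, assuming the x86_64 values SECTION_SIZE_BITS = 27 and MAX_PHYSMEM_BITS = 46 (what I believe applies to kernels of this vintage, so treat both as assumptions; the helper names are made up for illustration and are not crash functions):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 sparsemem constants; verify against the actual kernel. */
#define SECTION_SIZE_BITS 27
#define MAX_PHYSMEM_BITS  46
#define NR_MEM_SECTIONS   (UINT64_C(1) << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS))

/* Return 1 if 'nr' could be a valid section number, 0 otherwise. */
static int section_nr_in_range(uint64_t nr)
{
        return nr < NR_MEM_SECTIONS;
}

/* Recover the address whose upper bits would yield section number 'nr'. */
static uint64_t section_nr_to_addr(uint64_t nr)
{
        /* The shift below must not lose bits off the top. */
        assert(nr <= (UINT64_MAX >> SECTION_SIZE_BITS));
        return nr << SECTION_SIZE_BITS;
}
```

Under those assumptions 137438888923 is far beyond the 524288 possible sections, and shifting it back gives 0xfffff81ed8000000. That value has its upper 16 bits all set, which looks more like a kernel virtual address than a physical one.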

Dave

----- Original Message -----
> 
> Crash faults when determining panic task
> 
> I have a vmcore generated on RHEL6.1 that newer versions of crash
> have trouble analyzing (5.1.1-2.el6 seems to work ok).
> 
> 
> 
> I can provide additional binary files if needed, just let me know
> what convention best suits the list (ftp, private email attachment,
> etc.)
> 
> 
> 
> Crash Version        OS               Result
> crash 5.1.8          Debian wheezy    faults
> crash 5.1.7-1.el6    RHEL6.2 Alpha    faults
> crash 5.1.1-2.el6    RHEL6.1          ok
> 
> 
> Kernel:
> 
> 2.6.32-131.0.15.el6.exp10.bz16586.x86_64 (2.6.32-131.0.15 plus a fix
> for Red Hat bz 707268)
> 
> 
> Interesting warnings when starting crash:
> 
> WARNING: sparsemem: invalid section number: 137438888923
> 
> WARNING: sparsemem: invalid section number: 137438888923
> 
> 
> First fault, null pointer dereference:
> 
> please wait... (determining panic task)
> Program received signal SIGSEGV, Segmentation fault.
> x86_64_get_dumpfile_stack_frame (rsp=0x7fffffffcc58, rip=0x7fffffffcc50,
>     bt_in=0x7fffffffcce0) at x86_64.c:4183
> 4183            ur_rip = ULONG(user_regs +
> (gdb) p user_regs
> $1 = 0x0
> 
> 
> Workaround, check that bt->machdep is not NULL:
> 
> diff -Nupr crash-5.1.8/x86_64.c crash-5.1.8.new/x86_64.c
> --- crash-5.1.8/x86_64.c 2011-09-16 15:01:12.000000000 -0400
> +++ crash-5.1.8.new/x86_64.c 2011-09-28 14:12:45.347188571 -0400
> @@ -4178,7 +4178,7 @@ x86_64_get_dumpfile_stack_frame(struct b
>                                  goto skip_stage;
>                          }
>                  }
> -        } else if (ELF_NOTES_VALID()) {
> +        } else if (ELF_NOTES_VALID() && bt->machdep) {
>                  user_regs = bt->machdep;
>                  ur_rip = ULONG(user_regs +
>                          OFFSET(user_regs_struct_rip));
> 
> 
> Second fault, a curiously large n_descsz in the ELF note header:
> 
> please wait... (determining panic task)
> Program received signal SIGSEGV, Segmentation fault.
> get_regs_from_note (note=0xd26472 "\b", ip=0x7fffffffc4e0, sp=0x7fffffffc4e8)
>     at netdump.c:2221
> 2221            *sp = ULONG(user_regs + offset_sp);
> (gdb) p *(Elf64_Nhdr *)note
> $1 = {n_namesz = 8, n_descsz = 3438804992, n_type = 8}
> 
> 
> Workaround, do not attempt reading registers from ELF notes (this
> chunk of code was not present in crash 5.1.1):
> 
> diff -Nupr crash-5.1.8/netdump.c crash-5.1.8.new/netdump.c
> --- crash-5.1.8/netdump.c 2011-09-16 15:01:12.000000000 -0400
> +++ crash-5.1.8.new/netdump.c 2011-09-28 14:14:43.687183734 -0400
> @@ -2286,7 +2286,7 @@ get_netdump_regs_x86_64(struct bt_info *
> 
>                  bt->machdep = (void *)user_regs;
>          }
> -
> +#if 0
>          if (ELF_NOTES_VALID() &&
>              (bt->flags & BT_DUMPFILE_SEARCH) && DISKDUMP_DUMPFILE() &&
>              (note = (Elf64_Nhdr *)
> @@ -2305,7 +2305,7 @@ get_netdump_regs_x86_64(struct bt_info *
> 
>                  bt->machdep = (void *)user_regs;
>          }
> -
> +#endif
>          machdep->get_stack_frame(bt, ripp, rspp);
>  }
> 
> 
> Given the warning messages at the beginning of the process, I'm not
> sure if I'm dealing with a corrupted or incomplete vmcore image. Let me
> know what additional info could be useful if this seems worth
> debugging further.
> 
> 
> 
> Thanks,
> 
> -- Joe Lawrence
> --
> Crash-utility mailing list
> Crash-utility@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/crash-utility
> 
