Dave,

Adding --no_elf_notes to the crash invocation does indeed start crash without issue. Do you think that I am dealing with a corrupted/incomplete vmcore (as evidenced by that extremely large n_descsz value), or is this a bug that crash could handle more gracefully?

As far as the kernel is concerned, 2.6.32-131.0.15.el6.exp10.bz16586.x86_64 was a stock RH 2.6.32-131.0.15 with an added patch for handling an MD RAID bug (RHBZ-707268). Stratus does load a driver to track dirty VM pages for harvesting purposes, but it does not change general VM behavior.

FWIW, this is the only vmcore in which I've seen ELF note faults or invalid section numbers.

Thanks,

-- Joe

-----Original Message-----
From: crash-utility-bounces@xxxxxxxxxx [mailto:crash-utility-bounces@xxxxxxxxxx] On Behalf Of Dave Anderson
Sent: Wednesday, September 28, 2011 5:15 PM
To: Discussion list for crash utility usage, maintenance and development
Subject: Re: Crash faults when determining panic task

Hi Joe,

It's pretty clear it's due to this change in 5.1.5:

  - Implemented the capability of using the NT_PRSTATUS ELF note data
    that is saved in version 4 compressed kdump headers to determine the
    starting stack and instruction pointer hooks for x86 and x86_64
    backtraces when they cannot be determined in the traditional manners.
    (wang.chao@xxxxxxxxxxxxxx, wency@xxxxxxxxxxxxxx)

What happens if you run it like so:

  $ crash --no_elf_notes vmlinux vmcore

As far as this message:

  WARNING: sparsemem: invalid section number: 137438888923

That should be outside the realm of Fujitsu's ELF notes patch. Does this kernel have some kind of Stratus VM modification?

Dave

----- Original Message -----
>
> Crash faults when determining panic task
>
> I have a vmcore generated on RHEL6.1 that newer versions of crash
> have trouble analyzing (5.1.1-2.el6 seems to work ok).
>
> I can provide additional binary files if needed, just let me know
> what convention best suits the list (ftp, private email attachment,
> etc.)
>
> Crash version       OS                  Result
> crash 5.1.8         Debian wheezy       faults
> crash 5.1.7-1.el6   RHEL6.2 Alpha       faults
> crash 5.1.1-2.el6   RHEL6.1             ok
>
> Kernel:
>
> 2.6.32-131.0.15.el6.exp10.bz16586.x86_64 (2.6.32-131.0.15 + a fix
> for Red Hat bz 707268)
>
> Interesting warnings when starting crash:
>
> WARNING: sparsemem: invalid section number: 137438888923
> WARNING: sparsemem: invalid section number: 137438888923
>
> First fault, NULL pointer dereference:
>
> please wait... (determining panic task)
> Program received signal SIGSEGV, Segmentation fault.
> x86_64_get_dumpfile_stack_frame (rsp=0x7fffffffcc58, rip=0x7fffffffcc50,
>     bt_in=0x7fffffffcce0) at x86_64.c:4183
> 4183            ur_rip = ULONG(user_regs +
> (gdb) p user_regs
> $1 = 0x0
>
> Workaround, check that bt->machdep is not NULL:
>
> diff -Nupr crash-5.1.8/x86_64.c crash-5.1.8.new/x86_64.c
> --- crash-5.1.8/x86_64.c	2011-09-16 15:01:12.000000000 -0400
> +++ crash-5.1.8.new/x86_64.c	2011-09-28 14:12:45.347188571 -0400
> @@ -4178,7 +4178,7 @@ x86_64_get_dumpfile_stack_frame(struct b
>  				goto skip_stage;
>  			}
>  		}
> -	} else if (ELF_NOTES_VALID()) {
> +	} else if (ELF_NOTES_VALID() && bt->machdep) {
>  		user_regs = bt->machdep;
>  		ur_rip = ULONG(user_regs +
>  			OFFSET(user_regs_struct_rip));
>
> Second fault, a curiously large n_descsz in the ELF note header:
>
> please wait... (determining panic task)
> Program received signal SIGSEGV, Segmentation fault.
> get_regs_from_note (note=0xd26472 "\b", ip=0x7fffffffc4e0,
>     sp=0x7fffffffc4e8) at netdump.c:2221
> 2221            *sp = ULONG(user_regs + offset_sp);
> (gdb) p *(Elf64_Nhdr *)note
> $1 = {n_namesz = 8, n_descsz = 3438804992, n_type = 8}
>
> Workaround, do not attempt reading registers from ELF notes (this
> chunk of code was not present in crash 5.1.1):
>
> diff -Nupr crash-5.1.8/netdump.c crash-5.1.8.new/netdump.c
> --- crash-5.1.8/netdump.c	2011-09-16 15:01:12.000000000 -0400
> +++ crash-5.1.8.new/netdump.c	2011-09-28 14:14:43.687183734 -0400
> @@ -2286,7 +2286,7 @@ get_netdump_regs_x86_64(struct bt_info *
>  
>  		bt->machdep = (void *)user_regs;
>  	}
> -
> +#if 0
>  	if (ELF_NOTES_VALID() &&
>  	    (bt->flags & BT_DUMPFILE_SEARCH) && DISKDUMP_DUMPFILE() &&
>  	    (note = (Elf64_Nhdr *)
> @@ -2305,7 +2305,7 @@ get_netdump_regs_x86_64(struct bt_info *
>  
>  		bt->machdep = (void *)user_regs;
>  	}
> -
> +#endif
>  	machdep->get_stack_frame(bt, ripp, rspp);
>  }
>
> Given the warning messages at the beginning of the process, I'm not
> sure whether I'm dealing with a corrupted or incomplete vmcore image.
> Let me know what additional info could be useful if this seems worth
> debugging further.
>
> Thanks,
>
> -- Joe Lawrence
> --
> Crash-utility mailing list
> Crash-utility@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/crash-utility
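Joe's question (corrupt dump, or something crash could handle more gracefully?) largely comes down to whether the code that walks the ELF notes trusts the header size fields. The following is a minimal, hypothetical sketch in C, not crash's actual netdump.c code: find_note_desc() and note_align() are names invented here, and it assumes the notes sit in one in-memory buffer of known length, padded to 4 bytes as in 64-bit core files. It refuses any note whose n_namesz or n_descsz would run past the end of the buffer (such as the n_descsz = 3438804992 reported in the second fault), so a caller could fall back to the traditional backtrace methods instead of reading through a bogus pointer.

```c
#include <elf.h>     /* Elf64_Nhdr, NT_PRSTATUS (glibc/Linux header) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ELF notes in 64-bit core files pad name and descriptor to 4 bytes.
 * Done in 64-bit arithmetic so a huge 32-bit size cannot wrap. */
static uint64_t note_align(uint64_t n)
{
	return (n + 3) & ~(uint64_t)3;
}

/*
 * Hypothetical helper (not part of crash): scan the note buffer
 * [buf, buf + len) for the first note of the given type.  Returns a
 * pointer to its descriptor and stores n_descsz in *desc_len, or
 * returns NULL if no such note exists or if any header claims more
 * bytes than remain in the buffer.
 */
const unsigned char *find_note_desc(const unsigned char *buf, size_t len,
				    uint32_t type, size_t *desc_len)
{
	size_t off = 0;

	while (len - off >= sizeof(Elf64_Nhdr)) {
		Elf64_Nhdr nh;
		uint64_t name_sz, desc_sz, remain;

		/* memcpy avoids unaligned or aliasing reads of the header. */
		memcpy(&nh, buf + off, sizeof(nh));
		name_sz = note_align(nh.n_namesz);
		desc_sz = note_align(nh.n_descsz);
		remain = len - off - sizeof(Elf64_Nhdr);

		/* A corrupt size (e.g. n_descsz = 3438804992) stops the
		 * walk here instead of leading to a wild read later. */
		if (name_sz > remain || desc_sz > remain - name_sz)
			return NULL;

		if (nh.n_type == type) {
			*desc_len = nh.n_descsz;
			return buf + off + sizeof(Elf64_Nhdr) + name_sz;
		}
		off += sizeof(Elf64_Nhdr) + name_sz + desc_sz;
	}
	return NULL;
}
```

A caller would presumably still need to confirm that the returned descriptor is at least as large as the register area it reads (in get_regs_from_note terms, offset_sp plus the size of the saved value) before dereferencing it, in the same spirit as the NULL check on bt->machdep in the first workaround.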