From: Dave Anderson <anderson@xxxxxxxxxx>
Subject: Re: [PATCH 00/11] sadump: Incremental update patches
Date: Thu, 20 Oct 2011 17:06:54 -0400 (EDT)

>
>
> ----- Original Message -----
>> Hello Dave,
>>
>> The following series fix minor bugs, clean up in sadump module, and
>> address the issue on kdump's first 640kB backup.
>>
>> The last patch is a preparation for makedumpfile's support on
>> sadump-related formats, still work in progress, producing dumpfile in
>> kdump-compressed format from sadump-related formats.
>>
>> This patch set is based on crash 5.1.9.
>
> Hello Daisuke,
>
> As I have stated in our previous sadump-related discussions, you have
> free rein to make whatever changes you like in sadump-specific
> files, or in functions that deal with sadump-specific issues. However,
> if your changes modify behavior when used with non-sadump dumpfiles
> then I may have a problem with them. So when you post a patch-set
> such as this last set, I would prefer that you post two separate
> patch-sets.
>
> This 1/11 patchset is a good example of what I mean. I have no
> problem with the sadump-specific patches. But I do have a big
> problem with the last one, which is not necessarily sadump-specific:
>
> use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch
>

I see. I'll send them separately in the future.

> BTW, these are the names of the patches as they were attached, where
> the second one doesn't have "0002-" prepended to it, and there is
> no "0008-" patch?:
>
> 0001-sadump-bug-close-receives-unintened-value.patch.patch
> cleanup_is_sadump.patch.patch
> 0002-sadump-bug-specify-wrong-type.patch.patch
> 0003-sadump-bugfix-time-stamp-values-displayed-are-same.patch.patch
> 0004-sadump-don-t-exit-if-time-stamps-mismatch.patch.patch
> 0005-sadump-debug-messages-at-the-beginning-of-open_disk-.patch.patch
> 0006-sadump-Allow-arbitrary-number-of-disk-set-configurat.patch.patch
> 0007-sadump-refer-to-eip-and-esp-on-x86-kernels.patch.patch
> 0010-Make-data-relevant-to-physical-memory-have-64-bits-l.patch.patch
> 0011-Read-kexec-backup-region-if-read-to-the-first-640kB-.patch.patch
> use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch
>

Sorry for the confusion. I used stgit to organize and send the patch set,
and I didn't notice that stgit keeps the original file names when it
attaches the patches.

> Anyway, I tested this by running "bt -a" on a large set of sample dumpfiles,
> first without, and then with, your patchset. When your patches are applied, I see
> numerous examples where the backtraces are missing huge pieces of information.
>
> Here are typical examples:
>
> Here with un-patched crash-5.1.9, is a RHEL6 crashing process:
>
> PID: 14187  TASK: ffff88012b98e040  CPU: 0   COMMAND: "runtest.sh"
>  #0 [ffff88012b2739e0] machine_kexec at ffffffff810310fb
>  #1 [ffff88012b273a40] crash_kexec at ffffffff810b6632
>  #2 [ffff88012b273b10] oops_end at ffffffff814df320
>  #3 [ffff88012b273b40] no_context at ffffffff81040cbb
>  #4 [ffff88012b273b90] __bad_area_nosemaphore at ffffffff81040f45
>  #5 [ffff88012b273be0] bad_area at ffffffff8104106e
>  #6 [ffff88012b273c10] __do_page_fault at ffffffff81041793
>  #7 [ffff88012b273d30] do_page_fault at ffffffff814e132e
>  #8 [ffff88012b273d60] page_fault at ffffffff814de6b5
>     [exception RIP: sysrq_handle_crash+22]
>     RIP: ffffffff8131b566  RSP: ffff88012b273e18  RFLAGS: 00010096
>     RAX: 0000000000000010  RBX: 0000000000000063  RCX: 0000000000000f95
>     RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000063
>     RBP: ffff88012b273e18   R8: ffffffff81b9e5c0   R9: 0000000000000000
>     R10: 00007fff7b178160  R11: 0000000000000000  R12: 0000000000000000
>     R13: ffffffff81a9a1a0  R14: 0000000000000286  R15: 0000000000000007
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #9 [ffff88012b273e20] __handle_sysrq at ffffffff8131b822
> #10 [ffff88012b273e70] write_sysrq_trigger at ffffffff8131b8de
> #11 [ffff88012b273ea0] proc_reg_write at ffffffff811d5bce
> #12 [ffff88012b273ef0] vfs_write at ffffffff811730c8
> #13 [ffff88012b273f30] sys_write at ffffffff81173ad1
> #14 [ffff88012b273f80] system_call_fastpath at ffffffff8100b0b2
>
> With crash-5.1.9 plus your patch -- nothing is shown below the page fault
> exception frame:
>
> PID: 14187  TASK: ffff88012b98e040  CPU: 0   COMMAND: "runtest.sh"
>     [exception RIP: sysrq_handle_crash+22]
>     RIP: ffffffff8131b566  RSP: ffff88012b273e18  RFLAGS: 00010096
>     RAX: 0000000000000010  RBX: 0000000000000063  RCX: 0000000000000f95
>     RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000063
>     RBP: ffff88012b273e18   R8: ffffffff81b9e5c0   R9: 0000000000000000
>     R10: 00007fff7b178160  R11: 0000000000000000  R12: 0000000000000000
>     R13: ffffffff81a9a1a0  R14: 0000000000000286  R15: 0000000000000007
>     CS: 0010  SS: 0018
>  #0 [ffff88012b273e20] __handle_sysrq at ffffffff8131b822
>  #1 [ffff88012b273e70] write_sysrq_trigger at ffffffff8131b8de
>  #2 [ffff88012b273ea0] proc_reg_write at ffffffff811d5bce
>  #3 [ffff88012b273ef0] vfs_write at ffffffff811730c8
>  #4 [ffff88012b273f30] sys_write at ffffffff81173ad1
>  #5 [ffff88012b273f80] system_call_fastpath at ffffffff8100b0b2
>     RIP: 00007fad3a2f45e0  RSP: 00007fff7b1783d8  RFLAGS: 00010206
>     RAX: 0000000000000001  RBX: ffffffff8100b0b2  RCX: 0000000000000000
>     RDX: 0000000000000002  RSI: 00007fad3abe6000  RDI: 0000000000000001
>     RBP: 00007fad3abe6000   R8: 000000000000000a   R9: 00007fad3abe2700
>     R10: 00007fff7b178160  R11: 0000000000000246  R12: 0000000000000002
>     R13: 00007fad3a5a6780  R14: 0000000000000002  R15: 0000000000000001
>     ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
>
> Again with un-patched crash-5.1.9, here are examples of two non-crashing cpus
> that received shutdown NMI interrupts from the crashing task:
>
> PID: 0      TASK: ffff88012cd2f580  CPU: 1   COMMAND: "swapper"
>  #0 [ffff880028227e90] crash_nmi_callback at ffffffff81028a96
>  #1 [ffff880028227ea0] notifier_call_chain at ffffffff814e13e5
>  #2 [ffff880028227ee0] atomic_notifier_call_chain at ffffffff814e144a
>  #3 [ffff880028227ef0] notify_die at ffffffff810942fe
>  #4 [ffff880028227f20] do_nmi at ffffffff814df033
>  #5 [ffff880028227f50] nmi at ffffffff814de940
>     [exception RIP: intel_idle+177]
>     RIP: ffffffff812bc291  RSP: ffff88012cd31e68  RFLAGS: 00000046
>     RAX: 0000000000000020  RBX: 0000000000000008  RCX: 0000000000000001
>     RDX: 0000000000000000  RSI: ffff88012cd31fd8  RDI: ffffffff81a34040
>     RBP: ffff88012cd31ed8   R8: 0000000000000000   R9: 00000000000000c8
>     R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000020
>     R13: 12257c81ed7a34e6  R14: 0000000000000003  R15: 0000000000000001
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  --- <NMI exception stack> ---
>  #6 [ffff88012cd31e68] intel_idle at ffffffff812bc291
>  #7 [ffff88012cd31ee0] cpuidle_idle_call at ffffffff813ed4b7
>  #8 [ffff88012cd31f00] cpu_idle at ffffffff81009de6
>
> PID: 37     TASK: ffff88012ce360c0  CPU: 2   COMMAND: "events/2"
>  #0 [ffff880028247e90] crash_nmi_callback at ffffffff81028a96
>  #1 [ffff880028247ea0] notifier_call_chain at ffffffff814e13e5
>  #2 [ffff880028247ee0] atomic_notifier_call_chain at ffffffff814e144a
>  #3 [ffff880028247ef0] notify_die at ffffffff810942fe
>  #4 [ffff880028247f20] do_nmi at ffffffff814df033
>  #5 [ffff880028247f50] nmi at ffffffff814de940
>     [exception RIP: io_serial_in+22]
>     RIP: ffffffff813324f6  RSP: ffff88012ce5fc70  RFLAGS: 00000006
>     RAX: ffffffffab364400  RBX: ffffffff81f2cca0  RCX: 0000000000000000
>     RDX: 000000000000d055  RSI: 0000000000000005  RDI: ffffffff81f2cca0
>     RBP: ffff88012ce5fc70   R8: ffffffff81b9e5c0   R9: 0000000000000000
>     R10: ffff880127498a60  R11: 0000000000000001  R12: 000000000000270c
>     R13: 0000000000000020  R14: 0000000000000000  R15: ffffffff81332ba0
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  --- <NMI exception stack> ---
>  #6 [ffff88012ce5fc70] io_serial_in at ffffffff813324f6
>  #7 [ffff88012ce5fc78] wait_for_xmitr at ffffffff81332b03
>  #8 [ffff88012ce5fca8] serial8250_console_putchar at ffffffff81332bc6
>  #9 [ffff88012ce5fcc8] uart_console_write at ffffffff8132e55e
> #10 [ffff88012ce5fd08] serial8250_console_write at ffffffff81332f2d
> #11 [ffff88012ce5fd58] __call_console_drivers at ffffffff81067495
> #12 [ffff88012ce5fd88] _call_console_drivers at ffffffff810674fa
> #13 [ffff88012ce5fda8] release_console_sem at ffffffff81067ac8
> #14 [ffff88012ce5fde8] fb_flashcursor at ffffffff812abb4a
> #15 [ffff88012ce5fe38] worker_thread at ffffffff81088a40
> #16 [ffff88012ce5fee8] kthread at ffffffff8108dff6
> #17 [ffff88012ce5ff48] kernel_thread at ffffffff8100c10a
>
> But when running crash-5.1.9 plus your patch -- the transitions to the NMI exception
> stack are not even shown at all:
>
> PID: 0      TASK: ffff88012cd2f580  CPU: 1   COMMAND: "swapper"
>     [exception RIP: intel_idle+177]
>     RIP: ffffffff812bc291  RSP: ffff88012cd31e68  RFLAGS: 00000046
>     RAX: 0000000000000020  RBX: 0000000000000008  RCX: 0000000000000001
>     RDX: 0000000000000000  RSI: ffff88012cd31fd8  RDI: ffffffff81a34040
>     RBP: ffff88012cd31ed8   R8: 0000000000000000   R9: 00000000000000c8
>     R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000020
>     R13: 12257c81ed7a34e6  R14: 0000000000000003  R15: 0000000000000001
>     CS: 0010  SS: 0018
>  #0 [ffff88012cd31e70] sched_clock_cpu at ffffffff8109539d
>  #1 [ffff88012cd31ee0] cpuidle_idle_call at ffffffff813ed4b7
>  #2 [ffff88012cd31f00] cpu_idle at ffffffff81009de6
>
> PID: 37     TASK: ffff88012ce360c0  CPU: 2   COMMAND: "events/2"
>     [exception RIP: io_serial_in+22]
>     RIP: ffffffff813324f6  RSP: ffff88012ce5fc70  RFLAGS: 00000006
>     RAX: ffffffffab364400  RBX: ffffffff81f2cca0  RCX: 0000000000000000
>     RDX: 000000000000d055  RSI: 0000000000000005  RDI: ffffffff81f2cca0
>     RBP: ffff88012ce5fc70   R8: ffffffff81b9e5c0   R9: 0000000000000000
>     R10: ffff880127498a60  R11: 0000000000000001  R12: 000000000000270c
>     R13: 0000000000000020  R14: 0000000000000000  R15: ffffffff81332ba0
>     CS: 0010  SS: 0018
>  #0 [ffff88012ce5fc78] wait_for_xmitr at ffffffff81332b03
>  #1 [ffff88012ce5fca8] serial8250_console_putchar at ffffffff81332bc6
>  #2 [ffff88012ce5fcc8] uart_console_write at ffffffff8132e55e
>  #3 [ffff88012ce5fd08] serial8250_console_write at ffffffff81332f2d
>  #4 [ffff88012ce5fd58] __call_console_drivers at ffffffff81067495
>  #5 [ffff88012ce5fd88] _call_console_drivers at ffffffff810674fa
>  #6 [ffff88012ce5fda8] release_console_sem at ffffffff81067ac8
>  #7 [ffff88012ce5fde8] fb_flashcursor at ffffffff812abb4a
>  #8 [ffff88012ce5fe38] worker_thread at ffffffff81088a40
>  #9 [ffff88012ce5fee8] kthread at ffffffff8108dff6
> #10 [ffff88012ce5ff48] kernel_thread at ffffffff8100c10a
>
> If I remove the "use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch" patch
> the backtraces are correct. Now, it may be true that the changes you made make
> sense with respect to sadump dumpfiles, where the register set stored in the header
> is a reflection of the last location that each cpu ran (?).
>
> But those changes are totally unacceptable for compressed kdump dumpfiles.

I understand the situation. I've attached a V2 patch, and I have confirmed
that it doesn't break the backtraces shown above. Could you review it?

Thanks.
HATAYAMA, Daisuke
diff --git a/netdump.c b/netdump.c
index f8da284..4011f36 100644
--- a/netdump.c
+++ b/netdump.c
@@ -2508,6 +2508,7 @@ next_sysrq:
 		    (((sp >= GET_STACKBASE(bt->task)) &&
 		    (sp < GET_STACKTOP(bt->task))) ||
 		    in_alternate_stack(bt->tc->processor, sp))) {
+			bt->flags |= BT_KERNEL_SPACE;
 			*eip = ip;
 			*esp = sp;
 			return;
diff --git a/x86.c b/x86.c
index b69adb2..df91110 100755
--- a/x86.c
+++ b/x86.c
@@ -699,6 +699,8 @@ db_stack_trace_cmd(addr, have_addr, count, modif, task, flags)
 		} else if ((bt->flags & BT_KERNEL_SPACE)) {
 			if (KVMDUMP_DUMPFILE())
 				kvmdump_display_regs(bt->tc->processor, fp);
+			else if (ELF_NOTES_VALID() && DISKDUMP_DUMPFILE())
+				diskdump_display_regs(bt->tc->processor, fp);
 			else if (SADUMP_DUMPFILE())
 				sadump_display_regs(bt->tc->processor, fp);
 		}
diff --git a/x86_64.c b/x86_64.c
index 7a7de3c..1c18999 100755
--- a/x86_64.c
+++ b/x86_64.c
@@ -2880,7 +2880,9 @@ x86_64_low_budget_back_trace_cmd(struct bt_info *bt_in)
 		sadump_display_regs(bt->tc->processor, ofp);
 		return;
 	} else if ((bt->flags & BT_KERNEL_SPACE) &&
-		   (KVMDUMP_DUMPFILE() || SADUMP_DUMPFILE())) {
+		   (KVMDUMP_DUMPFILE() ||
+		    (ELF_NOTES_VALID() && DISKDUMP_DUMPFILE()) ||
+		    SADUMP_DUMPFILE())) {
 		fprintf(ofp, "    [exception RIP: ");
 		if ((sp = value_search(bt->instptr, &offset))) {
 			fprintf(ofp, "%s", sp->name);
@@ -2892,6 +2894,8 @@ x86_64_low_budget_back_trace_cmd(struct bt_info *bt_in)
 		fprintf(ofp, "]\n");
 		if (KVMDUMP_DUMPFILE())
 			kvmdump_display_regs(bt->tc->processor, ofp);
+		else if (ELF_NOTES_VALID() && DISKDUMP_DUMPFILE())
+			diskdump_display_regs(bt->tc->processor, ofp);
 		else if (SADUMP_DUMPFILE())
 			sadump_display_regs(bt->tc->processor, ofp);
 	} else if (bt->flags & BT_START) {
@@ -4377,6 +4381,11 @@ skip_stage:
 		if (ur_rip && ur_rsp) {
 			*rip = ur_rip;
 			*rsp = ur_rsp;
+			if (is_kernel_text(ur_rip) &&
+			    (((ur_rsp >= GET_STACKBASE(bt->task)) &&
+			      (ur_rsp < GET_STACKTOP(bt->task))) ||
+			     in_alternate_stack(bt->tc->processor, ur_rsp)))
+				bt_in->flags |= BT_KERNEL_SPACE;
 			if (!is_kernel_text(ur_rip) && in_user_stack(bt->tc->task, ur_rsp))
 				bt_in->flags |= BT_USER_SPACE;
 			return;
@@ -4400,8 +4409,19 @@ skip_stage:
 	 * Use what was (already) saved in the panic task's
 	 * registers found in the ELF header.
 	 */
-	if (bt->flags & BT_KDUMP_ELF_REGS)
+	if (bt->flags & BT_KDUMP_ELF_REGS) {
+		user_regs = bt->machdep;
+		ur_rip = ULONG(user_regs + OFFSET(user_regs_struct_rip));
+		ur_rsp = ULONG(user_regs + OFFSET(user_regs_struct_rsp));
+		if (is_kernel_text(ur_rip) &&
+		    (((ur_rsp >= GET_STACKBASE(bt->task)) &&
+		      (ur_rsp < GET_STACKTOP(bt->task))) ||
+		     in_alternate_stack(bt->tc->processor, ur_rsp)))
+			bt_in->flags |= BT_KERNEL_SPACE;
+		if (!is_kernel_text(ur_rip) && in_user_stack(bt->tc->task, ur_rsp))
+			bt_in->flags |= BT_USER_SPACE;
 		return;
+	}
 
 	if (CRASHDEBUG(1))
 		error(INFO,
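For reference, the kernel/user-space test that the V2 patch adds in x86_64.c can be sketched in isolation as follows. This is a minimal, self-contained C sketch under stated assumptions, not crash code: struct range, struct regs_ctx, the SKETCH_* flag values and classify_saved_regs() are hypothetical stand-ins for crash's is_kernel_text(), GET_STACKBASE()/GET_STACKTOP(), in_alternate_stack(), in_user_stack() and the BT_KERNEL_SPACE/BT_USER_SPACE flags, which operate on crash's internal state rather than on plain address ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; crash defines its own BT_KERNEL_SPACE and
 * BT_USER_SPACE values. */
#define SKETCH_KERNEL_SPACE 0x1u
#define SKETCH_USER_SPACE   0x2u

/* Half-open address range [lo, hi). */
struct range {
	uint64_t lo, hi;
};

/* Hypothetical per-task context standing in for what crash looks up via
 * is_kernel_text(), GET_STACKBASE()/GET_STACKTOP(), in_alternate_stack()
 * and in_user_stack(). */
struct regs_ctx {
	struct range ktext;	/* kernel text */
	struct range kstack;	/* task's kernel stack */
	struct range altstack;	/* per-cpu IRQ/NMI/exception stack */
	struct range ustack;	/* task's user-mode stack */
};

static bool in_range(struct range r, uint64_t addr)
{
	return addr >= r.lo && addr < r.hi;
}

/* A saved RIP/RSP pair counts as "kernel space" only if RIP is kernel
 * text and RSP lies on the task's kernel stack or on an alternate stack.
 * It counts as "user space" only if RIP is not kernel text and RSP lies
 * on the user stack. Otherwise neither flag is set. */
static unsigned classify_saved_regs(const struct regs_ctx *c,
				    uint64_t rip, uint64_t rsp)
{
	unsigned flags = 0;
	bool ktext = in_range(c->ktext, rip);

	if (ktext && (in_range(c->kstack, rsp) || in_range(c->altstack, rsp)))
		flags |= SKETCH_KERNEL_SPACE;
	if (!ktext && in_range(c->ustack, rsp))
		flags |= SKETCH_USER_SPACE;

	/* The two tests disagree on ktext, so both bits are never set. */
	assert(flags != (SKETCH_KERNEL_SPACE | SKETCH_USER_SPACE));
	return flags;
}
```

As an illustration, assuming an 8 KB kernel stack at [ffff88012b272000, ffff88012b274000) for the runtest.sh task and a kernel text range covering ffffffff81000000 to ffffffff81600000 (both assumed, not taken from the dumpfile), the pair RIP ffffffff8131b566 / RSP ffff88012b273e18 is classified as kernel space. The user-mode pair RIP 00007fad3a2f45e0 / RSP 00007fff7b1783d8 is classified as user space provided that RSP falls inside the assumed user stack range. A pair whose RIP is kernel text but whose RSP lies on neither kernel stack gets no flag in this sketch.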
--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility