----- Original Message -----
> Hello Dave,
>
> The following series fix minor bugs, clean up in sadump module, and
> address the issue on kdump's first 640kB backup.
>
> The last patch is a preparation for makedumpfile's support on
> sadump-related formats, still work in progress, producing dumpfile in
> kdump-compressed format from sadump-related formats.
>
> This patch set is based on crash 5.1.9.

Hello Daisuke,

As I have stated in our previous sadump-related discussions, you have
free rein to make whatever changes you like in sadump-specific files,
or in functions that deal with sadump-specific issues. However, if your
changes modify behavior when used with non-sadump dumpfiles then I may
have a problem with them. So when you post a patch-set such as this last
set, I would prefer that you post two separate patch-sets.

This 1/11 patchset is a good example of what I mean. I have no problem
with the sadump-specific patches. But I do have a big problem with the
last one, which is not necessarily sadump-specific:

  use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch

BTW, these are the names of the patches as they were attached, where
the second one doesn't have "0002-" prepended to it, and there is
no "0008-" patch?:

  0001-sadump-bug-close-receives-unintened-value.patch.patch
  cleanup_is_sadump.patch.patch
  0002-sadump-bug-specify-wrong-type.patch.patch
  0003-sadump-bugfix-time-stamp-values-displayed-are-same.patch.patch
  0004-sadump-don-t-exit-if-time-stamps-mismatch.patch.patch
  0005-sadump-debug-messages-at-the-beginning-of-open_disk-.patch.patch
  0006-sadump-Allow-arbitrary-number-of-disk-set-configurat.patch.patch
  0007-sadump-refer-to-eip-and-esp-on-x86-kernels.patch.patch
  0010-Make-data-relevant-to-physical-memory-have-64-bits-l.patch.patch
  0011-Read-kexec-backup-region-if-read-to-the-first-640kB-.patch.patch
  use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch

Anyway, I tested this by running "bt -a" on a large set of sample
dumpfiles, first without, and
then with, your patchset. When your patches are applied, I see numerous
examples where the backtraces are missing huge pieces of information.
Here are typical examples:

Here with un-patched crash-5.1.9, is a RHEL6 crashing process:

PID: 14187  TASK: ffff88012b98e040  CPU: 0   COMMAND: "runtest.sh"
 #0 [ffff88012b2739e0] machine_kexec at ffffffff810310fb
 #1 [ffff88012b273a40] crash_kexec at ffffffff810b6632
 #2 [ffff88012b273b10] oops_end at ffffffff814df320
 #3 [ffff88012b273b40] no_context at ffffffff81040cbb
 #4 [ffff88012b273b90] __bad_area_nosemaphore at ffffffff81040f45
 #5 [ffff88012b273be0] bad_area at ffffffff8104106e
 #6 [ffff88012b273c10] __do_page_fault at ffffffff81041793
 #7 [ffff88012b273d30] do_page_fault at ffffffff814e132e
 #8 [ffff88012b273d60] page_fault at ffffffff814de6b5
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffff8131b566  RSP: ffff88012b273e18  RFLAGS: 00010096
    RAX: 0000000000000010  RBX: 0000000000000063  RCX: 0000000000000f95
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000063
    RBP: ffff88012b273e18   R8: ffffffff81b9e5c0   R9: 0000000000000000
    R10: 00007fff7b178160  R11: 0000000000000000  R12: 0000000000000000
    R13: ffffffff81a9a1a0  R14: 0000000000000286  R15: 0000000000000007
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff88012b273e20] __handle_sysrq at ffffffff8131b822
#10 [ffff88012b273e70] write_sysrq_trigger at ffffffff8131b8de
#11 [ffff88012b273ea0] proc_reg_write at ffffffff811d5bce
#12 [ffff88012b273ef0] vfs_write at ffffffff811730c8
#13 [ffff88012b273f30] sys_write at ffffffff81173ad1
#14 [ffff88012b273f80] system_call_fastpath at ffffffff8100b0b2

With crash-5.1.9 plus your patch -- nothing is shown below the page
fault exception frame:

PID: 14187  TASK: ffff88012b98e040  CPU: 0   COMMAND: "runtest.sh"
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffff8131b566  RSP: ffff88012b273e18  RFLAGS: 00010096
    RAX: 0000000000000010  RBX: 0000000000000063  RCX: 0000000000000f95
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000063
    RBP: ffff88012b273e18   R8: ffffffff81b9e5c0   R9: 0000000000000000
    R10: 00007fff7b178160  R11: 0000000000000000  R12: 0000000000000000
    R13: ffffffff81a9a1a0  R14: 0000000000000286  R15: 0000000000000007
    CS: 0010  SS: 0018
 #0 [ffff88012b273e20] __handle_sysrq at ffffffff8131b822
 #1 [ffff88012b273e70] write_sysrq_trigger at ffffffff8131b8de
 #2 [ffff88012b273ea0] proc_reg_write at ffffffff811d5bce
 #3 [ffff88012b273ef0] vfs_write at ffffffff811730c8
 #4 [ffff88012b273f30] sys_write at ffffffff81173ad1
 #5 [ffff88012b273f80] system_call_fastpath at ffffffff8100b0b2
    RIP: 00007fad3a2f45e0  RSP: 00007fff7b1783d8  RFLAGS: 00010206
    RAX: 0000000000000001  RBX: ffffffff8100b0b2  RCX: 0000000000000000
    RDX: 0000000000000002  RSI: 00007fad3abe6000  RDI: 0000000000000001
    RBP: 00007fad3abe6000   R8: 000000000000000a   R9: 00007fad3abe2700
    R10: 00007fff7b178160  R11: 0000000000000246  R12: 0000000000000002
    R13: 00007fad3a5a6780  R14: 0000000000000002  R15: 0000000000000001
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

Again with un-patched crash-5.1.9, here are examples of two non-crashing
cpus that received shutdown NMI interrupts from the crashing task:

PID: 0      TASK: ffff88012cd2f580  CPU: 1   COMMAND: "swapper"
 #0 [ffff880028227e90] crash_nmi_callback at ffffffff81028a96
 #1 [ffff880028227ea0] notifier_call_chain at ffffffff814e13e5
 #2 [ffff880028227ee0] atomic_notifier_call_chain at ffffffff814e144a
 #3 [ffff880028227ef0] notify_die at ffffffff810942fe
 #4 [ffff880028227f20] do_nmi at ffffffff814df033
 #5 [ffff880028227f50] nmi at ffffffff814de940
    [exception RIP: intel_idle+177]
    RIP: ffffffff812bc291  RSP: ffff88012cd31e68  RFLAGS: 00000046
    RAX: 0000000000000020  RBX: 0000000000000008  RCX: 0000000000000001
    RDX: 0000000000000000  RSI: ffff88012cd31fd8  RDI: ffffffff81a34040
    RBP: ffff88012cd31ed8   R8: 0000000000000000   R9: 00000000000000c8
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000020
    R13: 12257c81ed7a34e6  R14: 0000000000000003  R15: 0000000000000001
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #6 [ffff88012cd31e68] intel_idle at ffffffff812bc291
 #7 [ffff88012cd31ee0] cpuidle_idle_call at ffffffff813ed4b7
 #8 [ffff88012cd31f00] cpu_idle at ffffffff81009de6

PID: 37     TASK: ffff88012ce360c0  CPU: 2   COMMAND: "events/2"
 #0 [ffff880028247e90] crash_nmi_callback at ffffffff81028a96
 #1 [ffff880028247ea0] notifier_call_chain at ffffffff814e13e5
 #2 [ffff880028247ee0] atomic_notifier_call_chain at ffffffff814e144a
 #3 [ffff880028247ef0] notify_die at ffffffff810942fe
 #4 [ffff880028247f20] do_nmi at ffffffff814df033
 #5 [ffff880028247f50] nmi at ffffffff814de940
    [exception RIP: io_serial_in+22]
    RIP: ffffffff813324f6  RSP: ffff88012ce5fc70  RFLAGS: 00000006
    RAX: ffffffffab364400  RBX: ffffffff81f2cca0  RCX: 0000000000000000
    RDX: 000000000000d055  RSI: 0000000000000005  RDI: ffffffff81f2cca0
    RBP: ffff88012ce5fc70   R8: ffffffff81b9e5c0   R9: 0000000000000000
    R10: ffff880127498a60  R11: 0000000000000001  R12: 000000000000270c
    R13: 0000000000000020  R14: 0000000000000000  R15: ffffffff81332ba0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #6 [ffff88012ce5fc70] io_serial_in at ffffffff813324f6
 #7 [ffff88012ce5fc78] wait_for_xmitr at ffffffff81332b03
 #8 [ffff88012ce5fca8] serial8250_console_putchar at ffffffff81332bc6
 #9 [ffff88012ce5fcc8] uart_console_write at ffffffff8132e55e
#10 [ffff88012ce5fd08] serial8250_console_write at ffffffff81332f2d
#11 [ffff88012ce5fd58] __call_console_drivers at ffffffff81067495
#12 [ffff88012ce5fd88] _call_console_drivers at ffffffff810674fa
#13 [ffff88012ce5fda8] release_console_sem at ffffffff81067ac8
#14 [ffff88012ce5fde8] fb_flashcursor at ffffffff812abb4a
#15 [ffff88012ce5fe38] worker_thread at ffffffff81088a40
#16 [ffff88012ce5fee8] kthread at ffffffff8108dff6
#17 [ffff88012ce5ff48] kernel_thread at ffffffff8100c10a

But when running crash-5.1.9 plus your patch -- the transitions to
the NMI exception stack are not even shown at all:

PID: 0      TASK: ffff88012cd2f580  CPU: 1   COMMAND: "swapper"
    [exception RIP: intel_idle+177]
    RIP: ffffffff812bc291  RSP: ffff88012cd31e68  RFLAGS: 00000046
    RAX: 0000000000000020  RBX: 0000000000000008  RCX: 0000000000000001
    RDX: 0000000000000000  RSI: ffff88012cd31fd8  RDI: ffffffff81a34040
    RBP: ffff88012cd31ed8   R8: 0000000000000000   R9: 00000000000000c8
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000020
    R13: 12257c81ed7a34e6  R14: 0000000000000003  R15: 0000000000000001
    CS: 0010  SS: 0018
 #0 [ffff88012cd31e70] sched_clock_cpu at ffffffff8109539d
 #1 [ffff88012cd31ee0] cpuidle_idle_call at ffffffff813ed4b7
 #2 [ffff88012cd31f00] cpu_idle at ffffffff81009de6

PID: 37     TASK: ffff88012ce360c0  CPU: 2   COMMAND: "events/2"
    [exception RIP: io_serial_in+22]
    RIP: ffffffff813324f6  RSP: ffff88012ce5fc70  RFLAGS: 00000006
    RAX: ffffffffab364400  RBX: ffffffff81f2cca0  RCX: 0000000000000000
    RDX: 000000000000d055  RSI: 0000000000000005  RDI: ffffffff81f2cca0
    RBP: ffff88012ce5fc70   R8: ffffffff81b9e5c0   R9: 0000000000000000
    R10: ffff880127498a60  R11: 0000000000000001  R12: 000000000000270c
    R13: 0000000000000020  R14: 0000000000000000  R15: ffffffff81332ba0
    CS: 0010  SS: 0018
 #0 [ffff88012ce5fc78] wait_for_xmitr at ffffffff81332b03
 #1 [ffff88012ce5fca8] serial8250_console_putchar at ffffffff81332bc6
 #2 [ffff88012ce5fcc8] uart_console_write at ffffffff8132e55e
 #3 [ffff88012ce5fd08] serial8250_console_write at ffffffff81332f2d
 #4 [ffff88012ce5fd58] __call_console_drivers at ffffffff81067495
 #5 [ffff88012ce5fd88] _call_console_drivers at ffffffff810674fa
 #6 [ffff88012ce5fda8] release_console_sem at ffffffff81067ac8
 #7 [ffff88012ce5fde8] fb_flashcursor at ffffffff812abb4a
 #8 [ffff88012ce5fe38] worker_thread at ffffffff81088a40
 #9 [ffff88012ce5fee8] kthread at ffffffff8108dff6
#10 [ffff88012ce5ff48] kernel_thread at ffffffff8100c10a

If I remove the "use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch"
patch the backtraces are correct.
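
For anyone who wants to repeat this kind of before/after comparison, here
is a minimal sketch of how it could be scripted. It is not part of crash
or of the test procedure described above. It assumes the "bt -a" output of
each build has already been saved as text with one frame per line, and the
helper names frames_by_task and missing_frames are hypothetical. It only
reports functions that appear in the un-patched backtrace of a task but
not in the patched one.

```python
import re

# Matches a crash "bt" frame line, for example:
#   #6 [ffff88012cd31e68] intel_idle at ffffffff812bc291
FRAME_RE = re.compile(r"#(\d+) \[([0-9a-f]+)\] (\S+) at ([0-9a-f]+)")

# Matches a task header line, for example:
#   PID: 0  TASK: ffff88012cd2f580  CPU: 1  COMMAND: "swapper"
TASK_RE = re.compile(r"PID:\s*(\d+)\s+TASK:\s*([0-9a-f]+)\s+CPU:\s*(\d+)")


def frames_by_task(bt_output):
    """Return {(pid, cpu): [function names]} parsed from saved "bt -a" text.

    Hypothetical helper: register dumps and stack-transition markers are
    ignored, only numbered frame lines are collected.
    """
    tasks = {}
    current = None
    for line in bt_output.splitlines():
        header = TASK_RE.search(line)
        if header:
            current = (int(header.group(1)), int(header.group(3)))
            tasks[current] = []
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            tasks[current].append(frame.group(3))
    return tasks


def missing_frames(before, after):
    """Return {(pid, cpu): [functions seen in `before` but not in `after`]}."""
    old = frames_by_task(before)
    new = frames_by_task(after)
    report = {}
    for key, funcs in old.items():
        lost = [f for f in funcs if f not in new.get(key, [])]
        if lost:
            report[key] = lost
    return report
```

Fed the two CPU 1 "swapper" backtraces shown above, missing_frames would
list the frames on the NMI exception stack plus intel_idle, which is the
regression described here.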

Now, it may be true that the changes you made make sense with respect
to sadump dumpfiles, where the register set stored in the header is a
reflection of the last location that each cpu ran (?). But those changes
are totally unacceptable for compressed kdump dumpfiles.

Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility