Re: [PATCH 00/11] sadump: Incremental update patches

HATAYAMA Daisuke <d.hatayama@xxxxxxxxxxxxxx> · Fri, 21 Oct 2011 12:08:38 +0900

From: Dave Anderson <anderson@xxxxxxxxxx>
Subject: Re:  [PATCH 00/11] sadump: Incremental update patches
Date: Thu, 20 Oct 2011 17:06:54 -0400 (EDT)

> 
> 
> ----- Original Message -----
>> Hello Dave,
>> 
>> The following series fix minor bugs, clean up in sadump module, and
>> address the issue on kdump's first 640kB backup.
>> 
>> The last patch is a preparation for makedumpfile's support on
>> sadump-related formats, still work in progress, producing dumpfile in
>> kdump-compressed format from sadump-related formats.
>> 
>> This patch set is based on crash 5.1.9.
> 
> Hello Daisuke,
> 
> As I have stated in our previous sadump-related discussions, you have
> free rein to make whatever changes you like in sadump-specific
> files, or in functions that deal with sadump-specific issues.  However, 
> if your changes modify behavior when used with non-sadump dumpfiles
> then I may have a problem with them.  So when you post a patch-set 
> such as this last set, I would prefer that you post two separate 
> patch-sets.
> 
> This 1/11 patchset is a good example of what I mean.  I have no
> problem with the sadump-specific patches.  But I do have a big
> problem with the last one, which is not necessarily sadump-specific:
> 
>   use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch
> 

I see. I'll send them separately in the future.

> BTW, these are the names of the patches as they were attached, where
> the second one doesn't have "0002-" prepended to it, and there is
> no "0008-" patch?:
>   
>   0001-sadump-bug-close-receives-unintened-value.patch.patch
>   cleanup_is_sadump.patch.patch
>   0002-sadump-bug-specify-wrong-type.patch.patch
>   0003-sadump-bugfix-time-stamp-values-displayed-are-same.patch.patch
>   0004-sadump-don-t-exit-if-time-stamps-mismatch.patch.patch
>   0005-sadump-debug-messages-at-the-beginning-of-open_disk-.patch.patch
>   0006-sadump-Allow-arbitrary-number-of-disk-set-configurat.patch.patch
>   0007-sadump-refer-to-eip-and-esp-on-x86-kernels.patch.patch
>   0010-Make-data-relevant-to-physical-memory-have-64-bits-l.patch.patch
>   0011-Read-kexec-backup-region-if-read-to-the-first-640kB-.patch.patch
>   use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch
> 

Sorry for the inconvenience. I used stgit to organize and send the
patch set, and I didn't notice that stgit preserves the original file
names when attaching patches.

> Anyway, I tested this by running "bt -a" on a large set of sample dumpfiles, 
> first without, and then with, your patchset.  When your patches are applied, I see 
> numerous examples where the backtraces are missing huge pieces of information.
> 
> Here are typical examples:
> 
> Here with un-patched crash-5.1.9, is a RHEL6 crashing process:
>  
>  PID: 14187  TASK: ffff88012b98e040  CPU: 0   COMMAND: "runtest.sh"
>   #0 [ffff88012b2739e0] machine_kexec at ffffffff810310fb
>   #1 [ffff88012b273a40] crash_kexec at ffffffff810b6632
>   #2 [ffff88012b273b10] oops_end at ffffffff814df320
>   #3 [ffff88012b273b40] no_context at ffffffff81040cbb
>   #4 [ffff88012b273b90] __bad_area_nosemaphore at ffffffff81040f45
>   #5 [ffff88012b273be0] bad_area at ffffffff8104106e
>   #6 [ffff88012b273c10] __do_page_fault at ffffffff81041793
>   #7 [ffff88012b273d30] do_page_fault at ffffffff814e132e
>   #8 [ffff88012b273d60] page_fault at ffffffff814de6b5
>      [exception RIP: sysrq_handle_crash+22]
>      RIP: ffffffff8131b566  RSP: ffff88012b273e18  RFLAGS: 00010096
>      RAX: 0000000000000010  RBX: 0000000000000063  RCX: 0000000000000f95
>      RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000063
>      RBP: ffff88012b273e18   R8: ffffffff81b9e5c0   R9: 0000000000000000
>      R10: 00007fff7b178160  R11: 0000000000000000  R12: 0000000000000000
>      R13: ffffffff81a9a1a0  R14: 0000000000000286  R15: 0000000000000007
>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>   #9 [ffff88012b273e20] __handle_sysrq at ffffffff8131b822
>  #10 [ffff88012b273e70] write_sysrq_trigger at ffffffff8131b8de
>  #11 [ffff88012b273ea0] proc_reg_write at ffffffff811d5bce
>  #12 [ffff88012b273ef0] vfs_write at ffffffff811730c8
>  #13 [ffff88012b273f30] sys_write at ffffffff81173ad1
>  #14 [ffff88012b273f80] system_call_fastpath at ffffffff8100b0b2
>  
> With crash-5.1.9 plus your patch -- nothing is shown below the page fault
> exception frame:
>  
>  PID: 14187  TASK: ffff88012b98e040  CPU: 0   COMMAND: "runtest.sh"
>      [exception RIP: sysrq_handle_crash+22]
>      RIP: ffffffff8131b566  RSP: ffff88012b273e18  RFLAGS: 00010096
>      RAX: 0000000000000010  RBX: 0000000000000063  RCX: 0000000000000f95
>      RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000063
>      RBP: ffff88012b273e18   R8: ffffffff81b9e5c0   R9: 0000000000000000
>      R10: 00007fff7b178160  R11: 0000000000000000  R12: 0000000000000000
>      R13: ffffffff81a9a1a0  R14: 0000000000000286  R15: 0000000000000007
>      CS: 0010  SS: 0018
>   #0 [ffff88012b273e20] __handle_sysrq at ffffffff8131b822
>   #1 [ffff88012b273e70] write_sysrq_trigger at ffffffff8131b8de
>   #2 [ffff88012b273ea0] proc_reg_write at ffffffff811d5bce
>   #3 [ffff88012b273ef0] vfs_write at ffffffff811730c8
>   #4 [ffff88012b273f30] sys_write at ffffffff81173ad1
>   #5 [ffff88012b273f80] system_call_fastpath at ffffffff8100b0b2
>      RIP: 00007fad3a2f45e0  RSP: 00007fff7b1783d8  RFLAGS: 00010206
>      RAX: 0000000000000001  RBX: ffffffff8100b0b2  RCX: 0000000000000000
>      RDX: 0000000000000002  RSI: 00007fad3abe6000  RDI: 0000000000000001
>      RBP: 00007fad3abe6000   R8: 000000000000000a   R9: 00007fad3abe2700
>      R10: 00007fff7b178160  R11: 0000000000000246  R12: 0000000000000002
>      R13: 00007fad3a5a6780  R14: 0000000000000002  R15: 0000000000000001
>      ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
>   
> Again with un-patched crash-5.1.9, here are examples of two non-crashing cpus
> that received shutdown NMI interrupts from the crashing task:
>  
>  PID: 0      TASK: ffff88012cd2f580  CPU: 1   COMMAND: "swapper"
>   #0 [ffff880028227e90] crash_nmi_callback at ffffffff81028a96
>   #1 [ffff880028227ea0] notifier_call_chain at ffffffff814e13e5
>   #2 [ffff880028227ee0] atomic_notifier_call_chain at ffffffff814e144a
>   #3 [ffff880028227ef0] notify_die at ffffffff810942fe
>   #4 [ffff880028227f20] do_nmi at ffffffff814df033
>   #5 [ffff880028227f50] nmi at ffffffff814de940
>      [exception RIP: intel_idle+177]
>      RIP: ffffffff812bc291  RSP: ffff88012cd31e68  RFLAGS: 00000046
>      RAX: 0000000000000020  RBX: 0000000000000008  RCX: 0000000000000001
>      RDX: 0000000000000000  RSI: ffff88012cd31fd8  RDI: ffffffff81a34040
>      RBP: ffff88012cd31ed8   R8: 0000000000000000   R9: 00000000000000c8
>      R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000020
>      R13: 12257c81ed7a34e6  R14: 0000000000000003  R15: 0000000000000001
>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  --- <NMI exception stack> ---
>   #6 [ffff88012cd31e68] intel_idle at ffffffff812bc291
>   #7 [ffff88012cd31ee0] cpuidle_idle_call at ffffffff813ed4b7
>   #8 [ffff88012cd31f00] cpu_idle at ffffffff81009de6
> 
>  PID: 37     TASK: ffff88012ce360c0  CPU: 2   COMMAND: "events/2"
>   #0 [ffff880028247e90] crash_nmi_callback at ffffffff81028a96
>   #1 [ffff880028247ea0] notifier_call_chain at ffffffff814e13e5
>   #2 [ffff880028247ee0] atomic_notifier_call_chain at ffffffff814e144a
>   #3 [ffff880028247ef0] notify_die at ffffffff810942fe
>   #4 [ffff880028247f20] do_nmi at ffffffff814df033
>   #5 [ffff880028247f50] nmi at ffffffff814de940
>      [exception RIP: io_serial_in+22]
>      RIP: ffffffff813324f6  RSP: ffff88012ce5fc70  RFLAGS: 00000006
>      RAX: ffffffffab364400  RBX: ffffffff81f2cca0  RCX: 0000000000000000
>      RDX: 000000000000d055  RSI: 0000000000000005  RDI: ffffffff81f2cca0
>      RBP: ffff88012ce5fc70   R8: ffffffff81b9e5c0   R9: 0000000000000000
>      R10: ffff880127498a60  R11: 0000000000000001  R12: 000000000000270c
>      R13: 0000000000000020  R14: 0000000000000000  R15: ffffffff81332ba0
>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  --- <NMI exception stack> ---
>   #6 [ffff88012ce5fc70] io_serial_in at ffffffff813324f6
>   #7 [ffff88012ce5fc78] wait_for_xmitr at ffffffff81332b03
>   #8 [ffff88012ce5fca8] serial8250_console_putchar at ffffffff81332bc6
>   #9 [ffff88012ce5fcc8] uart_console_write at ffffffff8132e55e
>  #10 [ffff88012ce5fd08] serial8250_console_write at ffffffff81332f2d
>  #11 [ffff88012ce5fd58] __call_console_drivers at ffffffff81067495
>  #12 [ffff88012ce5fd88] _call_console_drivers at ffffffff810674fa
>  #13 [ffff88012ce5fda8] release_console_sem at ffffffff81067ac8
>  #14 [ffff88012ce5fde8] fb_flashcursor at ffffffff812abb4a
>  #15 [ffff88012ce5fe38] worker_thread at ffffffff81088a40
>  #16 [ffff88012ce5fee8] kthread at ffffffff8108dff6
>  #17 [ffff88012ce5ff48] kernel_thread at ffffffff8100c10a
>  
> But when running crash-5.1.9 plus your patch -- the transitions to the NMI exception
> stack are not even shown at all:
>     
>  PID: 0      TASK: ffff88012cd2f580  CPU: 1   COMMAND: "swapper"
>      [exception RIP: intel_idle+177]
>      RIP: ffffffff812bc291  RSP: ffff88012cd31e68  RFLAGS: 00000046
>      RAX: 0000000000000020  RBX: 0000000000000008  RCX: 0000000000000001
>      RDX: 0000000000000000  RSI: ffff88012cd31fd8  RDI: ffffffff81a34040
>      RBP: ffff88012cd31ed8   R8: 0000000000000000   R9: 00000000000000c8
>      R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000020
>      R13: 12257c81ed7a34e6  R14: 0000000000000003  R15: 0000000000000001
>      CS: 0010  SS: 0018
>   #0 [ffff88012cd31e70] sched_clock_cpu at ffffffff8109539d
>   #1 [ffff88012cd31ee0] cpuidle_idle_call at ffffffff813ed4b7
>   #2 [ffff88012cd31f00] cpu_idle at ffffffff81009de6
>  
>  PID: 37     TASK: ffff88012ce360c0  CPU: 2   COMMAND: "events/2"
>      [exception RIP: io_serial_in+22]
>      RIP: ffffffff813324f6  RSP: ffff88012ce5fc70  RFLAGS: 00000006
>      RAX: ffffffffab364400  RBX: ffffffff81f2cca0  RCX: 0000000000000000
>      RDX: 000000000000d055  RSI: 0000000000000005  RDI: ffffffff81f2cca0
>      RBP: ffff88012ce5fc70   R8: ffffffff81b9e5c0   R9: 0000000000000000
>      R10: ffff880127498a60  R11: 0000000000000001  R12: 000000000000270c
>      R13: 0000000000000020  R14: 0000000000000000  R15: ffffffff81332ba0
>      CS: 0010  SS: 0018
>   #0 [ffff88012ce5fc78] wait_for_xmitr at ffffffff81332b03
>   #1 [ffff88012ce5fca8] serial8250_console_putchar at ffffffff81332bc6
>   #2 [ffff88012ce5fcc8] uart_console_write at ffffffff8132e55e
>   #3 [ffff88012ce5fd08] serial8250_console_write at ffffffff81332f2d
>   #4 [ffff88012ce5fd58] __call_console_drivers at ffffffff81067495
>   #5 [ffff88012ce5fd88] _call_console_drivers at ffffffff810674fa
>   #6 [ffff88012ce5fda8] release_console_sem at ffffffff81067ac8
>   #7 [ffff88012ce5fde8] fb_flashcursor at ffffffff812abb4a
>   #8 [ffff88012ce5fe38] worker_thread at ffffffff81088a40
>   #9 [ffff88012ce5fee8] kthread at ffffffff8108dff6
>  #10 [ffff88012ce5ff48] kernel_thread at ffffffff8100c10a
>  
> If I remove the "use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch" patch
> the backtraces are correct.  Now, it may be true that the changes you made make
> sense with respect to sadump dumpfiles, where the register set stored in the header
> is a reflection of the last location that each cpu ran (?).  
> 
> But those changes are totally unacceptable for compressed kdump dumpfiles.

I understand the situation.

I've attached a V2 patch. I confirmed that it doesn't break the
behavior described above. Could you review it?
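
For reference, the condition under which V2 sets BT_KERNEL_SPACE can be
sketched roughly as the standalone C below. This is only an illustration
under assumptions: struct demo_task, in_range(), classify_regs() and the
DEMO_* values are hypothetical names that do not exist in crash, and a
single alternate stack stands in for the per-cpu exception stacks. The
patch itself uses is_kernel_text(), GET_STACKBASE(), GET_STACKTOP() and
in_alternate_stack().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the state crash consults; not a crash structure. */
struct demo_task {
    uint64_t text_start, text_end;   /* assumed kernel text range [start, end) */
    uint64_t stack_base, stack_top;  /* task kernel stack [base, top) */
    uint64_t alt_base, alt_top;      /* one alternate (e.g. NMI) stack [base, top) */
};

enum demo_space { DEMO_UNKNOWN, DEMO_KERNEL_SPACE };

/* Half-open range test, the same shape as the sp >= base && sp < top check. */
static bool in_range(uint64_t v, uint64_t lo, uint64_t hi)
{
    return v >= lo && v < hi;
}

/*
 * Mirrors the added test: the saved registers are treated as kernel-space
 * only if rip points into kernel text AND rsp lies on the task's kernel
 * stack or on the alternate stack.
 */
enum demo_space classify_regs(const struct demo_task *t,
                              uint64_t rip, uint64_t rsp)
{
    assert(t != NULL);

    if (in_range(rip, t->text_start, t->text_end) &&
        (in_range(rsp, t->stack_base, t->stack_top) ||
         in_range(rsp, t->alt_base, t->alt_top)))
        return DEMO_KERNEL_SPACE;

    return DEMO_UNKNOWN;
}
```

With an assumed text range of ffffffff81000000-ffffffff82000000 and an
assumed task stack of ffff88012b272000-ffff88012b274000, the RIP/RSP pair
from the sysrq_handle_crash frame above (ffffffff8131b566 /
ffff88012b273e18) is classified as kernel space, while the user-mode pair
(00007fad3a2f45e0 / 00007fff7b1783d8) is not.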

Thanks.
HATAYAMA, Daisuke

diff --git a/netdump.c b/netdump.c
index f8da284..4011f36 100644
--- a/netdump.c
+++ b/netdump.c
@@ -2508,6 +2508,7 @@ next_sysrq:
 		    (((sp >= GET_STACKBASE(bt->task)) &&
 		      (sp < GET_STACKTOP(bt->task))) ||
 		    in_alternate_stack(bt->tc->processor, sp))) {
+			bt->flags |= BT_KERNEL_SPACE;
 			*eip = ip;
 			*esp = sp;
 			return;
diff --git a/x86.c b/x86.c
index b69adb2..df91110 100755
--- a/x86.c
+++ b/x86.c
@@ -699,6 +699,8 @@ db_stack_trace_cmd(addr, have_addr, count, modif, task, flags)
 	} else if ((bt->flags & BT_KERNEL_SPACE)) {
 		if (KVMDUMP_DUMPFILE())
 			kvmdump_display_regs(bt->tc->processor, fp);
+		else if (ELF_NOTES_VALID() && DISKDUMP_DUMPFILE())
+			diskdump_display_regs(bt->tc->processor, fp);
 		else if (SADUMP_DUMPFILE())
 			sadump_display_regs(bt->tc->processor, fp);
 	}
diff --git a/x86_64.c b/x86_64.c
index 7a7de3c..1c18999 100755
--- a/x86_64.c
+++ b/x86_64.c
@@ -2880,7 +2880,9 @@ x86_64_low_budget_back_trace_cmd(struct bt_info *bt_in)
 			sadump_display_regs(bt->tc->processor, ofp);
 		return;
 	} else if ((bt->flags & BT_KERNEL_SPACE) &&
-		   (KVMDUMP_DUMPFILE() || SADUMP_DUMPFILE())) {
+		   (KVMDUMP_DUMPFILE() ||
+		    (ELF_NOTES_VALID() && DISKDUMP_DUMPFILE()) ||
+		    SADUMP_DUMPFILE())) {
 		fprintf(ofp, "    [exception RIP: ");
 		if ((sp = value_search(bt->instptr, &offset))) {
 			fprintf(ofp, "%s", sp->name);
@@ -2892,6 +2894,8 @@ x86_64_low_budget_back_trace_cmd(struct bt_info *bt_in)
 		fprintf(ofp, "]\n");
 		if (KVMDUMP_DUMPFILE())
 			kvmdump_display_regs(bt->tc->processor, ofp);
+		else if (ELF_NOTES_VALID() && DISKDUMP_DUMPFILE())
+			diskdump_display_regs(bt->tc->processor, ofp);
 		else if (SADUMP_DUMPFILE())
 			sadump_display_regs(bt->tc->processor, ofp);
         } else if (bt->flags & BT_START) {
@@ -4377,6 +4381,11 @@ skip_stage:
 	if (ur_rip && ur_rsp) {
         	*rip = ur_rip;
 		*rsp = ur_rsp;
+		if (is_kernel_text(ur_rip) &&
+		    (((ur_rsp >= GET_STACKBASE(bt->task)) &&
+		      (ur_rsp < GET_STACKTOP(bt->task))) ||
+		     in_alternate_stack(bt->tc->processor, ur_rsp)))
+			bt_in->flags |= BT_KERNEL_SPACE;
 		if (!is_kernel_text(ur_rip) && in_user_stack(bt->tc->task, ur_rsp))
 			bt_in->flags |= BT_USER_SPACE;
 		return;
@@ -4400,8 +4409,19 @@ skip_stage:
 	 *  Use what was (already) saved in the panic task's 
 	 *  registers found in the ELF header.
 	 */ 
-	if (bt->flags & BT_KDUMP_ELF_REGS)
+	if (bt->flags & BT_KDUMP_ELF_REGS) {
+		user_regs = bt->machdep;
+		ur_rip = ULONG(user_regs + OFFSET(user_regs_struct_rip));
+		ur_rsp = ULONG(user_regs + OFFSET(user_regs_struct_rsp));
+		if (is_kernel_text(ur_rip) &&
+		    (((ur_rsp >= GET_STACKBASE(bt->task)) &&
+		      (ur_rsp < GET_STACKTOP(bt->task))) ||
+		     in_alternate_stack(bt->tc->processor, ur_rsp)))
+			bt_in->flags |= BT_KERNEL_SPACE;
+		if (!is_kernel_text(ur_rip) && in_user_stack(bt->tc->task, ur_rsp))
+			bt_in->flags |= BT_USER_SPACE;
 		return;
+	}
 
 	if (CRASHDEBUG(1)) 
         	error(INFO, 
--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility