Re:[RFC] Crash patch for DWARF CFI based unwind support

Dave Anderson <anderson@xxxxxxxxxx> · Fri, 27 Oct 2006 12:16:32 -0400

Hi Rachita,
I'm looking at an alt-sysrq-c generated crash on an x86_64 kernel,

and I am seeing something similar to what you are with respect

to the transition from the interrupt stack back to the process

stack.
Here's a "bt -a", where cpu 0 shows that it received a shutdown

NMI while in the idle loop, and the transition from the NMI

exception stack back to the process stack was clean.  But
on

the cpu which took the alt-sysrq-c keyboard interrupt, the

transition from the per-cpu interrupt stack back to the

process stack is similar to what you're seeing:
crash> bt -a

PID: 0      TASK: ffffffff8034ce60 
CPU: 0   COMMAND: "swapper"

 #0 [ffffffff80481f30] crash_nmi_callback at ffffffff8007742f

 #1 [ffffffff80481f40] do_nmi at ffffffff80063c2c

 #2 [ffffffff80481f50] nmi at ffffffff8006312f

    [exception RIP: mwait_idle+54]

    RIP: ffffffff800553b7  RSP: ffffffff80437f90 
RFLAGS: 00000246

    RAX: 0000000000000000  RBX: ffffffff80055381 
RCX: 0000000000000000

    RDX: 0000000000000000  RSI: 0000000000000001 
RDI: ffffffff8034e838

    RBP: 0000000000099000   R8: ffffffff80436000  
R9: 000000000000003e

    R10: ffff810037d0c038  R11: 0000000000000048 
R12: 0000000000090000

    R13: 0000000000000000  R14: 0000000000000000 
R15: 0000000000000000

    ORIG_RAX: ffffffffffffffff  CS: 0010 
SS: 0018

--- <exception stack> ---

 #3 [ffffffff80437f90] mwait_idle at ffffffff800553b7

 #4 [ffffffff80437f90] cpu_idle at ffffffff800473be
PID: 0      TASK: ffff81003fe48100 
CPU: 1   COMMAND: "swapper"

 #0 [ffff81003fe6bb40] crash_kexec at ffffffff800ab7c4

 #1 [ffff81003fe6bbc8] mwait_idle at ffffffff800553b7

 #2 [ffff81003fe6bc00] sysrq_handle_crashdump at ffffffff8019301f

 #3 [ffff81003fe6bc10] __handle_sysrq at ffffffff80192e1c

 #4 [ffff81003fe6bc50] kbd_event at ffffffff8018dbc1

 #5 [ffff81003fe6bca0] input_event at ffffffff801e9b9f

 #6 [ffff81003fe6bcd0] hidinput_hid_event at ffffffff801e42cb

 #7 [ffff81003fe6bd00] hid_process_event at ffffffff801df66b

 #8 [ffff81003fe6bd40] hid_input_report at ffffffff801df9d9

 #9 [ffff81003fe6bdc0] hid_irq_in at ffffffff801e0dc0

#10 [ffff81003fe6bde0] usb_hcd_giveback_urb at ffffffff801d33d8

#11 [ffff81003fe6be10] uhci_giveback_urb at ffffffff88168724

#12 [ffff81003fe6be50] uhci_scan_schedule at ffffffff88168e07

#13 [ffff81003fe6bed0] uhci_irq at ffffffff8816ac08

#14 [ffff81003fe6bf10] usb_hcd_irq at ffffffff801d3dc7

#15 [ffff81003fe6bf20] handle_IRQ_event at ffffffff80010704

#16 [ffff81003fe6bf50] __do_IRQ at ffffffff800b5238

#17 [ffff81003fe6bf58] __do_softirq at ffffffff80011c0b

#18 [ffff81003fe6bf90] do_IRQ at ffffffff8006a762

--- <IRQ stack> ---

#19 [ffff81003fe65e70] ret_from_intr at ffffffff8005bac9

    [exception RIP: cpu_idle+149]

    RIP: ffffffff800473be  RSP: ffffffff8042e220 
RFLAGS: ffffffff80074188

    RAX: ffffffffffffff16  RBX: 0000000000000000 
RCX: ffffffff800553b7

    RDX: 0000000000000010  RSI: 0000000000000246 
RDI: ffff81003fe65ef0

    RBP: ffff81003fe64000   R8: ffffffff8034e838  
R9: 0000000000000001

    R10: 0000000000000000  R11: 0000000000000000 
R12: 000000000000003f

    R13: ffff810037d0c008  R14: 0000000000000246 
R15: 0000000000000001

    ORIG_RAX: 0000000000000018  CS: 0020 
SS: 0000

bt: WARNING: possibly bogus exception frame

crash>
crash dutifully reports that the exception frame looks bogus

because of the CS value.
The end of the per-cpu interrupt stack looks like this:
crash> rd -s ffff81003fe68000 2048

...

ffff81003fe6bf30:  000000000000e900 00000000000000e9

ffff81003fe6bf40:  ffff810037ca4bc0 irq_desc+59708

ffff81003fe6bf50:  __do_IRQ+164     __do_softirq+94

ffff81003fe6bf60:  00000000000000e9 ffff81003fe65e48

ffff81003fe6bf70:  00000000000000ff cpu_data+256

ffff81003fe6bf80:  0000000000000100 cpu_core_map+32

ffff81003fe6bf90:  do_IRQ+231      
ffff81003fe65e70

ffff81003fe6bfa0:  mwait_idle      
ffff81003fe65e70

ffff81003fe6bfb0:  ret_from_intr    ffff81003fe65e70

ffff81003fe6bfc0:  0000000000000000 0000000000000000

ffff81003fe6bfd0:  0000000000000000 0000000000000000

ffff81003fe6bfe0:  0000000000000000 0000000000000000

ffff81003fe6bff0:  0000000000000000 0000000000000000

crash>
...hence the supposed pointer to the generating exception frame

is presumed to be ffff81003fe65e70 (which is bogus).
Interestingly, though, is if I do an exception frame search

for that task, I do find the "real" exception frame:
crash> bt -e

PID: 0      TASK: ffff81003fe48100 
CPU: 1   COMMAND: "swapper"
  KERNEL-MODE EXCEPTION FRAME AT: ffff81003fe65e48

    RIP: ffffffff800553b7  RSP: ffff81003fe65ef0 
RFLAGS: 00000246

    RAX: 0000000000000000  RBX: 0000000000000001 
RCX: 0000000000000000

    RDX: 0000000000000000  RSI: 0000000000000001 
RDI: ffffffff8034e838

    RBP: 0000000000000000   R8: ffff81003fe64000  
R9: 000000000000003f

    R10: ffff810037d0c008  R11: 0000000000000246 
R12: ffff810037fef040

    R13: 0000000000000001  R14: ffff81003fe482f0 
R15: 0000000002245f5e

    ORIG_RAX: ffffffffffffff16  CS: 0010 
SS: 0018

crash> sym ffffffff800553b7

ffffffff800553b7 (t) mwait_idle+0x36  include/asm/thread_info.h:
63

crash>
and if I just grep for that address within the per-cpu interrupt

stack, I see several refernces to it:
crash> rd -s ffff81003fe68000 2048 | grep ffff81003fe65e48

ffff81003fe6bb30:  ffff81003ac7c000 ffff81003fe65e48

ffff81003fe6bc60:  00ffffff8001c1fd ffff81003fe65e48

ffff81003fe6bce0:  ffff81003ec68000 ffff81003fe65e48

ffff81003fe6bd50:  ffff81003fe65e48 0000000180087720

ffff81003fe6bda0:  ffff810037cb7400 ffff81003fe65e48

ffff81003fe6bdb0:  ffff810037cb7550 ffff81003fe65e48

ffff81003fe6be60:  ffff81003cf37488 ffff81003fe65e48

ffff81003fe6bec0:  ffff81003fe65e48 ffff81003fe65e48

ffff81003fe6bed0:  uhci_irq+0x13f   ffff81003fe65e48

ffff81003fe6bf00:  ffff81003fe65e48 ffff81003fe65e48

ffff81003fe6bf60:  00000000000000e9 ffff81003fe65e48

crash>
But none of them are located at the "fixed" location of one

word below the 64-byte block at the top of the interrupt

stack.
So I don't know what's going on in this case...
I don't ever recall seeing such a bogus interrupt-to-process

stack transition on any netdump or diskdump generated vmcores.

And all the "test" kdump kernel dumpfiles I've used have only

been generated by using "echo c > /proc/sysrq-trigger", so the

crash path never went off the process stack.
So without blatantly pointing the finger, I wonder whether there's

something that the kexec/kdump code path does that could possibly

be tinkering with the contents of the interrupt stack?
I also want to try to force another crash but with all cpus

forcibly running something other than the idle task, in case

there's something strange about the interrupt bringing the

cpu out of that "mwait" instruction?  Grasping at straws a

bit here...
Also, that's why I'm always asking for back-trace tests that

do something "real" -- instead of just having the kernel

call panic(), or that do a sys_write() to /proc/sysrq-trigger

to force an oops on the process stack.  At least an alt-sysrq-c

on the console keyboard generates an interrupt, as does your

forced jprobes deal...
BTW, I also note the the reading of the module unwind tables

is reading invalid data, because it's being done before the

non-unity-mapped address translation can even work!  In other

words, vmalloc addresses can only be read after vm_init() is

complete.  So I've added another machdep_init() argument

(POST_VM) that is called just after vm_init(), and in the

case of x86_64 (and x86), can call init_unwind_table().
Thanks,

  Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility