Re: [RFC] Crash patch for DWARF CFI based unwind support

Dave Anderson <anderson@xxxxxxxxxx> · Thu, 26 Oct 2006 11:46:17 -0400

Rachita Kothiyal wrote:

> Hi Rachita,

>

> I've only been able to test this on a live system that has __start_unwind

> and __end_unwind symbols, so I don't know what a backtrace with an

> in-kernel exception frame, or a backtrace with a transition to the
x86_64

> IRQ stack or x86_64 exception stacks, would look like.  If you
have

> an example, I'd be interested in seeing how they get handled.
Hi Dave
I was trying to use LKDTM to create various scenarios for crash dump.

To start with is the case of panic() in an interrupt context. Here
I

am inducing a panic in handle_IRQ_event(), where I am registering a

jprobe. jp_handle_irq_event() is the jprobe handler which in turn calls

lkdtm_handler(). Running crash on the dump gives the following:
crash> bt

PID: 3898   TASK: ffff81022e988e20  CPU: 0  
COMMAND: "slapd"

 #0 [ffffffff8064bcf8] crash_kexec at ffffffff80152211

 #1 [ffffffff8064bd40] machine_kexec at ffffffff8011a739

 #2 [ffffffff8064bd80] crash_kexec at ffffffff8015222d

 #3 [ffffffff8064be08] crash_kexec at ffffffff80152211

 #4 [ffffffff8064be30] bust_spinlocks at ffffffff8011fd6d

 #5 [ffffffff8064be40] panic at ffffffff80131410

 #6 [ffffffff8064beb0] cdrom_pc_intr at ffffffff802ebe68

 #7 [ffffffff8064bef0] ide_intr at ffffffff802df26f

 #8 [ffffffff8064bf30] lkdtm_handler at ffffffff8800230d

 #9 [ffffffff8064bf40] jp_handle_irq_event at ffffffff880023e8

#10 [ffffffff8064bf50] __do_IRQ at ffffffff801544f4

#11 [ffffffff8064bf58] __do_softirq at ffffffff80136b8f

#12 [ffffffff8064bf90] do_IRQ at ffffffff8010bda1

--- <IRQ stack> ---

#13 [ffff810229fd5f80] ret_from_intr at ffffffff80109b95

    [exception RIP: unknown or invalid address]

    RIP: 0000000000000000  RSP: 0000000000000000 
RFLAGS: 00000000

    RAX: ffffffffffffffff  RBX: 00002afe35608c98 
RCX: 00002afe359f7be4

    RDX: 0000000000000033  RSI: 0000000000000202 
RDI: 00007fff754bfbe0

    RBP: 000000000000000a   R8: 000055555590bca0  
R9: 0000000000000000

    R10: 00002afe35608c98  R11: 0000000000000001 
R12: 0000000000000000

    R13: 0000000000000000  R14: 00002afe3597b1e0 
R15: 000055555590b760

    ORIG_RAX: 000000000000002b  CS: 0000 
SS: 0000

bt: WARNING: possibly bogus exception frame

    RIP: 00002afe359f7be4  RSP: 00007fff754bfbe0 
RFLAGS: 00000202

    RAX: 00002afe35608c98  RBX: 000055555590b760 
RCX: 0000000000000001

    RDX: 00002afe35608c98  RSI: 0000000000000000 
RDI: 000055555590bca0

    RBP: ffffffff80109c0b   R8: 000000000000000a  
R9: 0000000000000000

    R10: 0000000000000000  R11: 00002afe3597b1e0 
R12: 000055555590b760

    R13: 00007fff754bfd38  R14: 0000000000000001 
R15: 000055555590b760

    ORIG_RAX: ffffffffffffffff  CS: 0033 
SS: 002b

crash> set unwind on

unwind: on

crash> bt

PID: 3898   TASK: ffff81022e988e20  CPU: 0  
COMMAND: "slapd"

 #0 [ffffffff8064bd88] crash_kexec at ffffffff80152211

 #1 [ffffffff8064be48] panic at ffffffff80131410

 #2 [ffffffff8064bf38] lkdtm_handler at ffffffff8800230d

--- <IRQ stack> ---

 #3 [ffff810229fd5f80] ret_from_intr at ffffffff80109b95

    [exception RIP: unknown or invalid address]

    RIP: 0000000000000000  RSP: 0000000000000000 
RFLAGS: 00000000

    RAX: ffffffffffffffff  RBX: 00002afe35608c98 
RCX: 00002afe359f7be4

    RDX: 0000000000000033  RSI: 0000000000000202 
RDI: 00007fff754bfbe0

    RBP: 000000000000000a   R8: 000055555590bca0  
R9: 0000000000000000

    R10: 00002afe35608c98  R11: 0000000000000001 
R12: 0000000000000000

    R13: 0000000000000000  R14: 00002afe3597b1e0 
R15: 000055555590b760

    ORIG_RAX: 000000000000002b  CS: 0000 
SS: 0000

bt: WARNING: possibly bogus exception frame

 #4 [ffff810229fd5f80] common_interrupt at ffffffff80109b95

    RIP: 00002afe359f7be4  RSP: 00007fff754bfbe0 
RFLAGS: 00000202

    RAX: 00002afe35608c98  RBX: 000055555590b760 
RCX: 0000000000000001

    RDX: 00002afe35608c98  RSI: 0000000000000000 
RDI: 000055555590bca0

    RBP: ffffffff80109c0b   R8: 000000000000000a  
R9: 0000000000000000

    R10: 0000000000000000  R11: 00002afe3597b1e0 
R12: 000055555590b760

    R13: 00007fff754bfd38  R14: 0000000000000001 
R15: 000055555590b760

    ORIG_RAX: ffffffffffffffff  CS: 0033 
SS: 002b

crash>
Comments?

Not really.  I don't understand why IRQ exception frame contents

look the way they do.  The crash utility's "irq_eframe" value,

which gets assigned from a fixed location at the top of the

interrupt stack, is obviously pointing to a bogus exception frame.

There might be something different in newer kernels, although I

don't see it in 2.6.18?
If you do a "help -m", the very last line shows the base address

of each per-cpu interrupt stack, as in:
crash> help -m

...

ibase[cpus]:

   ffffffff80453d00 0000010037e28000 0000010037e60000
0000010037e94000

crash>
Each of the interrupt stacks are 16k long, so you can

look at them with a "rd" with a count of 2048.  The linkage

between the interrupt stack and the stack from where it came

is supposed to be at the first word just below the top 64 words

on that stack.
So if I take cpu 2's interrupt stack from above, which is at

10037e60000, and dump it using "rd -s", I see this:
crash> rd -s 10037e60000 2048

...

     10037e63f90:  0000007fbffff4b0 0000007fbffff410

     10037e63fa0:  0000000000000046 0000000000000046

     10037e63fb0:  apic_timer_interrupt+0x85
000001003741df58

     10037e63fc0:  0000000000000000 0000000000000000

     10037e63fd0:  0000000000000000 0000000000000000

     10037e63fe0:  0000000000000000 0000000000000000

     10037e63ff0:  0000000000000000 0000000000000000

crash>
The linkage to the previous stack is at IRQ stack address

10037e63fb8, which points back to a process stack address of

1003741df58.  That process stack address is also a pointer

to the exception frame that was laid down there:
crash> pt_regs 000001003741df58

struct pt_regs {

  r15 = 0x0,

  r14 = 0x1003741df58,

  r13 = 0x7fbfffe840,

  r12 = 0xa8,

  rbp = 0x17dbfc0,

  rbx = 0xffffffff801103ce,

  r11 = 0x246,

  r10 = 0x31d072f928,

  r9 = 0x4587d9,

  r8 = 0x3741df58,

  rax = 0xffffffffffffffda,

  rcx = 0xffffffffffffffff,

  rdx = 0xa8,

  rsi = 0x17dbfc0,

  rdi = 0x4,

  orig_rax = 0x0,

  rip = 0x31d05b85b2,

  cs = 0x33,

  eflags = 0x246,

  rsp = 0x7fbfffe808,

  ss = 0x2b

}
So the last time an interrupt occurred on this

cpu, a program was running in user space (cs 0x33)

when a timer interrupt occurred.
If it had occurred in kernel mode, a kernel mode

exception frame should be there,  So for example,

if I take cpu 3's interrupt stack from above, it

shows this:
crash> rd -s 0000010037e94000 2048

...

     10037e97f90:  0000000000000000 0000000000000000  
................

     10037e97fa0:  0000000000000046 0000000000000046  
F.......F.......

     10037e97fb0:  apic_timer_interrupt+0x85
0000010037e83e98   .........>.7....

     10037e97fc0:  0000000000000000 0000000000000000  
................

     10037e97fd0:  0000000000000000 0000000000000000  
................

     10037e97fe0:  0000000000000000 0000000000000000  
................

     10037e97ff0:  0000000000000000 0000000000000000  
................

crash>
And in this case, the timer interrupt occurred while running

in kernel mode (cs 0x10):
crash> pt_regs 0000010037e83e98

struct pt_regs {

  r15 = 0x0,

  r14 = 0x0,

  r13 = 0x0,

  r12 = 0x0,

  rbp = 0x1,

  rbx = 0x0,

  r11 = 0x5,

  r10 = 0x0,

  r9 = 0x8,

  r8 = 0x10037e82000,

  rax = 0x0,

  rcx = 0x0,

  rdx = 0x0,

  rsi = 0x1003f047030,

  rdi = 0x1000102a7e0,

  orig_rax = 0xffffffffffffffef,

  rip = 0xffffffff8010e84c,

  cs = 0x10,

  eflags = 0x246,

  rsp = 0x10037e83f48,

  ss = 0x18

}

crash>
The rip above shows that the cpu was in idle:
crash> sym 0xffffffff8010e84c

ffffffff8010e84c (t) mwait_idle+0x56  include/asm/thread_info.h:
63

crash>
So I cannot explain what's going on in your kernel,

where it looks like that basic stack-linkage assumption

is being violated, or has changed somehow?  Maybe kdump

or jprobes has done something?  I don't know what else to

tell you.

In the stacktrace with 'unwind on', I was expecting to see jp_handle_irq_event

appear too (as frame 3)..Could my using a module to register the probe
be the

reason ?

I don't know.  Maybe since the jprobes procedure mucks with

the text, it somehow alter the unwind mechanism?
Helpless,

  Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility