Re: Debugging Xen Hypervisor with 'crash' question...

Dave Anderson <anderson@xxxxxxxxxx> · Thu, 11 Oct 2007 17:24:14 -0400

Roger Cruz wrote:
Dave, thanks for your reply.

I am trying to find out why a change I made to the hypervisor is
crashing. Using print statements is not a workable solution so I'm
trying the coredump approach.  I have crash working on a coredump but
the backtrace information does not match what Xen puts out on its serial
port (I know Xen's is not 100% correct, but at least it does show some
of the actual routines executed in the calltrace).

I see what you're talking about, but I'm afraid I can't shed
any light on the matter.  I have never personally debugged any
hypervisor crashes.

Support for xen-syms analysis in the crash utility was not done
by me, but implemented wholly by Itsuro Oda (oda@xxxxxxxxxxxxx)
and Fumihiko Kakuma (kakuma@xxxxxxxxxxxxx).  I am also curious as
to how the xen trace shown in the serial port output is related
to the backtrace shown by "bt".

Perhaps they can help explain it?  (Oda-san is on this list)

Additionally, I want to select the context that allows me to walk up and
down the stack and look at the source code and its local variables.  I
can do this with a typical GDB coredump file so I was looking for an
equivalent functionality with 'crash'.  Does crash support that?

No.  The closest to that is "bt -f", which dumps the stack contents
of each frame, but it will require disassembling the affected
functions to figure out where in that stack frame the local variable
exists.
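When matching disassembly against a raw stack dump, it can help to attach an address to every dumped word so that a slot can be named by its offset from a frame address. The following is only a minimal sketch: the helper name annotate_stack is hypothetical, the 4-byte word size assumes a 32-bit hypervisor such as the x86_32p build in this thread, and the sample words are the first four from the "Xen stack trace from esp=ff1bbdb0" serial output quoted later in this message. The layout of "bt -f" output is not shown in this thread, so the sketch takes a plain list of words rather than parsing that output.

```python
WORD_SIZE = 4  # bytes per stack word; assumes a 32-bit (x86_32p) hypervisor


def annotate_stack(start, words):
    """Pair each dumped stack word with the address it was read from.

    start is the address of the first word (for example the esp value
    printed by Xen); words is the list of dumped values in order.
    """
    return [(start + i * WORD_SIZE, w) for i, w in enumerate(words)]


# First four words of the serial-port stack dump, which starts at esp=ff1bbdb0.
words = [0xFF17DD3C, 0xE8000081, 0x00000000, 0x00000003]

for addr, value in annotate_stack(0xFF1BBDB0, words):
    print(f"{addr:08x}: {value:08x}")
```

With addresses attached, a displacement seen in the disassembly (an offset from esp or ebp) can be added to the frame address by hand to find the matching row.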

I have tried to run gdb against the /proc/vmcore but it complains about
an unrecognized format, so it looks like my only choice right now is
'crash', unless someone has figured out how to make GDB work with Xen
cores.

Thank you again!
Roger

This GDB was configured as "i686-pc-linux-gnu"...

   KERNEL: xen-syms
 DUMPFILE: /dom0/proc/vmcore
     CPUS: 4
  DOMAINS: 6
   UPTIME: 01:51:01
  MACHINE: Intel(R) Xeon(R) CPU            5140  @ 2.33GHz  (2327 Mhz)
   MEMORY: 4 GB
  PCPU-ID: 2
     PCPU: ff1bbfb4
  VCPU-ID: 0
     VCPU: ffbdf080  (VCPU_RUNNING)
DOMAIN-ID: 1
   DOMAIN: ff1a8080  (DOMAIN_RUNNING)
    STATE: CRASH

crash> bt
PCPU:  2  VCPU: ffbdf080
 #0 [ff1bbcd4] kexec_crash_save_cpu at ff10a8ad
 #1 [ff1bbcdc] kexec_crash at ff10a9f0
 #2 [ff1bbcec] panic at ff11b9c2
 #3 [ff1bbd1c] do_invalid_op at ff13314b
 #4 [ff1bbd7c] handle_exception at ff1734bc

==== output from Xen's serial port during the crash ===

(XEN) Xen BUG at page_alloc.c:902
(XEN) ----[ Xen-3.1.0  x86_32p  debug=n  Not tainted ]----
(XEN) CPU:    2
(XEN) EIP:    e008:[<ff10e4c5>] free_domheap_pages+0x105/0x2d0
(XEN) EFLAGS: 00010206   CONTEXT: hypervisor
(XEN) eax: 00000000   ebx: f839ea18   ecx: e8000081   edx: 00000000
(XEN) esi: 00000000   edi: ffbd4080   ebp: 00000001   esp: ff1bbdb0
(XEN) cr0: 8005003b   cr4: 000026f0   cr3: 00bdfca0   cr2: d9ee6f88
(XEN) ds: e010   es: e010   fs: e010   gs: e010   ss: e010   cs: e008
(XEN) Xen stack trace from esp=ff1bbdb0:
(XEN)    ff17dd3c e8000081 00000000 00000003 2dad0067 00000000 f839ea18 f839ea18
(XEN)    ffbd0900 00000000 ff19b0a0 ff108df1 f839ea18 00000000 001269c1 00000022
(XEN)    ffbdfd30 ff14382e ffbdf080 ffbdfd30 ff23e900 ff23e900 0000000f ff23e900
(XEN)    ff1a8080 ff115457 ffbe6080 ffc01000 ffbdf080 ff1a8080 ff1a9708 ff23e900
(XEN)    ff1af104 000f060e 00000000 00000000 ffbd4080 001269c1 00009708 ff1a8080
(XEN)    00000614 ffbc9000 00220001 ff159d31 00000001 00000001 00000000 0000060e
(XEN)    00000000 ff1a96d4 ffbdfdfc ffbdfdf8 ff1bf080 ff14b8d0 ff1a9708 f9efbefd
(XEN)    0000060e 07b86000 00000000 269c1000 00000001 00000014 fe867ce8 ff1bbfb4
(XEN)    00000014 ff1bbfb4 ff1ad1f0 ff13f088 00000001 f95dfba0 00000001 47868c00
(XEN)    00000000 0000060e f2e792f6 0000060e 00fee003 ffffffff ff107c90 00000012
(XEN)    00000003 00000000 ffbdf080 ff15d2d4 ff1bbfb4 fffe00b0 ff1bbfb4 0000060e
(XEN)    00000004 00000000 ffbdf080 f9c8f8bf ff144f0d ffbdf080 00000000 ffbdf080
(XEN)    0000060e ff1a96d4 ffbdf080 ffffffff ff14b7df ffbdf080 ffffffff ffbdfd74
(XEN)    ff14cbb7 ffbdfd74 ffbdf080 ffbdf080 ff144ffa ffbdf080 ffbdf080 f95df858
(XEN)    ff1575ef ffbdf080 ff1a9708 ff23e900 ff1153ae f9efbefd 0000060e ff1af100
(XEN)    00000002 ff1bbfb4 ff1bbfb4 00000001 47868c00 00000000 f95dfb54 ff15e717
(XEN)    ff1bbfb4 00000001 f95dfba0 00000001 47868c00 00000000 f95dfb54 00000014
(XEN)    00f0000b ff249285 00000008 00000282 f95dfb44 00000010 00000000 00000000
(XEN)    00000000 00000000 00000002 ffbdf080
(XEN) Xen call trace:
(XEN)    [<ff10e4c5>] free_domheap_pages+0x105/0x2d0
(XEN)    [<ff19b0a0>] dmi_decode+0x20/0xd0
(XEN)    [<ff108df1>] do_grant_table_op+0x1161/0x1910
(XEN)    [<ff14382e>] hvm_io_assist+0x13e/0x1400
(XEN)    [<ff1a8080>] zap_low_mappings+0x80/0xa0
(XEN)    [<ff115457>] add_entry+0x57/0x140
(XEN)    [<ff1a8080>] zap_low_mappings+0x80/0xa0
(XEN)    [<ff1a8080>] zap_low_mappings+0x80/0xa0
(XEN)    [<ff159d31>] vmx_io_instruction+0x421/0xc30
(XEN)    [<ff14b8d0>] pt_thaw_time+0x70/0x80
(XEN)    [<ff13f088>] hvm_do_hypercall+0xb8/0x1e0
(XEN)    [<ff107c90>] do_grant_table_op+0x0/0x1910
(XEN)    [<ff15d2d4>] vmx_vmexit_handler+0x334/0x1760
(XEN)    [<ff144f0d>] is_isa_irq_masked+0x2d/0x90
(XEN)    [<ff14b7df>] pt_update_irq+0x9f/0x120
(XEN)    [<ff14cbb7>] vlapic_has_interrupt+0x37/0x60
(XEN)    [<ff144ffa>] cpu_has_pending_irq+0x3a/0x60
(XEN)    [<ff1575ef>] vmx_intr_assist+0x3f/0x2e0
(XEN)    [<ff1153ae>] timer_softirq_action+0xce/0xf0
(XEN)    [<ff15e717>] vmx_asm_vmexit_handler+0x17/0x20
(XEN)    
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) Xen BUG at page_alloc.c:902
(XEN) ****************************************
(XEN)
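
To compare the serial-port call trace with what crash reports, it may be convenient to pull the addresses out of the log mechanically. The sketch below is illustrative only: the function name parse_call_trace and the regular expression are assumptions based on the line format shown above, not part of Xen or crash. It prints one "sym <address>" line per entry, which could then be entered at the crash> prompt ("sym" appears in the command list for xen-syms sessions); whether crash resolves each address to the same symbol that Xen printed is something to check rather than assume.

```python
import re

# Matches call-trace lines such as:
#   (XEN)    [<ff10e4c5>] free_domheap_pages+0x105/0x2d0
# The EIP line is skipped because "[<" does not directly follow "(XEN)" there.
TRACE_RE = re.compile(
    r"\(XEN\)\s+\[<([0-9a-fA-F]+)>\]\s+(\w+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)"
)


def parse_call_trace(log_text):
    """Return (address, symbol, offset, size) tuples found in a Xen serial log."""
    entries = []
    for line in log_text.splitlines():
        match = TRACE_RE.search(line)
        if match:
            addr, sym, off, size = match.groups()
            entries.append((int(addr, 16), sym, int(off, 16), int(size, 16)))
    return entries


# A few lines copied from the serial output above.
sample = """\
(XEN) EIP:    e008:[<ff10e4c5>] free_domheap_pages+0x105/0x2d0
(XEN) Xen call trace:
(XEN)    [<ff10e4c5>] free_domheap_pages+0x105/0x2d0
(XEN)    [<ff19b0a0>] dmi_decode+0x20/0xd0
(XEN)    [<ff108df1>] do_grant_table_op+0x1161/0x1910
"""

for addr, sym, off, size in parse_call_trace(sample):
    # One crash command per trace entry.
    print(f"sym {addr:x}")
```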

-----Original Message-----
From: crash-utility-bounces@xxxxxxxxxx [mailto:crash-utility-bounces@xxxxxxxxxx] On Behalf Of Dave Anderson
Sent: Thursday, October 11, 2007 2:51 PM
To: Discussion list for crash utility usage, maintenance and development
Subject: Re: Debugging Xen Hypervisor with 'crash' question...

Roger Cruz wrote:

Sorry if this is an obvious question but I'm new to the 'crash' utility.
I read Anderson's white paper on crash and didn't find any references
to how to use 'crash' to debug the hypervisor.  I have crash running and
accessing Domain 0's kernel tasks and other variables, so I am
comfortable thinking that I have the right setup.  I start crash with:

#crash xen-syms /dom0/proc/vmcore

And get the following output

#crash xen-syms /dom0/proc/vmcore

crash 4.0-4.7
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb 6.1
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...

   KERNEL: xen-syms
 DUMPFILE: /dom0/proc/vmcore
     CPUS: 4
  DOMAINS: 4
   UPTIME: 00:01:30
  MACHINE: Intel(R) Xeon(R) CPU            5140  @ 2.33GHz  (2327 Mhz)
   MEMORY: 4 GB
  PCPU-ID: 2
     PCPU: ff1bbfb4
  VCPU-ID: 0
     VCPU: ffbe6080  (VCPU_RUNNING)
DOMAIN-ID: 0
   DOMAIN: ff238080  (DOMAIN_RUNNING)
    STATE: CRASH

I would like to know what commands there are to examine the memory
management system or any other internal data structures.  Also, how do I
look at a stack trace in the hypervisor for a crash?  I tried the 'gdb
where' command and it said no stack.

Enter "help" -- it shows the commands when running against
a xen-syms hypervisor:

   crash> help

   *              dumpinfo       list           sched          vcpu
   alias          eval           log            search         vcpus
   ascii          exit           p              set            whatis
   bt             extend         pcpus          struct         wr
   dis            foreach        pte            sym            q
   domain         gdb            rd             sys
   doms           help           repeat         union

   crash version: 4.0-4.7   gdb version: 6.1
   For help on any command above, enter "help <command>".
   For help on input options, enter "help input".
   For help on output options, enter "help output".

   crash>

Then for any particular command, enter "help <command>",
so for backtrace options, enter "help bt".  I do note
that some of the common commands between running crash
on a vmlinux and a xen-syms show the help data for the
command as if it were running against a vmlinux, and
as such, some advertised options may not work on a
xen-syms session.

A limited set of gdb commands are runnable, although the
embedded gdb module has no clue of the vmcore file; it's
invoked internally as "gdb xen-syms".

I'm presuming that the crash occurred within the hypervisor
as opposed to the (vmlinux) kernel?  If it happened within
kernel code, substitute the xen-syms argument with the
vmlinux of the dom0 kernel, and you will be presented
with a different set of commands.

Dave

Thanks in advance.

Roger Cruz
Principal SW Engineer
Marathon Technologies Corp.
978-489-1153

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
