[Crash-utility] Re: [PATCH 0/2] vmcoreinfo support for dump filtering #2

anderson@xxxxxxxxxx (Dave Anderson) · Tue, 11 Sep 2007 10:03:43 -0400

Vivek Goyal wrote:
> On Mon, Sep 10, 2007 at 11:35:21AM -0700, Randy Dunlap wrote:
> 
>>On Fri, 7 Sep 2007 17:57:46 +0900 Ken'ichi Ohmichi wrote:
>>
>>
>>>Hi,
>>
>>>I released a new makedumpfile (version 1.2.0) with vmcoreinfo support.
>>>I updated the patches for linux and kexec-tools.
>>>
>>>PATCH SET:
>>>[1/2] [linux-2.6.22] Add vmcoreinfo
>>>  The patch is for linux-2.6.22.
>>>  The patch adds the vmcoreinfo data. Its address and size are output
>>>  to /sys/kernel/vmcoreinfo.
>>>
>>>[2/2] [kexec-tools] Pass vmcoreinfo's address and size
>>>  The patch is for kexec-tools-testing-20070330.
>>>  (http://www.kernel.org/pub/linux/kernel/people/horms/kexec-tools/)
>>>  kexec command gets the address and size of the vmcoreinfo data from
>>>  /sys/kernel/vmcoreinfo, and passes them to the second kernel through
>>>  ELF header of /proc/vmcore. When the second kernel is booting, the
>>>  kernel gets them from the ELF header and creates vmcoreinfo's PT_NOTE
>>>  segment into /proc/vmcore.
>>
>>Hi,
>>When using the vmcoreinfo patches, what tool(s) are available for
>>analyzing the vmcore (dump) file?  E.g., lkcd or crash or just gdb?
>>
>>gdb works for me, but I tried to use crash (4.0-4.6 from
>>http://people.redhat.com/anderson/) and crash complained:
>>
>>crash: invalid kernel virtual address: 0  type: "cpu_pda entry"
>>
>>Should crash work, or does it need to be modified?
>>
> 
> 
> Hi Randy,
> 
> Crash should just work. It might be broken on the latest kernel. Copying
> this to the crash-utility mailing list. Dave will be able to tell us better.
> 
> 
>>This is on a 2.6.23-rc3 kernel with vmcoreinfo patches and a dump file
>>with -l 31 (dump level 31, omitting all possible pages).

There's always the possibility that something crucial (to the crash
utility) has changed in the upstream kernel; that's just the nature
of the beast.

In this case, crash is reading this set of per-cpu pointers:

   struct x8664_pda *_cpu_pda[NR_CPUS] __read_mostly;

and for each one, it then reads the x8664_pda data structure
that it points to -- but here it finds a NULL pointer.  It's possible
that crash has incorrectly determined the number of x8664_pda
structures (cpus) that exist or, less likely, that the array contents
were read as zeroes from the dumpfile.
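
To make the failure mode concrete, here is a rough Python model of that
two-step walk.  It is only a sketch, not crash's actual C code: the
walk_cpu_pda() function, the read_pointer callback and the fake_memory
table are all made up for illustration, and the addresses are sample
values.

```python
# Toy model of the per-cpu walk; this is NOT crash source code.
POINTER_SIZE = 8  # bytes per _cpu_pda[] entry on x86_64


def walk_cpu_pda(read_pointer, array_base, cpu_count):
    """Return the pda addresses read before the first NULL array entry.

    read_pointer is a hypothetical callback that returns the 8-byte
    value stored at a given kernel virtual address.
    """
    found = []
    for cpu in range(cpu_count):
        entry_addr = array_base + cpu * POINTER_SIZE
        pda_addr = read_pointer(entry_addr)
        if pda_addr == 0:
            # Trying to read a structure at address 0 is the situation
            # reported as "invalid kernel virtual address: 0".
            break
        found.append(pda_addr)
    return found


# Hypothetical memory image: two valid entries, then a zeroed one.
fake_memory = {
    0xffffffff8042d9c0: 0xffffffff80406000,
    0xffffffff8042d9c8: 0xffff81003ff027c0,
    0xffffffff8042d9d0: 0,
}

# Pretend the cpu count was (wrongly) determined to be 4.
pdas = walk_cpu_pda(lambda addr: fake_memory.get(addr, 0),
                    0xffffffff8042d9c0, 4)
print([hex(p) for p in pdas])
```

Either an over-estimated cpu_count or a zero-filled array would make
the loop hit a NULL entry; the model cannot tell the two apart, which
is why the debug output is needed.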

Anyway, with any initialization-time failure, it's usually helpful
to invoke crash with the "-d7" (debug level) argument, as in:

  $ crash -d7 vmlinux vmcore

That will display information about every read made from the dumpfile.
In this case, normally you would see, for each cpu, a read of the
individual 8-byte address from the array, and then based upon what
it read, the subsequent read of the whole 128-byte data structure:

<readmem: ffffffff8042d9c0, KVADDR, "_cpu_pda addr", 8, (FOE), 7fbffff210>
<readmem: ffffffff80406000, KVADDR, "cpu_pda entry", 128, (FOE), 937680>
CPU0: level4_pgt: 200000010 data_offset: ffff8100899c1000
<readmem: ffffffff8042d9c8, KVADDR, "_cpu_pda addr", 8, (FOE), 7fbffff210>
<readmem: ffff81003ff027c0, KVADDR, "cpu_pda entry", 128, (FOE), 937680>
CPU1: level4_pgt: 200000010 data_offset: ffff8100899c9000
<readmem: ffffffff8042d9d0, KVADDR, "_cpu_pda addr", 8, (FOE), 7fbffff210>
<readmem: ffff81003ff19e40, KVADDR, "cpu_pda entry", 128, (FOE), 937680>
CPU2: level4_pgt: 200000010 data_offset: ffff8100899d1000
<readmem: ffffffff8042d9d8, KVADDR, "_cpu_pda addr", 8, (FOE), 7fbffff210>
<readmem: ffff81003ff19640, KVADDR, "cpu_pda entry", 128, (FOE), 937680>
CPU3: level4_pgt: 200000010 data_offset: ffff8100899d9000
<readmem: ffffffff8042d9e0, KVADDR, "_cpu_pda addr", 8, (FOE), 7fbffff210>
<readmem: ffffffff80406200, KVADDR, "cpu_pda entry", 128, (FOE), 937680>

From that data structure it grabs the level4_pgt and data_offset
fields for subsequent use.  So in your case, it should show how
many (if any) of the x8664_pda structures it read before encountering
a NULL pointer in one of the array entries.
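
If the -d7 output is long, a small script can do the counting.  The
following Python sketch assumes the readmem lines look exactly like the
sample above; the pda_reads() helper is hypothetical, and the truncated
SAMPLE text is contrived to show an array read that is never followed
by a structure read.

```python
import re

# Matches the start of a readmem debug line: address, label, size.
READMEM = re.compile(r'<readmem: ([0-9a-f]+), KVADDR, "([^"]+)", (\d+),')


def pda_reads(log_text):
    """Pair each "_cpu_pda addr" read with the "cpu_pda entry" read
    that follows it; the second element is None if none followed."""
    pairs = []
    slot = None  # address of the array entry awaiting its structure read
    for line in log_text.splitlines():
        m = READMEM.match(line.strip())
        if not m:
            continue  # e.g. the "CPU0: level4_pgt: ..." lines
        addr, label = int(m.group(1), 16), m.group(2)
        if label == "_cpu_pda addr":
            if slot is not None:
                pairs.append((slot, None))
            slot = addr
        elif label == "cpu_pda entry" and slot is not None:
            pairs.append((slot, addr))
            slot = None
    if slot is not None:
        pairs.append((slot, None))
    return pairs


SAMPLE = """\
<readmem: ffffffff8042d9c0, KVADDR, "_cpu_pda addr", 8, (FOE), 7fbffff210>
<readmem: ffffffff80406000, KVADDR, "cpu_pda entry", 128, (FOE), 937680>
CPU0: level4_pgt: 200000010 data_offset: ffff8100899c1000
<readmem: ffffffff8042d9c8, KVADDR, "_cpu_pda addr", 8, (FOE), 7fbffff210>
"""

pairs = pda_reads(SAMPLE)
complete = [p for p in pairs if p[1] is not None]
print(len(complete), "x8664_pda structure(s) read before the failure")
```

The first array entry without a matching "cpu_pda entry" read is the
one worth looking at.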

Dave