----- "Pavan Naregundi" <pavan@xxxxxxxxxxxxxxxxxx> wrote: > On Tue, 2010-04-20 at 09:14 -0400, Dave Anderson wrote: > > ----- "Pavan Naregundi" <pavan@xxxxxxxxxxxxxxxxxx> wrote: > > > > The cause for seek errors depends upon the type > > of dumpfile. > > > > You didn't mention which type of dumpfile the vmcore > > is, so I'll presume that it's either an ELF-format > > kdump or a compressed kdump created by makedumpfile. > > > > So presuming that it's a compressed kdump, the seek error > > most likely comes from here in read_diskdump() in diskdump.c: > > > > if ((pfn >= dd->header->max_mapnr) || !page_is_ram(pfn)) > > return SEEK_ERROR; > > > > where the requested physical address pfn values are larger > > than the max_mapnr value advertised in the header. > > > > When you do any "crash -d# ...", the dumpfile header will > > be dumped first. What does that show? > > > > Dave > > > Dave, > > Dumpfile is compressed kdump created by makedumpfile. > > header shows the following values: > max_mapnr: 32768 > block_shift: 16 > > Yes. Adding some debug printf's shows me that (pfn >= > dd->header->max_mapnr) fails. > > For example: in the first seek error, > crash: seek error: kernel virtual address: c0000000af715480 type: > "kmem_cache buffer" > > paddr: af715480 => pfn=44913 > > crash -d8 log: http://pastebin.com/qrCvyPfR > > Thanks..Pavan OK, so the compressed dumpfile has exactly 32768 pages of physical memory, or exactly 2GB. That being the case, the crash utility will fail all readmem attempts above that value, and obviously there is critical data above the artificial 2GB threshold. The question at hand is why kdump is creating a truncated dumpfile with a max_mapnr of 32768: (1) makedumpfile determines the "max_mapnr" value based upon the highest physical address found in any of the PT_LOAD segments of the /proc/vmcore file on the secondary kernel. (2) the /proc/vmcore PT_LOAD segments were pre-calculated during the primary kernel's kdump initialization phase, based upon the values found in the set of "/proc/device-tree/memory@xxx/reg" files existing in the primary kernel, where the "xxx" is the starting physical address of the memory region, and the "reg" file in that directory contains the size of the memory region. For whatever reason, those files showed a maximum of 2GB of physical memory. (If you do not use makedumpfile, and then do a "readelf -a" of the resultant vmcore file, you will see the PT_LOAD segment values.) Does the SLES11 vmlinux-2.6.32.10-0.4.99.25.62005-ppc64 kernel contain this patch?: http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8be8cf5b47f72096e42bf88cc3afff7a942a346c author Brian King <brking@xxxxxxxxxxxxxxxxxx> Mon, 19 Oct 2009 05:51:34 +0000 (05:51 +0000) committer Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> Fri, 30 Oct 2009 06:20:56 +0000 (17:20 +1100) commit 8be8cf5b47f72096e42bf88cc3afff7a942a346c tree 9adff0fa02123f48fbfa40abb55a5c01be8c2fa4 parent 6cff46f4bc6cc4a8a4154b0b6a2e669db08e8fd2 powerpc: Add kdump support to Collaborative Memory Manager When running Active Memory Sharing, the Collaborative Memory Manager (CMM) may mark some pages as "loaned" with the hypervisor. Periodically, the CMM will query the hypervisor for a loan request, which is a single signed value. When kexec'ing into a kdump kernel, the CMM driver in the kdump kernel is not aware of the pages the previous kernel had marked as "loaned", so the hypervisor and the CMM driver are out of sync. 
Does the SLES11 vmlinux-2.6.32.10-0.4.99.25.62005-ppc64 kernel contain
this patch?:

http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8be8cf5b47f72096e42bf88cc3afff7a942a346c

  author     Brian King <brking@xxxxxxxxxxxxxxxxxx>
             Mon, 19 Oct 2009 05:51:34 +0000 (05:51 +0000)
  committer  Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
             Fri, 30 Oct 2009 06:20:56 +0000 (17:20 +1100)
  commit     8be8cf5b47f72096e42bf88cc3afff7a942a346c
  tree       9adff0fa02123f48fbfa40abb55a5c01be8c2fa4
  parent     6cff46f4bc6cc4a8a4154b0b6a2e669db08e8fd2

    powerpc: Add kdump support to Collaborative Memory Manager

    When running Active Memory Sharing, the Collaborative Memory
    Manager (CMM) may mark some pages as "loaned" with the hypervisor.
    Periodically, the CMM will query the hypervisor for a loan
    request, which is a single signed value.  When kexec'ing into a
    kdump kernel, the CMM driver in the kdump kernel is not aware of
    the pages the previous kernel had marked as "loaned", so the
    hypervisor and the CMM driver are out of sync.

    Fix the CMM driver to handle this scenario by ignoring requests to
    decrease the number of loaned pages if we don't think we have any
    pages loaned.  Pages that are marked as "loaned" which are not in
    the balloon will automatically get switched to "active" the next
    time we touch the page.  This also fixes the case where
    totalram_pages is smaller than min_mem_mb, which can occur during
    kdump.

    Signed-off-by: Brian King <brking@xxxxxxxxxxxxxxxxxx>
    Signed-off-by: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>

I ask because we also have an outstanding bugzilla that exhibits
similar behavior, where an abnormally small ppc64 vmcore file gets
created because there was only a single /proc/device-tree/memory@0
directory, and it described just a small subset of the total physical
memory.  Typically there are many of those "memory@xxx" directories,
but in the failing scenario there was only that one.

Anyway, there's (unproven) speculation that the kernel patch above is
related to the problem.  In any case, unfortunately, there's nothing
that can be done from the crash utility's perspective.

Dave
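In case it helps the next time the primary kernel is up, here is a
rough, hypothetical sketch (not taken from crash, makedumpfile, or the
kernel) that walks /proc/device-tree and prints the base/size pairs
found in each "memory@xxx/reg" file.  It assumes 64-bit big-endian
address and size cells, which is the common ppc64 layout; check
#address-cells and #size-cells if the numbers look wrong:

  #include <dirent.h>
  #include <endian.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
          const char *dt = "/proc/device-tree";
          DIR *dir = opendir(dt);
          struct dirent *de;
          unsigned long long total = 0;

          if (!dir) {
                  perror(dt);
                  return 1;
          }

          while ((de = readdir(dir)) != NULL) {
                  char path[512];
                  uint64_t reg[2];   /* { base, size }, big-endian */
                  FILE *fp;

                  if (strncmp(de->d_name, "memory@", 7) != 0)
                          continue;

                  snprintf(path, sizeof(path), "%s/%s/reg", dt, de->d_name);
                  fp = fopen(path, "r");
                  if (!fp)
                          continue;

                  /* a "reg" property may hold more than one base/size pair */
                  while (fread(reg, sizeof(reg), 1, fp) == 1) {
                          unsigned long long base = be64toh(reg[0]);
                          unsigned long long size = be64toh(reg[1]);

                          printf("%-24s base: %llx  size: %llx\n",
                                 de->d_name, base, size);
                          total += size;
                  }
                  fclose(fp);
          }
          closedir(dir);

          printf("total memory advertised: %llu MB\n", total >> 20);
          return 0;
  }

On a machine showing the truncation described above, the total printed
here would be expected to stop at 2GB even though more memory is
installed, since those same regions are what the kdump initialization
turns into the /proc/vmcore PT_LOAD segments.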