----- "Pavan Naregundi" <pavan@xxxxxxxxxxxxxxxxxx> wrote: > On Tue, 2010-04-20 at 09:14 -0400, Dave Anderson wrote: > > ----- "Pavan Naregundi" <pavan@xxxxxxxxxxxxxxxxxx> wrote: > > > > The cause for seek errors depends upon the type > > of dumpfile. > > > > You didn't mention which type of dumpfile the vmcore > > is, so I'll presume that it's either an ELF-format > > kdump or a compressed kdump created by makedumpfile. > > > > So presuming that it's a compressed kdump, the seek error > > most likely comes from here in read_diskdump() in diskdump.c: > > > > if ((pfn >= dd->header->max_mapnr) || !page_is_ram(pfn)) > > return SEEK_ERROR; > > > > where the requested physical address pfn values are larger > > than the max_mapnr value advertised in the header. > > > > When you do any "crash -d# ...", the dumpfile header will > > be dumped first. What does that show? > > > > Dave > > > Dave, > > Dumpfile is compressed kdump created by makedumpfile. > > header shows the following values: > max_mapnr: 32768 > block_shift: 16 > > Yes. Adding some debug printf's shows me that (pfn >= > dd->header->max_mapnr) fails. > > For example: in the first seek error, > crash: seek error: kernel virtual address: c0000000af715480 type: > "kmem_cache buffer" > > paddr: af715480 => pfn=44913 > > crash -d8 log: http://pastebin.com/qrCvyPfR > > Thanks..Pavan OK, so the compressed dumpfile has exactly 32768 pages of physical memory, or exactly 2GB. That being the case, the crash utility will fail all readmem attempts above that value, and obviously there is critical data above the artificial 2GB threshold. The question at hand is why kdump is creating a truncated dumpfile with a max_mapnr of 32768: (1) makedumpfile determines the "max_mapnr" value based upon the highest physical address found in any of the PT_LOAD segments of the /proc/vmcore file on the secondary kernel. (2) the /proc/vmcore PT_LOAD segments were pre-calculated during the primary kernel's kdump initialization phase, based upon the values found in the set of "/proc/device-tree/memory@xxx/reg" files existing in the primary kernel, where the "xxx" is the starting physical address of the memory region, and the "reg" file in that directory contains the size of the memory region. For whatever reason, those files showed a maximum of 2GB of physical memory. (If you do not use makedumpfile, and then do a "readelf -a" of the resultant vmcore file, you will see the PT_LOAD segment values.) Does the SLES11 vmlinux-2.6.32.10-0.4.99.25.62005-ppc64 kernel contain this patch?: http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8be8cf5b47f72096e42bf88cc3afff7a942a346c author Brian King <brking@xxxxxxxxxxxxxxxxxx> Mon, 19 Oct 2009 05:51:34 +0000 (05:51 +0000) committer Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> Fri, 30 Oct 2009 06:20:56 +0000 (17:20 +1100) commit 8be8cf5b47f72096e42bf88cc3afff7a942a346c tree 9adff0fa02123f48fbfa40abb55a5c01be8c2fa4 parent 6cff46f4bc6cc4a8a4154b0b6a2e669db08e8fd2 powerpc: Add kdump support to Collaborative Memory Manager When running Active Memory Sharing, the Collaborative Memory Manager (CMM) may mark some pages as "loaned" with the hypervisor. Periodically, the CMM will query the hypervisor for a loan request, which is a single signed value. When kexec'ing into a kdump kernel, the CMM driver in the kdump kernel is not aware of the pages the previous kernel had marked as "loaned", so the hypervisor and the CMM driver are out of sync. 
Does the SLES11 vmlinux-2.6.32.10-0.4.99.25.62005-ppc64 kernel contain
this patch?:

http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8be8cf5b47f72096e42bf88cc3afff7a942a346c

  author     Brian King <brking@xxxxxxxxxxxxxxxxxx>
             Mon, 19 Oct 2009 05:51:34 +0000 (05:51 +0000)
  committer  Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
             Fri, 30 Oct 2009 06:20:56 +0000 (17:20 +1100)
  commit     8be8cf5b47f72096e42bf88cc3afff7a942a346c
  tree       9adff0fa02123f48fbfa40abb55a5c01be8c2fa4
  parent     6cff46f4bc6cc4a8a4154b0b6a2e669db08e8fd2

    powerpc: Add kdump support to Collaborative Memory Manager

    When running Active Memory Sharing, the Collaborative Memory
    Manager (CMM) may mark some pages as "loaned" with the hypervisor.
    Periodically, the CMM will query the hypervisor for a loan
    request, which is a single signed value.  When kexec'ing into a
    kdump kernel, the CMM driver in the kdump kernel is not aware of
    the pages the previous kernel had marked as "loaned", so the
    hypervisor and the CMM driver are out of sync.

    Fix the CMM driver to handle this scenario by ignoring requests to
    decrease the number of loaned pages if we don't think we have any
    pages loaned.  Pages that are marked as "loaned" which are not in
    the balloon will automatically get switched to "active" the next
    time we touch the page.  This also fixes the case where
    totalram_pages is smaller than min_mem_mb, which can occur during
    kdump.

    Signed-off-by: Brian King <brking@xxxxxxxxxxxxxxxxxx>
    Signed-off-by: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>

I ask because we also have an outstanding bugzilla that exhibits
similar behavior, where an abnormally small ppc64 vmcore file gets
created because there was only a single /proc/device-tree/memory@0
directory, and it described just a small subset of the total physical
memory.  Typically there are many of those "memory@xxx" directories,
but in the failing scenario there was only that one.

Anyway, there's (unproven) speculation that the kernel patch above is
related to the problem.  In any case, unfortunately, there's nothing
that can be done from the crash utility's perspective.

Dave
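In case it helps the next time the primary kernel is up, here is a
rough, hypothetical sketch (not taken from crash, makedumpfile, or the
kernel) that walks /proc/device-tree and prints the base/size pairs
found in each "memory@xxx/reg" file.  It assumes 64-bit big-endian
address and size cells, which is the common ppc64 layout; check
#address-cells and #size-cells if the numbers look wrong:

  #include <dirent.h>
  #include <endian.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
          const char *dt = "/proc/device-tree";
          DIR *dir = opendir(dt);
          struct dirent *de;
          unsigned long long total = 0;

          if (!dir) {
                  perror(dt);
                  return 1;
          }

          while ((de = readdir(dir)) != NULL) {
                  char path[512];
                  uint64_t reg[2];   /* { base, size }, big-endian */
                  FILE *fp;

                  if (strncmp(de->d_name, "memory@", 7) != 0)
                          continue;

                  snprintf(path, sizeof(path), "%s/%s/reg", dt, de->d_name);
                  fp = fopen(path, "r");
                  if (!fp)
                          continue;

                  /* a "reg" property may hold more than one base/size pair */
                  while (fread(reg, sizeof(reg), 1, fp) == 1) {
                          unsigned long long base = be64toh(reg[0]);
                          unsigned long long size = be64toh(reg[1]);

                          printf("%-24s base: %llx  size: %llx\n",
                                 de->d_name, base, size);
                          total += size;
                  }
                  fclose(fp);
          }
          closedir(dir);

          printf("total memory advertised: %llu MB\n", total >> 20);
          return 0;
  }

On a machine showing the truncation described above, the total printed
here would be expected to stop at 2GB even though more memory is
installed, since those same regions are what the kdump initialization
turns into the /proc/vmcore PT_LOAD segments.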