RE: "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash

"Worth, Kevin" <kevin.worth@xxxxxx> · Tue, 14 Oct 2008 15:54:45 +0000

Thanks, Dave. I was actually going to ask about that, since you had mentioned that /dev/crash should behave correctly, in contrast to /dev/mem and /dev/kmem. Does it matter that my arch is i386 and not x86_64? I'll give this a shot.

-Kevin
________________________________________
From: crash-utility-bounces@xxxxxxxxxx [crash-utility-bounces@xxxxxxxxxx] On Behalf Of Dave Anderson [anderson@xxxxxxxxxx]
Sent: Tuesday, October 14, 2008 8:27 AM
To: Discussion list for crash utility usage, maintenance and development
Subject: Re: "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash

Worth, Kevin wrote:
 > Thanks for the explanation, Dave.
 >
 > I tried adding some printf's to memory.c (before the goto try_dev_kmem, then
 > before and after the call to read_dev_kmem). It does in fact look like it's
 > taking the read_dev_kmem path... a couple examples (a ton of these are printed,
 > so if there is any specific address to look for let me know, but I presume this
 > is just confirming what you suspected).

...

 > Yep. Also confirmed using the printf's in previous email.
 >
 > crash> help -p
 >      program_name: crash
 >      program_path: ./crash
 >   program_version: 4.0-7.2
 >       gdb_version: 6.1
 > ...
 >               nfd: -1
 >               mfd: 4
 >               kfd: 11

OK, we're getting nowhere fast because of the limitations of
the /dev/mem driver.

What I've been trying to accomplish is simply this:

   on the live system:
     - find a module address whose "vtop" shows that its physical
       address is greater than 4GB.
     - rd -p <the-physical-address-greater-than-4GB> 100
     - verify that it contains "correct" data, i.e. as seen when
       you do a "p <module-virtual-address>"  (it will...)

   <crash the system>

   on the dumpfile:
     - rd -p <the-physical-address-greater-than-4GB> 100 (same as above)
     - see whether it does -- or does *not* -- contain the "correct" data
       that was seen on the live system
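
The two checks in the steps above (is the physical address really above 4GB, and does the dumpfile hold the same words as the live system) can be sketched in user-space C. This is only an illustration, not part of crash: the helper names are hypothetical, and it assumes the "rd -p" output has already been captured into arrays of 32-bit words. It also shows why the physical address has to be kept in a 64-bit variable, since a 32-bit one silently drops the upper bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First physical address that no longer fits in 32 bits. */
#define FOUR_GB (1ULL << 32)

/* Physical addresses must be held in a 64-bit type to test this at all. */
static int above_4gb(uint64_t paddr)
{
    return paddr >= FOUR_GB;
}

/* What remains of the address if it is stored in a 32-bit variable. */
static uint32_t truncated_to_32_bits(uint64_t paddr)
{
    return (uint32_t)paddr;
}

/*
 * Compare the words read on the live system with the words read from
 * the dumpfile at the same physical address.
 * Returns the index of the first differing word, or -1 if all match.
 */
static long first_mismatch(const uint32_t *live, const uint32_t *dump,
                           size_t nwords)
{
    assert(live != NULL && dump != NULL);

    for (size_t i = 0; i < nwords; i++) {
        if (live[i] != dump[i])
            return (long)i;
    }
    return -1;
}
```

A result of -1 from first_mismatch() would mean the dumpfile agrees with the live system for that range; any other value points at the first word where the two diverge.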

If the same physical address does *not* contain the same data
in the dumpfile as was there in the live system, then we can
point at the kexec/kdump operation.  We have seemingly done
so, but without being able to do it *exactly* as the steps
above because:

-  on the live system, we're relying on /dev/kmem to do the
    virtual-to-physical address translation of vmalloc addresses,
    so the resultant physical address is "hidden" from us.

If your system used the Red Hat /dev/crash driver, then the
steps above would be trivial, because the crash utility
does the virtual-to-physical address translation of the
vmalloc addresses itself, and then reads memory using the
resultant physical address.
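
To make "translate first, then read by physical address" concrete, here is a minimal sketch of a three-level page-table walk. It is not crash's actual code. It assumes an i386 kernel with PAE (which a 32-bit system with memory above 4GB would need), handles only 4KB pages, and uses a toy array of table frames in place of real physical memory; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES_PER_TABLE 512
#define ENTRY_PRESENT     0x1ULL
#define ENTRY_LARGE_PAGE  0x80ULL                /* PSE bit in a directory entry */
#define ENTRY_ADDR_MASK   0x000ffffffffff000ULL  /* frame address bits of an entry */

/* Toy stand-in for physical memory: one 4KB table tagged with its address. */
struct table_frame {
    uint64_t paddr;
    uint64_t entry[ENTRIES_PER_TABLE];
};

/* Hypothetical helper: fetch one 64-bit entry from "physical memory". */
static int read_entry(const struct table_frame *frames, size_t nframes,
                      uint64_t table_paddr, unsigned index, uint64_t *out)
{
    for (size_t i = 0; i < nframes; i++) {
        if (frames[i].paddr == table_paddr) {
            *out = frames[i].entry[index];
            return 0;
        }
    }
    return -1;  /* that frame is not available */
}

/*
 * Walk PDPT -> page directory -> page table for a 32-bit virtual address.
 * Returns 0 and stores the 64-bit physical address in *paddr on success.
 */
static int pae_vtop(const struct table_frame *frames, size_t nframes,
                    uint64_t pdpt_paddr, uint32_t vaddr, uint64_t *paddr)
{
    uint64_t e;

    assert(paddr != NULL);

    /* Bits 31-30 select one of the 4 PDPT entries. */
    if (read_entry(frames, nframes, pdpt_paddr, vaddr >> 30, &e) ||
        !(e & ENTRY_PRESENT))
        return -1;

    /* Bits 29-21 select the page-directory entry. */
    if (read_entry(frames, nframes, e & ENTRY_ADDR_MASK,
                   (vaddr >> 21) & 0x1ff, &e) || !(e & ENTRY_PRESENT))
        return -1;
    if (e & ENTRY_LARGE_PAGE)
        return -1;  /* 2MB pages are not handled in this sketch */

    /* Bits 20-12 select the page-table entry. */
    if (read_entry(frames, nframes, e & ENTRY_ADDR_MASK,
                   (vaddr >> 12) & 0x1ff, &e) || !(e & ENTRY_PRESENT))
        return -1;

    /* Frame address from the entry plus the low 12 bits of the offset. */
    *paddr = (e & ENTRY_ADDR_MASK) | (vaddr & 0xfff);
    return 0;
}
```

The point of the sketch is that the result is a 64-bit physical address that the caller can see and reuse, which is exactly what reading through /dev/kmem hides.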

I will attach the crash.c and crash.h files from the /dev/crash patch
we use for RHEL5's 2.6.18-based kernel.  What you would need to do
is something like this (having never done it before):

(1) Copy the attached crash.c file to your kernel's "drivers/char" directory.
(2) Copy the attached crash.h file to your kernel's "include/asm-x86_64" directory.
(3) Add this to the "drivers/char/Makefile":

+obj-m            += crash.o

(4) since 2.6.20 doesn't have a page_is_ram() function for x86_64,
     you'll have to add this to your arch/x86_64/mm/init.c file,
     and export it so the /dev/crash driver can pick it up:

int page_is_ram (unsigned long pagenr)
{
         int i;

         for (i = 0; i < e820.nr_map; i++) {
                 unsigned long addr, end;

                 if (e820.map[i].type != E820_RAM)       /* not usable memory */
                         continue;
                 /*
                  * !!!FIXME!!! Some BIOSen report areas as RAM that
                  * are not. Notably the 640->1Mb area. We need a sanity
                  * check here.
                  */
                 addr = (e820.map[i].addr+PAGE_SIZE-1) >> PAGE_SHIFT;
                 end = (e820.map[i].addr+e820.map[i].size) >> PAGE_SHIFT;
                 if  ((pagenr >= addr) && (pagenr < end))
                         return 1;
         }
         return 0;
}

EXPORT_SYMBOL_GPL(page_is_ram);

(5) Build the kernel and see what happens...
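
Before rebuilding, the range arithmetic in page_is_ram() from step (4) can be tried out in user space. The following is a stand-alone model, not kernel code: the struct, the function name page_is_ram_model(), and the idea of passing the map in as a parameter are assumptions made for illustration, and any map contents fed to it are made up. It keeps the same rounding as the kernel function: the start of each RAM region is rounded up to a page boundary and the end is rounded down, so a page that is only partly covered by a region is not reported as RAM.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define E820_RAM   1

/* User-space stand-in for one entry of the kernel's e820 map. */
struct e820entry {
    uint64_t addr;  /* start of the region, in bytes */
    uint64_t size;  /* length of the region, in bytes */
    uint32_t type;  /* E820_RAM for usable memory */
};

/* Same logic as page_is_ram(), with the map passed in explicitly. */
static int page_is_ram_model(const struct e820entry *map, int nr_map,
                             uint64_t pagenr)
{
    for (int i = 0; i < nr_map; i++) {
        uint64_t first, end;

        if (map[i].type != E820_RAM)  /* not usable memory */
            continue;

        /* First whole page inside the region (start rounded up). */
        first = (map[i].addr + PAGE_SIZE - 1) >> PAGE_SHIFT;
        /* One past the last whole page (end rounded down). */
        end = (map[i].addr + map[i].size) >> PAGE_SHIFT;

        if (pagenr >= first && pagenr < end)
            return 1;
    }
    return 0;
}
```

Note that the model takes a 64-bit page number. With 4KB pages, page frame 0x100000 is the first one above 4GB, so that is the range of page numbers the /dev/crash driver would need page_is_ram() to accept.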

Other than trying that, I don't have any other suggestions.

Dave
