Re: "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash

Dave Anderson <anderson@xxxxxxxxxx> · Mon, 6 Oct 2008 11:10:39 -0400 (EDT)

----- "Kevin Worth" <kevin.worth@xxxxxx> wrote:

OK, let's skip the user-space angle for now, because I keep
forgetting that you are running with /dev/mem as the memory
source.  And there is an inconsistency with your debug output
that I cannot explain.

As I mentioned before, the /dev/mem driver has this immediate 
restriction in "drivers/char/mem.c":

  static ssize_t read_mem(struct file * file, char __user * buf,
                          size_t count, loff_t *ppos)
  {
          unsigned long p = *ppos;
          ssize_t read, sz;
          char *ptr;

          if (!valid_phys_addr_range(p, count))
                  return -EFAULT;
          ...

where for x86, it looks like this:

  static inline int valid_phys_addr_range(unsigned long addr, size_t count)
  {
          if (addr + count > __pa(high_memory))
                  return 0;

          return 1;
  }

That restricts you from reading "highmem", i.e., physical memory
beyond the extent that can be unity-mapped. Unity-mapped memory is
memory that the kernel can access directly by simply adding the
PAGE_OFFSET value to the physical address.  In your case, your
PAGE_OFFSET is 0x40000000.  With your 1G/3G split, you've got 3GB of
kernel virtual address space that you can directly access, minus 128MB
at the top that is used for the vmalloc() address range.
(3GB - 128MB) is 0xb8000000.  Therefore, your maximum unity-mapped
kernel virtual address, "high_memory", is (0xb8000000 + PAGE_OFFSET),
which in your case is 0xf8000000.

In any case, on your live system, whenever the crash utility does a
readmem() that accesses a physical address at or beyond 0xb8000000,
it *should* get back the EFAULT above and fail, which in turn causes
the crash command that issued the readmem() to fail.

Accordingly, when you did this on your live system:

> crash> vm -p
> PID: 32227  TASK: 47bc8030  CPU: 0   COMMAND: "crash"
>    MM       PGD      RSS    TOTAL_VM
> f7e67040  5fddfe00  63336k   67412k
>   VMA       START      END    FLAGS  FILE
> f3ed61d4   8048000   83e5000   1875  /root/crash
> VIRTUAL   PHYSICAL
> vm: read error: physical address: 10b60b000  type: "page table"

It tried to translate the first user virtual address (8048000),
which required a page-table walk, and ended up trying to access a
page table page at physical address 0x10b60b000.  /dev/mem did not
allow that, which is why you got a "read error".

However, and this is what I cannot explain, the same failure should
also happen on a live system when accessing vmalloc() kernel virtual
space *if* any page table page that must be read to make the
translation, or the final physical page itself, lies beyond the
/dev/mem restriction (again, 0xb8000000 in your case).

So when you did this on your live system, you referenced the vmalloc
address of your custom module at address 0xf9088280, and successfully
read and displayed its contents:

> 
> crash> p modules
> modules = $2 = {
>   next = 0xf9088284,
>   prev = 0xf8842104
> }
> crash> module 0xf9088280
> struct module {
>   state = MODULE_STATE_LIVE,
>   list = {
>     next = 0xf8ff9d84,
>     prev = 0x403c63a4
>   },
>   name = "custom_lkm\000\000\000\000\000\000\000\000\000\000\000\000\000\000\00
> 0\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\00
> 0\000\000\000\000\000\000\000\000\000\000\000\000\000",
>   mkobj = {
>     kobj = {
>       k_name = 0xf90882cc "custom_lkm",
>       name = "custom_lkm\000\000\000\000\000\000\000\000",
>       kref = {
>         refcount = {
>           counter = 3
>         }
>       },
>       entry = {
>         next = 0x403c6068,
>         prev = 0xf8ff9de4
>       },
> ...
> 

But when you did a vtop of 0xf9088280, it translated to physical
address 0x119b98280 (page 0x119b98000), which is well beyond 4GB
(never mind 0xb8000000), so /dev/mem should not have been able to
read it:

> 
> crash> vtop 0xf9088280
> VIRTUAL   PHYSICAL
> f9088280  119b98280
> 
> PAGE DIRECTORY: 4044b000
>   PGD: 4044b018 => 6001
>   PMD:     6e40 => 1d515067
>   PTE: 1d515440 => 119b98163
>  PAGE: 119b98000
> 
>    PTE     PHYSICAL   FLAGS
> 119b98163  119b98000  (PRESENT|RW|ACCESSED|DIRTY|GLOBAL)
> 

By any chance has the /dev/mem driver been modified in your kernel?

In any case, I can't explain how you are apparently able to access
physical addresses beyond your "high_memory".

Anyway, the ext3 translation is useless without the accompanying "vtop":

> 
> crash> mod | grep ext3
> f88c8000  ext3             132616  (not loaded)  [CONFIG_KALLSYMS]
> 
> ... [ snip ] ...
> 
> (Realized afterward that I forgot to vtop ext3. Let me know if it's needed and I can repeat this procedure)
> 

And the "bash" vm output only makes sense with respect to
its output on the live system:

> From dump file:
> 
> crash> vm
> PID: 4323   TASK: 47be0a90  CPU: 0   COMMAND: "bash"
>    MM       PGD      RSS    TOTAL_VM
> 5d683580  5d500dc0  2616k    3968k
>   VMA       START      END    FLAGS  FILE
> 5fc2aac4   8048000   80ee000   1875  /bin/bash
> 5fe5f0cc   80ee000   80f3000 101877  /bin/bash
> ...
> 
> 
> crash> vm -p
> PID: 4323   TASK: 47be0a90  CPU: 0   COMMAND: "bash"
>    MM       PGD      RSS    TOTAL_VM
> 5d683580  5d500dc0  2616k    3968k
>   VMA       START      END    FLAGS  FILE
> 5fc2aac4   8048000   80ee000   1875  /bin/bash
> VIRTUAL   PHYSICAL
> 8048000   FILE: /bin/bash  OFFSET: 0
> 8049000   FILE: /bin/bash  OFFSET: 1000
> 804a000   FILE: /bin/bash  OFFSET: 2000
> ...no errors, lots of output
> 

But getting back to vmalloc'd module space: on the dumpfile, your
access of the module at vmalloc address f9088280 (physical address
119b98000) shows that it's getting back a page of zeroes, even though
it is accessing the same physical address (0x119b98000) that you
successfully read (but how?) on the live system:

> 
> crash> modules
> modules = $2 = {
>   next = 0xf9088284,
>   prev = 0xf8842104
> }
> 
> crash> module 0xf9088280
> struct module {
>   state = MODULE_STATE_LIVE,
>   list = {
>     next = 0x0,
>     prev = 0x0
>   },
>   name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000",
>   mkobj = {
>     kobj = {
>       k_name = 0x0,
>       name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000\000",
>       kref = {
>         refcount = {
>           counter = 0
>         }
>       },
>       entry = {
>         next = 0x0,
>         prev = 0x0
> ...
> 
> crash> vtop 0xf9088280
> VIRTUAL   PHYSICAL
> f9088280  119b98280
> 
> PAGE DIRECTORY: 4044b000
>   PGD: 4044b018 => 6001
>   PMD:     6e40 => 1d515067
>   PTE: 1d515440 => 119b98163
>  PAGE: 119b98000
> 
>    PTE     PHYSICAL   FLAGS
> 119b98163  119b98000  (PRESENT|RW|ACCESSED|DIRTY|GLOBAL)
> 
>   PAGE     PHYSICAL   MAPPING    INDEX CNT FLAGS
> 47337300  119b98000         0         0  1 80000000

And so, even though I'd like to point out that the analogous
readmem() on the dumpfile reads the same physical location and seems
to just return zeroes, that alone is not enough for me to simply
state that it's a problem with kexec/kdump.

Because, again, I cannot explain how you are able to access
physical address 0x119b98000 from /dev/mem on your live system.

Can you check whether your kernel source has modified
the read_mem() or valid_phys_addr_range() functions?
If they are unchanged from what I showed above (from 2.6.20),
then I'm stumped, because it makes no sense to me how you
can read from those physical addresses on your live system.

For verification, if you do this:

  crash> p high_memory

it should show 0xf8000000.  If you then do a vtop of 0xf8000000,
it will simply strip off the PAGE_OFFSET of 0x40000000, resulting
in the /dev/mem physical address limit of 0xb8000000.
And if you then do this:

  crash> rd -p 0xb8000000

it should fail, as should any address equal to or above it.
But your output above, which translates the module vmalloc
addresses, seemingly reads physical addresses well beyond
4GB (0x100000000).  And that's what I cannot begin to explain.

So I'm running out of ideas here...

One thing I can suggest is to rebuild the kexec-tools package
that you're using, correcting the PAGE_OFFSET value to match
your system's.  The version of "kexec/arch/i386/crashdump-x86.h"
that we (Red Hat) are using looks like this:

  #ifndef CRASHDUMP_X86_H
  #define CRASHDUMP_X86_H

  struct kexec_info;
  int load_crashdump_segments(struct kexec_info *info, char *mod_cmdline,
                                  unsigned long max_addr, unsigned long min_base);

  #define PAGE_OFFSET     0xc0000000
  #define __pa(x)         ((unsigned long)(x)-PAGE_OFFSET)

  #define __VMALLOC_RESERVE       (128 << 20)
  #define MAXMEM                  (-PAGE_OFFSET-__VMALLOC_RESERVE)

  #define CRASH_MAX_MEMMAP_NR     (KEXEC_MAX_SEGMENTS + 1)
  #define CRASH_MAX_MEMORY_RANGES (MAX_MEMORY_RANGES + 2)

  /* Backup Region, First 640K of System RAM. */
  #define BACKUP_SRC_START        0x00000000
  #define BACKUP_SRC_END          0x0009ffff
  #define BACKUP_SRC_SIZE (BACKUP_SRC_END - BACKUP_SRC_START + 1)

  #endif /* CRASHDUMP_X86_H */

Try rebuilding your package with PAGE_OFFSET defined as 0x40000000,
and then see what happens.

Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility