Re: Request for ppc64 help from IBM

Dave Anderson <anderson@xxxxxxxxxx> · Tue, 15 Dec 2009 11:08:52 -0500 (EST)

----- "Dave Anderson" <anderson@xxxxxxxxxx> wrote:

> Somewhere between the RHEL5 (2.6.18-based) and RHEL6 timeframe,
> the ppc64 architecture has started using a virtual memmap scheme
> for the arrays of page structures used to describe/handle
> each physical page of memory.

... [ snip ] ...

> So my speculation (guess?) is that the ppc64.c ppc64_vtop()
> function needs updating to properly translate these addresses.
> 
> Since the ppc64 stuff in the crash utility was written by, and
> has been maintained by IBM (and since I am ppc64-challenged),
> can you guys take a look at what needs to be done?

[ sound of crickets... ]

Well that request apparently fell on deaf ears...

Here's my understanding of the situation.

In 2.6.26 the ppc64 architecture started using a new kernel virtual
memory region to map the kernel's page structure array(s), so that
now there are three kernel virtual memory regions:

  KERNEL   0xc000000000000000
  VMALLOC  0xd000000000000000
  VMEMMAP  0xf000000000000000
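
A minimal sketch of how a tool could tell the three regions apart, assuming
the top four address bits are enough to distinguish them (which holds for
the three bases listed above).  The enum and function names are hypothetical
and are not taken from the crash utility or the kernel.

```c
#include <stdint.h>

/* Region bases as listed in this message. */
#define KERNEL_REGION_BASE   0xc000000000000000ULL
#define VMALLOC_REGION_BASE  0xd000000000000000ULL
#define VMEMMAP_REGION_BASE  0xf000000000000000ULL

/* Hypothetical names, for illustration only. */
enum kv_region {
        REGION_UNKNOWN,
        REGION_KERNEL,
        REGION_VMALLOC,
        REGION_VMEMMAP
};

/* Classify a kernel virtual address by its top four bits (assumption:
 * nothing else is needed to separate the three bases above). */
static enum kv_region classify_kvaddr(uint64_t vaddr)
{
        switch (vaddr >> 60) {
        case KERNEL_REGION_BASE >> 60:
                return REGION_KERNEL;
        case VMALLOC_REGION_BASE >> 60:
                return REGION_VMALLOC;
        case VMEMMAP_REGION_BASE >> 60:
                return REGION_VMEMMAP;
        default:
                return REGION_UNKNOWN;
        }
}
```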

The KERNEL region is the unity-mapped region, where the underlying
physical address can be determined by manipulating the virtual address
itself.  
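
As an illustration of that manipulation, the sketch below assumes the
translation is a plain subtraction of the region base, which matches the
c000000003000000 / 03000000 pair in the pr_debug() output quoted further
down.  kernel_vtop() is a hypothetical name, not the crash utility's
ppc64_vtop().

```c
#include <stdint.h>

#define KERNEL_REGION_BASE 0xc000000000000000ULL

/*
 * Translate a unity-mapped KERNEL region address.
 * Assumption: physical = virtual - KERNEL_REGION_BASE.
 * Returns 0 on success, -1 if vaddr is not in the KERNEL region.
 */
static int kernel_vtop(uint64_t vaddr, uint64_t *paddr)
{
        if ((vaddr >> 60) != (KERNEL_REGION_BASE >> 60))
                return -1;

        *paddr = vaddr - KERNEL_REGION_BASE;
        return 0;
}
```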

The VMALLOC region requires a page-table walk-through to find
the underlying physical address in a PTE.

The new VMEMMAP region is mapped in ppc64 firmware, where a
physical memory region of a given size is mapped to a VMEMMAP virtual
address.  So for example, the page structure for physical page 0
is at VMEMMAP address 0xf000000000000000, the page structure for
physical page 1 is at 0xf000000000000068, and so on.  Once mapped in the
firmware TLB (?) the virtual-to-physical translation is done
automatically while running in kernel mode.
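
The address arithmetic in that example can be sketched as follows.  The
0x68-byte stride is inferred from the two addresses above and stands in for
sizeof(struct page) on this particular kernel; other configurations may
differ.  Both helper names are hypothetical.

```c
#include <stdint.h>

#define VMEMMAP_REGION_BASE 0xf000000000000000ULL

/* Assumed stride between page structures: 0x68 bytes, taken from the
 * example addresses in this message.  The real value depends on the
 * kernel configuration. */
#define PAGE_STRUCT_SIZE 0x68ULL

/* VMEMMAP virtual address of the page structure for a page frame number. */
static uint64_t pfn_to_page_vaddr(uint64_t pfn)
{
        return VMEMMAP_REGION_BASE + pfn * PAGE_STRUCT_SIZE;
}

/* Inverse: page frame number described by a page structure address. */
static uint64_t page_vaddr_to_pfn(uint64_t page_vaddr)
{
        return (page_vaddr - VMEMMAP_REGION_BASE) / PAGE_STRUCT_SIZE;
}
```

Note that this only yields the virtual address of a page structure; reading
its contents from a dumpfile still requires translating that virtual address
to a physical one.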

The problem is that the physical-to-vmemmap address/size mapping
information is not stored in the kernel proper, so there is
no way for the crash utility to make the translation.  That
being the case, any crash command that needs to read the contents
of any page structure will fail.

The kernel mapping is performed here in 2.6.26 through 2.6.31:

  int __meminit vmemmap_populate(struct page *start_page,
                                 unsigned long nr_pages, int node)
  {
          unsigned long start = (unsigned long)start_page;
          unsigned long end = (unsigned long)(start_page + nr_pages);
          unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;

          /* Align to the page size of the linear mapping. */
          start = _ALIGN_DOWN(start, page_size);

          for (; start < end; start += page_size) {
                  int mapped;
                  void *p;

                  if (vmemmap_populated(start, page_size))
                          continue;

                  p = vmemmap_alloc_block(page_size, node);
                  if (!p)
                          return -ENOMEM;

                  pr_debug("vmemmap %08lx allocated at %p, physical %08lx.\n",
                          start, p, __pa(p));

                  mapped = htab_bolt_mapping(start, start + page_size, __pa(p),
                                             pgprot_val(PAGE_KERNEL),
                                             mmu_vmemmap_psize, mmu_kernel_ssize);
                  BUG_ON(mapped < 0);
          }

          return 0;
  } 
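
To make the loop's shape easier to see, here is a stand-alone sketch that
only counts how many page_size-sized blocks the loop would visit for a given
range.  It assumes the block size is a power of two; align_down() and
count_vmemmap_blocks() are hypothetical stand-ins, not kernel functions.

```c
#include <stdint.h>

/* Round addr down to a multiple of size (size must be a power of two). */
static uint64_t align_down(uint64_t addr, uint64_t size)
{
        return addr & ~(size - 1);
}

/*
 * Count the blocks visited by a loop shaped like the one in
 * vmemmap_populate(): align the start down, then step by block_size
 * until the end of the range is reached.
 */
static unsigned long count_vmemmap_blocks(uint64_t start, uint64_t end,
                                          uint64_t block_size)
{
        unsigned long blocks = 0;

        for (start = align_down(start, block_size); start < end;
             start += block_size)
                blocks++;

        return blocks;
}
```

For example, with a hypothetical 16 MB block size, a range that starts 4 KB
before a 16 MB boundary and ends 4 KB after it covers two blocks, so the
kernel loop would make (at most) two htab_bolt_mapping() calls for it.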

With the pr_debug() statement turned on, my test system shows:

  vmemmap f000000000000000 allocated at c000000003000000, physical 03000000

This would make for an extremely simple virtual-to-physical translation
for the crash utility, but note that neither the unity-mapped virtual address
of 0xc000000003000000 nor its associated physical address of 0x3000000 is
stored anywhere, since "p" is a stack variable.  The htab_bolt_mapping()
function does not store the mapping information in the kernel either; it
just uses temporary stack variables before calling the ppc_md.hpte_insert()
function, which eventually leads to a machine-dependent (directly to firmware)
function.

So unless I'm missing something, nowhere along the vmemmap call-chain are the
VTOP address/size particulars stored -- say for example, in a
/proc/iomem-like "resource" data structure.
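
For illustration only, this is the kind of record that would make the
translation trivial if the kernel kept one per htab_bolt_mapping() call.
The structure, its fields, and vmemmap_vtop() are all hypothetical; nothing
like this exists in the 2.6.26 through 2.6.31 kernels discussed here, and
the 16 MB size in the note below is an assumption, not a value taken from
the test system.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one vmemmap block mapping. */
struct vmemmap_mapping {
        uint64_t virt;  /* start of the vmemmap block ("start") */
        uint64_t phys;  /* physical address backing it ("__pa(p)") */
        uint64_t size;  /* block size in bytes ("page_size") */
};

/*
 * Translate a VMEMMAP virtual address using a table of recorded mappings.
 * Returns 0 on success, -1 if no recorded block covers vaddr.
 */
static int vmemmap_vtop(const struct vmemmap_mapping *map, size_t count,
                        uint64_t vaddr, uint64_t *paddr)
{
        size_t i;

        for (i = 0; i < count; i++) {
                if (vaddr >= map[i].virt &&
                    vaddr - map[i].virt < map[i].size) {
                        *paddr = map[i].phys + (vaddr - map[i].virt);
                        return 0;
                }
        }

        return -1;
}
```

Under those assumptions, a single record { 0xf000000000000000, 0x3000000,
16 MB } built from the pr_debug() line above would translate the address
f00000000030db44, which "kmem -s" fails on later in this message, to
physical 0x330db44.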

(FWIW, I note that in 2.6.32, CONFIG_PPC_BOOK3E arches still use the normal page
tables to map the memmap array(s).  I don't know whether BOOK3E arch is the
most common or not...)

In any case, not being able to read the page structure contents has a
significant effect on the crash utility.  About the only thing that can
be done for these kernels is to print a warning during initialization
and let any command that attempts to read a page structure fail:

  # crash vmlinux vmcore

  crash 4.1.2p1
  Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009  Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006  IBM Corporation
  Copyright (C) 1999-2006  Hewlett-Packard Co
  Copyright (C) 2005, 2006  Fujitsu Limited
  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
  Copyright (C) 2005  NEC Corporation
  Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
  This program is free software, covered by the GNU General Public License,
  and you are welcome to change it and/or distribute copies of it under
  certain conditions.  Enter "help copying" to see the conditions.
  This program has absolutely no warranty.  Enter "help warranty" for details.

  GNU gdb 6.1
  Copyright 2004 Free Software Foundation, Inc.
  GDB is free software, covered by the GNU General Public License, and you are
  welcome to change it and/or distribute copies of it under certain conditions.
  Type "show copying" to see the conditions.
  There is absolutely no warranty for GDB.  Type "show warranty" for details.
  This GDB was configured as "powerpc64-unknown-linux-gnu"...

  WARNING: cannot translate vmemmap kernel virtual addresses:
           commands requiring page structure contents will fail

        KERNEL: vmlinux                        
      DUMPFILE: vmcore
          CPUS: 2
          DATE: Thu Dec 10 05:40:35 2009
        UPTIME: 21:44:59
  LOAD AVERAGE: 0.11, 0.03, 0.01
         TASKS: 196
      NODENAME: ibm-js20-04.lab.bos.redhat.com
       RELEASE: 2.6.31-38.el6.ppc64
       VERSION: #1 SMP Sun Nov 22 08:15:30 EST 2009
       MACHINE: ppc64  (unknown Mhz)
        MEMORY: 2 GB
         PANIC: "Oops: Kernel access of bad area, sig: 11 [#1]" (check log for details)
           PID: 10656
       COMMAND: "runtest.sh"
          TASK: c000000072156420  [THREAD_INFO: c000000072058000]
           CPU: 0
         STATE: TASK_RUNNING (PANIC)

  crash> kmem -i
  kmem: cannot translate vmemmap address: f000000000000000
  crash> kmem -p
        PAGE       PHYSICAL      MAPPING       INDEX CNT FLAGS
  kmem: cannot translate vmemmap address: f000000000000000
  crash> kmem -s
  CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
  kmem: cannot translate vmemmap address: f00000000030db44
  crash> 

Can any of the IBM engineers on this list (or any ppc64 user)
confirm my findings?  Maybe I'm missing something, but I don't
see it.

And if you agree, perhaps you can work on an upstream solution to
store the vmemmap-to-physical mapping information?

Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility