----- Original Message -----
> Hi Dave,
>
> On 03/15/13 07:07, Dave Anderson wrote:
> >> extension working again.  It used to work, but no longer does.
> >> It first calls is_page_ptr(kvaddr, &kpaddr) to convert a virtual
> >> address into a physical address, and then calls:
> >>
> >>>    readmem(kpaddr, PHYSADDR, buf, used,
> >>>            "trace page data", RETURN_ON_ERROR)
> >>
> >> to fetch the bytes.  Updating the release to SLES-11 SP2 causes
> >> this to now fail.
> >
> > So are you saying that it works with an earlier kernel version?
>
> Yep.  My first guess is that there is some different classification
> of the memory and that the new classification is not selected by
> the crash dump.

By classification, do you mean which bit in the filtering option of
makedumpfile?

> >> Help, please?  Thank you!
> >
> > It is translating the vmemmap'ed kernel address to a physical
> > address by walking the page tables, and finding it in a 2MB
> > big-page.  If you skip the is_page_ptr() qualifier, does this
> > work, and if so, does it look like a legitimate page structure?:
>
> It is both a qualifier and a translator to the physical page address.
> I'll have to do some research on how to invoke readmem with the
> virtual address instead of the physical address.  Eventually, they
> must all fold back into crash's memory.c readmem() function.

What's to research?  The readmem() function simply takes either a
virtual or a physical address along with the proper "memtype" argument
(KVADDR, PHYSADDR, UVADDR, etc.).  If you turn on "set debug 4", you
can see all readmem() calls.
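For what it's worth, here is a minimal sketch of what such a
KVADDR-based call might look like from an extension.  The helper name
and the caller-supplied buffer are illustrative only; readmem(),
SIZE(), KVADDR, and RETURN_ON_ERROR are the standard crash API from
defs.h:

  #include "defs.h"    /* crash's extension API */

  /*
   * Illustrative helper: fetch a page struct's contents via its
   * kernel virtual (vmemmap) address.  With KVADDR, readmem() does
   * the virtual-to-physical translation internally, so no
   * is_page_ptr() pre-qualification is needed just to read the bytes.
   */
  static int
  fetch_page_struct(ulong page_vaddr, char *buf)
  {
          return readmem(page_vaddr, KVADDR, buf, SIZE(page),
                         "page struct contents", RETURN_ON_ERROR);
  }

With "set debug 4" in effect, each such readmem() call is echoed with
its address, memtype, and size, which makes it easy to see exactly
which translation step is failing.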
> Per your request:
>
> crash> struct page 0xffffea001cdad420
> struct page {
>   flags = 0x200000000000000,
>   _count = {
>     counter = 0x1
>   },
>   {
>     _mapcount = {
>       counter = 0xffffffff
>     },
>     {
>       inuse = 0xffff,
>       objects = 0xffff
>     }
>   },
>   {
>     {
>       private = 0x0,
>       mapping = 0x0
>     },
>     ptl = {
>       {
>         rlock = {
>           raw_lock = {
>             slock = 0x0
>           }
>         }
>       }
>     },
>     slab = 0x0,
>     first_page = 0x0
>   },
>   {
>     index = 0xffff88067b39a400,
>     freelist = 0xffff88067b39a400,
>     pfmemalloc = 0x0
>   },
>   lru = {
>     next = 0xdead000000100100,
>     prev = 0xdead000000200200
>   }
> }

OK, looks like a page struct (most likely)...

> > But the sparsemem stuff doesn't seem to be accepting it as a
> > vmemmap page struct address.  Does "kmem -p" include physical
> > address 0x87afad420?  For example, on my system, the last
> > physical page mapped in the vmemmap is 21ffff000:
> >
> >   crash> kmem -p | tail
>
> OK, here's mine, along with the closest page numbers:
>
>       PAGE        PHYSICAL     MAPPING    INDEX  CNT  FLAGS
> [...]
> ffffea000e6ffee8  41fffb000       0       c5600   0   200000000000000
> ffffea000e6fff20  41fffc000       0       c5600   0   200000000000000
> ffffea000e6fff58  41fffd000       0       c5600   0   200000000000000
> ffffea000e6fff90  41fffe000       0       c5600   0   200000000000000
> ffffea000e6fffc8  41ffff000       0       c5600   0   200000000000000
> <<no 0xffffea001cdad420 entry; the next line is:>>
> ffffea56189f2488  189120000000    0           0   0   0
> ffffea56189f24c0  189120001000    0           0   0   0
> ffffea56189f24f8  189120002000    0           0   0   0
> ffffea56189f2530  189120003000    0           0   0   0
> [...]
> ffffea64e939b648  1cc4b7ffc000    0           0   0   0
> ffffea64e939b680  1cc4b7ffd000    0           0   0   0
> ffffea64e939b6b8  1cc4b7ffe000    0           0   0   0
> ffffea64e939b6f0  1cc4b7fff000    0           0   0   0
> <<fin>>

Wow, that system has physical memory installed at an unusually high
physical address location, i.e., where 1cc4b7fff000 is up around 28
terabytes?  I'd be interested in seeing a dump of "kmem -n".  In your
case the output is probably huge, but the top part would reflect the
physical memory layout, and the bottom part would show all of the
individual memory sections and their starting vmemmap addresses:

  crash> kmem -n
  NODE    SIZE      PGLIST_DATA       BOOTMEM_DATA       NODE_ZONES
    0   2221552  ffff88021e5ec000        ----         ffff88021e5ec000
                                                      ffff88021e5ec6c0
                                                      ffff88021e5ecd80
                                                      ffff88021e5ed440
      MEM_MAP        START_PADDR    START_MAPNR
  ffffea0000000400      10000            16

  ZONE  NAME      SIZE        MEM_MAP       START_PADDR  START_MAPNR
    0   DMA       4080    ffffea0000000400      10000         16
    1   DMA32  1044480    ffffea0000040000    1000000       4096
    2   Normal 1172992    ffffea0004000000  100000000    1048576
    3   Movable      0                   0          0          0
  -------------------------------------------------------------------
  NR      SECTION        CODED_MEM_MAP        MEM_MAP        PFN
   0  ffff88021e5eb000  ffffea0000000000  ffffea0000000000       0
   1  ffff88021e5eb020  ffffea0000000000  ffffea0000200000   32768
   2  ffff88021e5eb040  ffffea0000000000  ffffea0000400000   65536
   3  ffff88021e5eb060  ffffea0000000000  ffffea0000600000   98304
   4  ffff88021e5eb080  ffffea0000000000  ffffea0000800000  131072
   5  ffff88021e5eb0a0  ffffea0000000000  ffffea0000a00000  163840
  ...

So your target page structure should "fit" into one of the sections
above, where the starting MEM_MAP address of each section is a
contiguous array of page structs that reference the array of physical
pages starting at the "PFN" value.  Those MEM_MAP addresses typically
increase in value with each section, but I believe that I have seen
cases where they do not.  And they don't have to: each section simply
has a base vmemmap address for some number of PFNs/physical pages.

Anyway, it does look like a page structure, and the page structure
pointer itself is translatable.  The problem at hand is that the
physical address that the page structure refers to cannot be
determined, because the page structure address itself is not being
recognized by is_page_ptr() as being part of the sparsemem
infrastructure.  The "if IS_SPARSEMEM()" section at the top of
is_page_ptr() is returning FALSE.

That being said, from your target page structure address and the
"kmem -n" output, you could presumably calculate the associated
physical address by hand: each section's MEM_MAP is an array of page
structs, one per PFN.  (A back-of-the-envelope sketch of that
calculation follows below.)

> > Anyway, the first thing that needs to be done is to verify that
> > the SECTION_SIZE_BITS and MAX_PHYSMEM_BITS are being set up
> > correctly.  The upstream kernel currently has:
> >
> >   # define SECTION_SIZE_BITS  27 /* matt - 128 is convenient right now */
> >   # define MAX_PHYSADDR_BITS  44
> >   # define MAX_PHYSMEM_BITS   46
>
> That is what linux-3.0.13-0.27 has for x86-64, too.
>
> > crash> help -m | grep -e section -e physmem
> >       section_size_bits: 27
> >        max_physmem_bits: 46
> >       sections_per_root: 128
> > crash>
>
> Matches my output.  Is there a way to coerce readelf to tell me
> anything about the crash dump?

Is it an ELF core dump?  If so, readelf would show the individual
PT_LOAD segments defining the core dump's physical memory contents.
(An example invocation follows below.)

> If you are curious to look at the actual dump, I can tell you how
> to get it via ftp (offline).  The extension is on github:
>
>   git clone git://github.com/brkorb/lustre-crash-tools.git
>
> and cr-ext/lustre-ext.c is the one.

The extension is not of prime importance, but rather how the
sparsemem data structures are being handled, and therefore why
is_page_ptr() is not recognizing the vmemmap'd page pointer as a
legitimate page struct pointer.
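Here is that back-of-the-envelope calculation as plain C arithmetic.
The section values (MEM_MAP base and starting PFN) are hypothetical
stand-ins for whatever "kmem -n" line covers the target address, and
the 56-byte page struct size is inferred from the 0x38 stride visible
between consecutive entries in the "kmem -p" output above:

  #include <stdio.h>

  #define PAGE_SHIFT         12   /* x86_64: 4KB pages               */
  #define SECTION_SIZE_BITS  27   /* from "help -m": 128MB sections  */

  int main(void)
  {
          /* Each section covers this many pages/page structs: */
          unsigned long pages_per_section =
                  1UL << (SECTION_SIZE_BITS - PAGE_SHIFT);  /* 32768 */

          /* Hypothetical MEM_MAP and PFN from the matching
           * "kmem -n" section line: */
          unsigned long mem_map   = 0xffffea001cc40000UL;
          unsigned long start_pfn = 0x838000UL;

          unsigned long page_addr = 0xffffea001cdad420UL;  /* target */
          unsigned long page_size = 56;  /* sizeof(struct page) here */

          /* The section's MEM_MAP is a contiguous array of page
           * structs, one per PFN, so the array index recovers the
           * PFN, and the PFN gives the physical address: */
          unsigned long pfn  = start_pfn +
                  (page_addr - mem_map) / page_size;
          unsigned long phys = pfn << PAGE_SHIFT;

          if (pfn < start_pfn + pages_per_section)
                  printf("pfn %#lx -> physical %#lx\n", pfn, phys);
          return 0;
  }

This is essentially what the "if IS_SPARSEMEM()" block in
is_page_ptr() does: scan the sections, find the one whose MEM_MAP
range contains the pointer, and derive the PFN from the offset.
Whichever step of that scan rejects your address is where the
problem lives.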
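As for readelf: if the dumpfile is an ELF vmcore (i.e., created in
ELF format rather than the compressed kdump format), listing its
program headers shows exactly what the dump contains; the file name
here is just a placeholder:

  $ readelf -l vmcore

Each PT_LOAD entry's PhysAddr and MemSiz pair defines one chunk of
captured physical memory, so you can check whether the physical
addresses of interest fall inside any of them.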
> The memory in question is probably not in the dump, but I don't
> know how to specify that it be added to the dump without knowing
> how the memory is characterized.

Whether the actual physical page referenced by your target page
structure is in the dumpfile should not affect the is_page_ptr()
function.  That should work regardless.

If the dumpfile is reasonably sized, you can send me a pointer to it
offline.

Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility