----- Original Message -----

> Hi folks,
>
> I've just discovered that the crash utility fails to initialize the vm
> subsystem properly on our latest SLES 32-bit kernels. It turns out that our
> kernels are compiled with CONFIG_DISCONTIGMEM=y, which causes pgdat structs
> to be allocated by the remap allocator (cf. arch/x86/mm/numa_32.c and also
> the code in setup_node_data).
>
> If you don't know what the remap allocator is (like I didn't before I hit
> the bug), it's a very special early-boot allocator which remaps physical
> pages from low memory to high memory, giving them virtual addresses from
> the identity mapping. Looks a bit like this:
>
>                             physical addr
>                            +------------+
>                            |            |
>                            +------------+
>                    +-----> |  KVA RAM   |
>                    |       +------------+
>                    |       |            |
>                    |       \/\/\/\/\/\/\/
>                    |       /\/\/\/\/\/\/\
>                    |       |            |
>   virtual addr     |       |  highmem   |
>  +------------+    |       |------------|
>  |            | ---------> |            |
>  +------------+    |       +------------+
>  |  remap va  | ---+       |   KVA PG   |  (unused)
>  +------------+            +------------+
>  |            |            |            |
>  |            | ---------> | RAM bottom |
>  +------------+            +------------+
>
> This breaks a very basic assumption that crash makes about low-memory
> virtual addresses.

Hmmm, yeah, I was also unaware of this, and I'm not entirely clear on it
based upon your explanation. What do "KVA PG" and "KVA RAM" mean exactly?
And do just the pgdat structures (which I know can be huge) get moved from
low to high physical memory (per-node perhaps), and then remapped with
mapped virtual addresses? Anyway, I trust you know what you're doing...

> The attached patch fixes the issue for me, but may not be the cleanest
> method to handle these mappings.

Anyway, what I can't wrap my head around is that the initialization sequence
is done by the first call to x86_kvtop_PAE(), which calls x86_kvtop_remap(),
which calls initialize_remap(), which calls readmem(), which calls
x86_kvtop_PAE(), starting the whole thing over again. How does that
recursion work?
Would it be possible to call initialize_remap() earlier on instead of doing
it upon the first kvtop() call?

Dave

> Ken'ichi Ohmichi, please note that makedumpfile is also affected by this
> deficiency. On my test system, it will fail to produce any output if I set
> the dump level to anything greater than zero:
>
>   makedumpfile -c -d 31 -x vmlinux-3.0.13-0.5-pae.debug vmcore kdump.31
>   readmem: Can't convert a physical address(34a012b4) to offset.
>   readmem: type_addr: 0, addr:f4a012b4, size:4
>   get_mm_discontigmem: Can't get node_start_pfn.
>
>   makedumpfile Failed.
>
> However, fixing this for makedumpfile is harder, and it will most likely
> require a few more lines in VMCOREINFO, because debug symbols may not be
> available at dump time, and I can't see any alternative method to locate
> the remapped regions.
>
> Regards,
> Petr Tesarik
> SUSE Linux

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility