On Tuesday, 10 January 2012 20:24:58 Dave Anderson wrote:
> ----- Original Message -----
> > Hi folks,
> >
> > I've just discovered that the crash utility fails to initialize the vm
> > subsystem properly on our latest SLES 32-bit kernels. It turns out that
> > our kernels are compiled with CONFIG_DISCONTIGMEM=y, which causes pgdat
> > structs to be allocated by the remap allocator (cf. arch/x86/mm/numa_32.c
> > and also the code in setup_node_data).
> >
> > If you don't know what the remap allocator is (like I didn't before I
> > hit the bug), it's a very special early-boot allocator which remaps
> > physical pages from low memory to high memory, giving them virtual
> > addresses from the identity mapping. Looks a bit like this:
> >
> >                           physical addr
> >                           +------------+
> >                     +---> |  KVA RAM   |
> >                     |     +------------+
> >                     |     \/\/\/\/\/\/\/
> >                     |     /\/\/\/\/\/\/\
> >    virtual addr     |     |  highmem   |
> >   +------------+    |     |------------|
> >   |            | ---|---> |            |
> >   +------------+    |     +------------+
> >   |  remap va  | ---+     |   KVA PG   | (unused)
> >   +------------+          +------------+
> >   |            | -------> | RAM bottom |
> >   +------------+          +------------+
> >
> > This breaks a very basic assumption that crash makes about low-memory
> > virtual addresses.
>
> Hmmm, yeah, I am also unaware of this, and I'm not entirely clear based
> upon your explanation. What do "KVA PG" and "KVA RAM" mean exactly? And
> do just the pgdat structures (which I know can be huge) get moved from
> low to high physical memory (per-node perhaps), and then remapped with
> mapped virtual addresses?

Well, the concept dates back to Martin Bligh's patch in 2002 which added
this for NUMA-Q. My understanding is that "KVA PG" refers to the kernel
virtual addresses used to access the pgdat array, as well as to the
physical memory that would correspond to these virtual addresses if they
were identity-mapped. This physical memory is then inaccessible. "KVA RAM",
on the other hand, is where the pgdat structures are actually stored.

Please note that there is no "moving" of the structures, because the
remapping happens when the memory nodes are initialized, i.e. before
anything accesses them.

Regarding your second question, anything can theoretically call
alloc_remap() to allocate memory from this region, but nothing does, and
looking at init_alloc_remap(), the size of the pool is always calculated
as the size of the pgdat array plus struct pglist_data, rounded up to a
multiple of 2MB (so that large pages can be used), so there's really only
room for the pgdat.

> Anyway, I trust you know what you're doing...

Thank you for the trust.

> > The attached patch fixes the issue for me, but may not be the cleanest
> > method to handle these mappings.
>
> Anyway, what I can't wrap my head around is that the initialization
> sequence is being done by the first call to x86_kvtop_PAE(), which calls
> x86_kvtop_remap(), which calls initialize_remap(), which calls readmem(),
> which calls x86_kvtop_PAE(), starting the whole thing over again. How
> does that recursion work? Would it be possible to call initialize_remap()
> earlier on instead of doing it upon the first kvtop() call?

Agreed. My thinking was that each node has its own remap region, so I
wanted to know the number of nodes first. Since I didn't want to duplicate
the heuristics used to determine the number of nodes, I couldn't initialize
before vm_init(). Then again, the remap mapping is accessed before
vm_init() finishes.
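For reference, once the per-node ranges are known, the lookup that
x86_kvtop_remap() has to perform boils down to something like the
stand-alone sketch below. This is only an illustration, not the attached
patch: in the real code the node_remap_* values are read from the dump,
the fallback is the normal PAE page-table walk rather than a bare identity
mapping, and MAX_NUMNODES, PAGE_OFFSET and the numbers in main() are made
up.

/*
 * Stand-alone illustration of the remap lookup. The node_remap_* arrays
 * mirror the static arrays in arch/x86/mm/numa_32.c; everything else is
 * simplified for the example.
 */
#include <stdio.h>

#define MAX_NUMNODES	8		/* illustration only */
#define PAGE_SHIFT	12
#define PAGE_OFFSET	0xc0000000UL	/* typical 32-bit lowmem base */

static unsigned long node_remap_start_vaddr[MAX_NUMNODES];
static unsigned long node_remap_end_vaddr[MAX_NUMNODES];
static unsigned long node_remap_start_pfn[MAX_NUMNODES];
static int numnodes;

/* Translate a kernel virtual address, honouring the remapped regions. */
static unsigned long remap_kvtop(unsigned long vaddr)
{
	int nid;

	for (nid = 0; nid < numnodes; nid++) {
		if (vaddr >= node_remap_start_vaddr[nid] &&
		    vaddr < node_remap_end_vaddr[nid])
			/* Remapped: actually backed by "KVA RAM" in highmem. */
			return (node_remap_start_pfn[nid] << PAGE_SHIFT) +
				(vaddr - node_remap_start_vaddr[nid]);
	}
	/* Not remapped: ordinary identity-mapped lowmem. */
	return vaddr - PAGE_OFFSET;
}

int main(void)
{
	/* Made-up single-node example: a 4MB remap window at 0xf7800000
	   backed by physical memory starting at pfn 0x34a00. */
	numnodes = 1;
	node_remap_start_vaddr[0] = 0xf7800000UL;
	node_remap_end_vaddr[0]   = 0xf7c00000UL;
	node_remap_start_pfn[0]   = 0x34a00UL;

	printf("remapped: %#lx -> %#lx\n", 0xf7800010UL, remap_kvtop(0xf7800010UL));
	printf("identity: %#lx -> %#lx\n", 0xc1000000UL, remap_kvtop(0xc1000000UL));
	return 0;
}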
I can see now that this is unnecessarily complicated, because the
node_remap_* variables are static arrays of MAX_NUMNODES elements, so I can
get their size from the debuginfo at POST_GDB init and initialize a
machine-specific data structure with it. I'll post another patch tomorrow.

Thanks for the hint!

Petr Tesarik
SUSE Linux

> > Ken'ichi Ohmichi, please note that makedumpfile is also affected by
> > this deficiency. On my test system, it will fail to produce any output
> > if I set the dump level to anything greater than zero:
> >
> > makedumpfile -c -d 31 -x vmlinux-3.0.13-0.5-pae.debug vmcore kdump.31
> > readmem: Can't convert a physical address(34a012b4) to offset.
> > readmem: type_addr: 0, addr:f4a012b4, size:4
> > get_mm_discontigmem: Can't get node_start_pfn.
> >
> > makedumpfile Failed.
> >
> > However, fixing this for makedumpfile is harder, and it will most
> > likely require a few more lines in VMCOREINFO, because debug symbols
> > may not be available at dump time, and I can't see any alternative
> > method to locate the remapped regions.
> >
> > Regards,
> > Petr Tesarik
> > SUSE Linux

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility