----- Original Message -----
> Hi Dave,
>
> On 2/20/2018 11:32 AM, Dave Anderson wrote:
> ...
> >>>>> Another suggestion/question -- if is_page_ptr() is called with a NULL phys
> >>>>> argument (as is done most of the time), could it skip the "if IS_SPARSEMEM()"
> >>>>> section at the top, and still utilize the part at the bottom, where it walks
> >>>>> through the vt->node_table[x] array?  I'm not sure about the "ppend" calculation
> >>>>> though -- even if there are holes in the node's address space, is it still a
> >>>>> contiguous chunk of page structure addresses per-node?
> >>>>
> >>>> I'm still investigating and not sure yet, but I think that the fact that
> >>>> SPARSEMEM uses mem_section instead of node_mem_map means page structures
> >>>> could be non-contiguous per-node, depending on the architecture or conditions.
> >>>>
> >>>> typedef struct pglist_data {
> >>>> ...
> >>>> #ifdef CONFIG_FLAT_NODE_MEM_MAP    /* means !SPARSEMEM */
> >>>>         struct page *node_mem_map;
> >>>>
> >>>> I'll continue to check it.
> >>>
> >>> You are right, but in the case where pglist_data.node_mem_map does *not* exist,
> >>> the crash utility initializes each vt->node_table[node].mem_map with the node's
> >>> starting mem_map address by using the return value from phys_to_page() of the
> >>> node's starting physical address -- which uses the sparsemem functions.
> >>>
> >>> The question is whether the current "ppend" calculation is correct for the last
> >>> physical page in a node.  If it is not correct, then perhaps a "mem_map_end"
> >>> value can be added to the node_table structure, initialized by using phys_to_page()
> >>> to get the page address of the last physical address in the node.  And then in
> >>> that case, the question is whether the mem_map range of virtual addresses is
> >>> contiguous -- even if there are holes in the mem_map virtual address range.
> >>
> >> "node_size" is set to pglist_data.node_spanned_pages, which includes holes.
> >> So I think that with VMEMMAP, where a page address is linear with respect to
> >> its pfn, the current "ppend" calculation is correct for the last page in a
> >> node.  But without VMEMMAP, since there is no guarantee of that linearity,
> >> the calculation could be incorrect.
> >>
> >> I found an example with RHEL5:
> >>
> >> crash> help -o
> >> ...
> >>  size_table:
> >>                   page: 56
> >> ...
> >> crash> kmem -n
> >> NODE    SIZE      PGLIST_DATA       BOOTMEM_DATA       NODE_ZONES
> >>   0    524279  ffff810000014000  ffffffff804e1900  ffff810000014000
> >>                                                    ffff810000014b00
> >>                                                    ffff810000015600
> >>                                                    ffff810000016100
> >>     MEM_MAP       START_PADDR    START_MAPNR
> >> ffff8100007da000       0             0
> >>
> >> ZONE  NAME         SIZE       MEM_MAP       START_PADDR  START_MAPNR
> >>   0   DMA          4096  ffff8100007da000        0            0
> >>   1   DMA32      520183  ffff810000812000   1000000         4096
> >>   2   Normal          0                 0         0            0
> >>   3   HighMem         0                 0         0            0
> >>
> >> -------------------------------------------------------------------
> >>
> >>  NR      SECTION        CODED_MEM_MAP        MEM_MAP        PFN
> >>   0  ffff810009000000  ffff8100007da000  ffff8100007da000       0
> >>   1  ffff810009000008  ffff8100007da000  ffff81000099a000   32768
> >>   2  ffff810009000010  ffff8100007da000  ffff810000b5a000   65536
> >>   3  ffff810009000018  ffff8100007da000  ffff810000d1a000   98304  <= there is a
> >>   4  ffff810009000020  ffff810008901000  ffff810009001000  131072  <= mem_map gap.
> >>   5  ffff810009000028  ffff810008901000  ffff8100091c1000  163840
> >>   :
> >>  14  ffff810009000070  ffff810008901000  ffff81000a181000  458752
> >>  15  ffff810009000078  ffff810008901000  ffff81000a341000  491520
> >> crash>
> >>
> >> In this case, the "ppend" will be
> >>
> >>   0xffff8100007da000 + (524279 * 56)
> >>   = 0xffff8100023d9e08
> >>
> >> but it looks like the actual value is around 0xffff81000a501000.
> >
> > Right, I understand that the current "ppend" calculation wouldn't work.
> >
> >> And also, we can see the gap between NR=3 and 4.  This means that even if the
> >> correct "mem_map_end" is added to the node_table structure, it would not be
> >> enough to check whether an address is a page structure.
> >
> > Why?  Wouldn't it still give us an ascending range of page structure addresses
> > on a per-node basis?  (even if there was a physical and/or virtual memory hole?)
> > AFAICT, for each section NR, the MEM_MAP and PFN values always increment.
> > Sorry if I misunderstood something..
>
> First, I assume that we are talking about the case of kernels with SPARSEMEM
> and using the vt->numnodes loop after skipping the IS_SPARSEMEM() section.
>
> The "mem_map_end" I mean here is the page address of the last physical
> address in the node, and the example system has only one node.  So I think
> that the "kmem -n" output above suggests that it could return TRUE for an
> incoming "addr" between the end of NR=3 and the start of NR=4, even though
> that is not a page address.
>
>   NR       MEM_MAP
>    0  +---------+ ffff8100007da000 = nt->mem_map
>    :  | pages.. |        :
>    2  +---------+ ffff810000b5a000
>    3  +---------+ ffff810000d1a000
>       +---------+ ffff810000eda000 = ffff810000d1a000 + (32768 * 56)
>       |   ???   |   <-- for an "addr" here, it could return TRUE.
>    4  +---------+ ffff810009001000
>    5  +---------+ ffff8100091c1000
>    :  | pages.. |        :
>   15  +---------+ ffff81000a341000
>       +---------+ ffff81000a501000 = nt->mem_map_end
>
> Because of such mem_map holes in a node, I don't think that the vt->numnodes
> loop could be utilized for kernels with SPARSEMEM as it is.
> Is this "mem_map_end" different from the one you assumed?

No.  I understand that a page address in the "???" section above would return
TRUE (unless a "phys" argument was passed in).  Checking whether an incoming
address falls between nt->mem_map and nt->mem_map_end would be slightly more
refined than adding a new simple function that checks whether the incoming
address falls between VMEMMAP_VADDR and VMEMMAP_END, which we discussed earlier.

So I'm suggesting that a vmemmap page address could be checked for validity by:

 (1) verifying that the incoming address is located in the vmemmap address
     range, and
 (2) verifying that it is accessible()

Dave

>
> Thanks,
> Kazuhito Hagio
>
> --
> Crash-utility mailing list
> Crash-utility@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/crash-utility

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
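[Editor's illustration] A minimal sketch of the two-step check suggested at the
end of the thread -- not the crash utility's actual implementation.  It assumes
the VMEMMAP_VADDR/VMEMMAP_END macros and the accessible() helper mentioned
above (available on architectures such as x86_64); the function name
is_vmemmap_page_ptr() is hypothetical:

    #include "defs.h"   /* crash definitions: ulong, accessible(), VMEMMAP_* */

    /*
     * Sketch of the suggested vmemmap page-pointer validity check:
     *   (1) the address lies within the vmemmap virtual address range, and
     *   (2) the address is accessible, i.e. readable from the dumpfile.
     */
    static int
    is_vmemmap_page_ptr(ulong addr)
    {
            if ((addr < VMEMMAP_VADDR) || (addr >= VMEMMAP_END))
                    return FALSE;       /* outside the vmemmap range */

            if (!accessible(addr))
                    return FALSE;       /* page structure not readable */

            return TRUE;
    }

As noted in the reply above, this is a simpler (and slightly less refined) test
than checking an incoming address against a per-node nt->mem_map .. nt->mem_map_end
range, but it avoids relying on per-node mem_map ranges that may contain holes.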