----- Original Message -----
> Hi Dave,
>
> On 2/16/2018 4:18 PM, Dave Anderson wrote:
> ...
> >>>> OK, I understand your point.  But what concerns me is that the function's
> >>>> purpose is to absolutely identify whether the incoming page structure
> >>>> address is a correct page structure address.  But if an invalid address
> >>>> gets passed into is_page_ptr(), your patch would take the invalid address,
> >>>> calculate an invalid "nr", and continue from there, right?
> >>
> >> Yes, if an invalid "nr" is a number for which no section exists,
> >> valid_section_nr() would return 0.  Even if it happens to be a number for
> >> which a section does exist, the invalid "addr" will not be between mem_map
> >> and end_mem_map, or will not be page-aligned -- because if it were, it
> >> would be a page structure address.
> >>
> >> Also, without this patch, when an invalid address comes in, the loop could
> >> try many invalid "nr"s less than NR_MEM_SECTIONS().
> >>
> >> I hope this answers your concern.
> >>
> >>>
> >>> Another suggestion/question -- if is_page_ptr() is called with a NULL phys
> >>> argument (as is done most of the time), could it skip the "if IS_SPARSEMEM()"
> >>> section at the top, and still utilize the part at the bottom, where it walks
> >>> through the vt->node_table[x] array?  I'm not sure about the "ppend"
> >>> calculation though -- even if there are holes in the node's address space,
> >>> is it still a contiguous chunk of page structure addresses per-node?
> >>
> >> I'm still investigating and am not sure yet, but I think the fact that
> >> SPARSEMEM uses mem_section instead of node_mem_map means that page
> >> structures could be non-contiguous per-node, depending on the architecture
> >> or conditions:
> >>
> >>   typedef struct pglist_data {
> >>   ...
> >>   #ifdef CONFIG_FLAT_NODE_MEM_MAP	/* means !SPARSEMEM */
> >>   	struct page *node_mem_map;
> >>
> >> I'll continue to check it.
> >
> > You are right, but in the case where pglist_data.node_mem_map does *not*
> > exist, the crash utility initializes each vt->node_table[node].mem_map with
> > the node's starting mem_map address by using the return value from
> > phys_to_page() of the node's starting physical address -- which uses the
> > sparsemem functions.
> >
> > The question is whether the current "ppend" calculation is correct for the
> > last physical page in a node.  If it is not correct, then perhaps a
> > "mem_map_end" value can be added to the node_table structure, initialized
> > by using phys_to_page() to get the page address of the last physical
> > address in the node.  And then, in that case, the question is whether the
> > mem_map range of virtual addresses is contiguous -- even if there are holes
> > in the mem_map virtual address range.
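To make that concrete, the check I'm picturing would be something along these
lines -- just an untested sketch, where "mem_map_end" is the hypothetical new
node_table member described above, and "page_ptr_in_node_range" is a made-up
name:

	/* Sketch: replace the size-based "ppend" computation with an
	 * explicit end value taken from phys_to_page() of the node's
	 * last physical address.
	 */
	static int
	page_ptr_in_node_range(ulong addr)
	{
		int n;
		struct node_table *nt;

		for (n = 0; n < vt->numnodes; n++) {
			nt = &vt->node_table[n];

			/* inside this node's mem_map range? */
			if ((addr < nt->mem_map) || (addr >= nt->mem_map_end))
				continue;

			/* aligned on a page structure boundary? */
			if ((addr - nt->mem_map) % SIZE(page))
				return FALSE;

			return TRUE;
		}
		return FALSE;
	}

Whether that range check is sufficient is exactly the hole question below.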
> "node_size" is set to pglist_data.node_spanned_pages, which includes holes.
> So I think that with VMEMMAP, where a page address is linear with respect to
> its pfn, the current "ppend" calculation is correct for the last page in a
> node.  But without VMEMMAP, since there is no guarantee of that linearity,
> the calculation could be incorrect.
>
> I found an example with RHEL5:
>
> crash> help -o
> ...
>   size_table:
>              page: 56
> ...
> crash> kmem -n
> NODE    SIZE      PGLIST_DATA       BOOTMEM_DATA       NODE_ZONES
>   0    524279  ffff810000014000  ffffffff804e1900  ffff810000014000
>                                                    ffff810000014b00
>                                                    ffff810000015600
>                                                    ffff810000016100
>     MEM_MAP       START_PADDR  START_MAPNR
> ffff8100007da000       0            0
>
> ZONE  NAME      SIZE       MEM_MAP       START_PADDR  START_MAPNR
>   0   DMA       4096  ffff8100007da000       0            0
>   1   DMA32   520183  ffff810000812000    1000000       4096
>   2   Normal       0                 0          0          0
>   3   HighMem      0                 0          0          0
>
> -------------------------------------------------------------------
>
> NR      SECTION        CODED_MEM_MAP        MEM_MAP        PFN
>  0  ffff810009000000  ffff8100007da000  ffff8100007da000       0
>  1  ffff810009000008  ffff8100007da000  ffff81000099a000   32768
>  2  ffff810009000010  ffff8100007da000  ffff810000b5a000   65536
>  3  ffff810009000018  ffff8100007da000  ffff810000d1a000   98304  <= there is a
>  4  ffff810009000020  ffff810008901000  ffff810009001000  131072  <= mem_map gap.
>  5  ffff810009000028  ffff810008901000  ffff8100091c1000  163840
>  :
> 14  ffff810009000070  ffff810008901000  ffff81000a181000  458752
> 15  ffff810009000078  ffff810008901000  ffff81000a341000  491520
> crash>
>
> In this case, the "ppend" will be
>
>   0xffff8100007da000 + (524279 * 56)
>   = 0xffff8100023d9e08
>
> but it looks like the actual value is around 0xffff81000a501000.

Right, I understand that the current "ppend" calculation wouldn't work.

> And also, we can see the gap between NR=3 and NR=4.  This means that even if
> the correct "mem_map_end" is added to the node_table structure, it would not
> be enough to check whether an address is a page structure.

Why?  Wouldn't it still give us an ascending range of page structure addresses
on a per-node basis?  (even if there were a physical and/or virtual memory
hole?)

AFAICT, for each section NR, the MEM_MAP and PFN values always increment.

Dave

> Thanks,
> Kazuhito Hagio
>
> >
> > Thanks,
> >   Dave
> >
> >
> >>
> >> Thanks,
> >> Kazuhito Hagio
> >>
> >>>
> >>>>
> >>>> Dave
> >>>>
> >>>>>
> >>>>>>
> >>>>>> There is really no compelling reason that count_partial() absolutely
> >>>>>> *must* use is_page_ptr(), and so I'm thinking that perhaps you could
> >>>>>> come up with a less heavy-handed method for simply testing whether a
> >>>>>> page.lru entry points to another vmemmap'd page.  Something along the
> >>>>>> lines of adding this for vmemmap-enabled kernels:
> >>>>>>
> >>>>>>   #define IN_VMEMMAP_RANGE(page) \
> >>>>>>           ((page >= VMEMMAP_VADDR) && (page <= VMEMMAP_END))
> >>>>>>
> >>>>>> and then have count_partial() replace the is_page_ptr() call with
> >>>>>> another slub function that does something like this for
> >>>>>> vmemmap-enabled kernels:
> >>>>>>
> >>>>>>   (IN_VMEMMAP_RANGE(next) && accessible(next))
> >>>>>>
> >>>>>> Or instead of accessible(), it could read "next" as a list_head with
> >>>>>> RETURN_ON_ERROR, and verify that next->prev points back to the current
> >>>>>> list_head.
> >>>>>>
> >>>>>> Non-vmemmap-enabled kernels could still use is_page_ptr().
> >>>>>>
> >>>>>> What do you think of doing something like that?
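Spelled out, that list_head sanity check might look something like the
following -- an untested sketch, where "vmemmap_page_ptr" is a made-up name
and "page_lru" is the address of the list_head that "next" was read from:

	#define IN_VMEMMAP_RANGE(page) \
		(((page) >= VMEMMAP_VADDR) && ((page) <= VMEMMAP_END))

	static int
	vmemmap_page_ptr(ulong next, ulong page_lru)
	{
		struct kernel_list_head list;

		if (!IN_VMEMMAP_RANGE(next))
			return FALSE;

		/* read "next" as a list_head, failing quietly on error */
		if (!readmem(next, KVADDR, &list, sizeof(list), "list_head",
		    RETURN_ON_ERROR|QUIET))
			return FALSE;

		/* a sane entry's prev should point back at the list_head
		   we followed to get here */
		return ((ulong)list.prev == page_lru) ? TRUE : FALSE;
	}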
> >>>>>
> >>>>> Given the possible compatibility issues you mentioned, I think the way
> >>>>> you suggested might well be enough for now.  I'll try a method like
> >>>>> the above.
> >>>>>
> >>>>> Thanks,
> >>>>> Kazuhito Hagio
> >>>>>
> >>>>>>
> >>>>>> Dave
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> ----- Original Message -----
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> The "kmem -[sS]" commands can take several minutes to complete under
> >>>>>>> the following conditions:
> >>>>>>>  * The system has a lot of memory sections with CONFIG_SPARSEMEM.
> >>>>>>>  * The kernel uses SLUB and has a very long partial slab list.
> >>>>>>>
> >>>>>>> crash> kmem -s dentry | awk '{print strftime("%T"), $0}'
> >>>>>>> 10:18:34 CACHE            NAME       OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
> >>>>>>> 10:19:41 ffff88017fc78a00 dentry         192    9038949  10045728 239184     8k
> >>>>>>> crash> kmem -S dentry | bash -c 'cat >/dev/null ; echo $SECONDS'
> >>>>>>> 334
> >>>>>>>
> >>>>>>> One of the causes is that is_page_ptr() in count_partial() checks
> >>>>>>> whether a given slub page address is a page struct by searching all
> >>>>>>> memory sections linearly for the one which includes it:
> >>>>>>>
> >>>>>>>         nr_mem_sections = NR_MEM_SECTIONS();
> >>>>>>>         for (nr = 0; nr < nr_mem_sections ; nr++) {
> >>>>>>>                 if ((sec_addr = valid_section_nr(nr))) {
> >>>>>>>                         ...
> >>>>>>>
> >>>>>>> With CONFIG_SPARSEMEM{_VMEMMAP}, we can calculate the memory section
> >>>>>>> which includes a page struct from its page.flags, or from its address
> >>>>>>> and VMEMMAP_VADDR.  With this patch doing so, the amount of
> >>>>>>> computation can be significantly reduced in that case.
> >>>>>>>
> >>>>>>> crash> kmem -s dentry | awk '{print strftime("%T"), $0}'
> >>>>>>> 10:34:55 CACHE            NAME       OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
> >>>>>>> 10:34:55 ffff88017fc78a00 dentry         192    9038949  10045728 239184     8k
> >>>>>>> crash> kmem -S dentry | bash -c 'cat >/dev/null ; echo $SECONDS'
> >>>>>>> 2
> >>>>>>>
> >>>>>>> This patch uses VMEMMAP_VADDR.  It is not defined on PPC64, but it
> >>>>>>> looks like PPC64 supports the VMEMMAP flag and
> >>>>>>> machdep->machspec->vmemmap_base is the equivalent base address, so
> >>>>>>> this patch also defines it for PPC64.  This might need some help
> >>>>>>> from PPC folks.
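In other words, the section lookup in the patch reduces to the following two
calculations -- an annotated restatement, not the patch itself (the real code
is in the diff below):

	ulong pfn, nr, flags;

	if (machdep->flags & VMEMMAP) {
		/* vmemmap kernels: page structs form one linear array
		   indexed by pfn, so divide the offset by the page struct
		   size and convert the pfn to a section number */
		pfn = (addr - VMEMMAP_VADDR) / SIZE(page);
		nr = pfn_to_section_nr(pfn);
	} else if (readmem(addr + OFFSET(page_flags), KVADDR, &flags,
	    sizeof(ulong), "page.flags", RETURN_ON_ERROR|QUIET)) {
		/* non-vmemmap SPARSEMEM kernels: the section number is
		   encoded in the topmost SECTIONS_SHIFT() bits of
		   page.flags */
		nr = (flags >> (SIZE(page_flags)*8 - SECTIONS_SHIFT()))
		    & ((1UL << SECTIONS_SHIFT()) - 1);
	}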
> >>>>>>>
> >>>>>>> Signed-off-by: Kazuhito Hagio <k-hagio@xxxxxxxxxxxxx>
> >>>>>>> ---
> >>>>>>>  defs.h   |  2 ++
> >>>>>>>  memory.c | 15 +++++++++++++++
> >>>>>>>  2 files changed, 17 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/defs.h b/defs.h
> >>>>>>> index aa17792..84e68ca 100644
> >>>>>>> --- a/defs.h
> >>>>>>> +++ b/defs.h
> >>>>>>> @@ -3861,6 +3861,8 @@ struct efi_memory_desc_t {
> >>>>>>>  #define IS_VMALLOC_ADDR(X)     machdep->machspec->is_vmaddr(X)
> >>>>>>>  #define KERNELBASE      machdep->pageoffset
> >>>>>>>
> >>>>>>> +#define VMEMMAP_VADDR   (machdep->machspec->vmemmap_base)
> >>>>>>> +
> >>>>>>>  #define PGDIR_SHIFT     (machdep->pageshift + (machdep->pageshift -3) + (machdep->pageshift - 2))
> >>>>>>>  #define PMD_SHIFT       (machdep->pageshift + (machdep->pageshift - 3))
> >>>>>>>
> >>>>>>> diff --git a/memory.c b/memory.c
> >>>>>>> index 0df8ecc..0696763 100644
> >>>>>>> --- a/memory.c
> >>>>>>> +++ b/memory.c
> >>>>>>> @@ -13348,10 +13348,25 @@ is_page_ptr(ulong addr, physaddr_t *phys)
> >>>>>>>  	ulong nr_mem_sections;
> >>>>>>>  	ulong coded_mem_map, mem_map, end_mem_map;
> >>>>>>>  	physaddr_t section_paddr;
> >>>>>>> +#ifdef VMEMMAP_VADDR
> >>>>>>> +	ulong flags;
> >>>>>>> +#endif
> >>>>>>>
> >>>>>>>  	if (IS_SPARSEMEM()) {
> >>>>>>>  		nr_mem_sections = NR_MEM_SECTIONS();
> >>>>>>> +#ifdef VMEMMAP_VADDR
> >>>>>>> +		nr = nr_mem_sections;
> >>>>>>> +		if (machdep->flags & VMEMMAP)
> >>>>>>> +			nr = pfn_to_section_nr((addr - VMEMMAP_VADDR) / SIZE(page));
> >>>>>>> +		else if (readmem(addr + OFFSET(page_flags), KVADDR, &flags,
> >>>>>>> +		    sizeof(ulong), "page.flags", RETURN_ON_ERROR|QUIET))
> >>>>>>> +			nr = (flags >> (SIZE(page_flags)*8 - SECTIONS_SHIFT())
> >>>>>>> +			    & ((1UL << SECTIONS_SHIFT()) - 1));
> >>>>>>> +
> >>>>>>> +		if (nr < nr_mem_sections) {
> >>>>>>> +#else
> >>>>>>>  		for (nr = 0; nr < nr_mem_sections ; nr++) {
> >>>>>>> +#endif
> >>>>>>>  			if ((sec_addr = valid_section_nr(nr))) {
> >>>>>>>  				coded_mem_map = section_mem_map_addr(sec_addr);
> >>>>>>>  				mem_map = sparse_decode_mem_map(coded_mem_map, nr);
> >>>>>>> --
> >>>>>>> 1.8.3.1

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility