----- Original Message -----
> Hi Dave,
>
> On 2/15/2018 12:38 PM, Dave Anderson wrote:
> ...
> >>>> Because of your questions about ppc64, possible backwards-compatibility
> >>>> issues, or potential future changes to page.flags usage, this permanent
> >>>> change to the is_page_ptr() function solely for the purposes of SLUB's
> >>>> count_partial() function makes me nervous.
> >>>
> >>> Sorry, my explanation was not enough. Please let me supplement a little.
> >>> As far as I know, count_partial() is the function that benefits most
> >>> from this patch. But is_page_ptr() can also be called many times through
> >>> other functions, so this will improve them, too. Moreover, the loop in
> >>> is_page_ptr() can iterate NR_MEM_SECTIONS() times, which is 33554432 on
> >>> x86_64 (5-level) if I understand correctly, in the worst case.
> >>>
> >>> So I thought that ideally we should improve is_page_ptr() itself if
> >>> possible, rather than find the callers that could call it many times and
> >>> change them. Also, I am willing to drop the definition of VMEMMAP_VADDR
> >>> for ppc64 this time.
> >>
> >> OK, I understand your point. But what concerns me is that the function's
> >> purpose is to absolutely identify whether the incoming page structure
> >> address is a correct page structure address. But if an invalid address
> >> gets passed into is_page_ptr(), your patch would take the invalid
> >> address, calculate an invalid "nr", and continue from there, right?
>
> Yes. If an invalid "nr" is a number for which no section exists,
> valid_section_nr() will return 0. Even if it happens to be a number for
> which a section does exist, the invalid "addr" will not fall between
> mem_map and end_mem_map, or will not be page-aligned -- because if it
> were, it would be a page structure address.
>
> Also, without this patch, when an invalid address comes in, the loop can
> try many invalid "nr"s less than NR_MEM_SECTIONS().
>
> I hope this answers your concern.
>
> > Another suggestion/question -- if is_page_ptr() is called with a NULL
> > phys argument (as is done most of the time), could it skip the
> > "if IS_SPARSEMEM()" section at the top, and still utilize the part at
> > the bottom, where it walks through the vt->node_table[x] array? I'm not
> > sure about the "ppend" calculation though -- even if there are holes in
> > the node's address space, is it still a contiguous chunk of page
> > structure addresses per-node?
>
> I'm still investigating and am not sure yet, but I think that the fact
> that SPARSEMEM uses mem_section instead of node_mem_map means page
> structures could be non-contiguous per-node, depending on the
> architecture or configuration.
>
>     typedef struct pglist_data {
>     ...
>     #ifdef CONFIG_FLAT_NODE_MEM_MAP    /* means !SPARSEMEM */
>         struct page *node_mem_map;
>
> I'll continue to check it.

You are right, but in the case where pglist_data.node_mem_map does *not*
exist, the crash utility initializes each vt->node_table[node].mem_map with
the node's starting mem_map address by using the return value from
phys_to_page() of the node's starting physical address -- which uses the
sparsemem functions.

The question is whether the current "ppend" calculation is correct for the
last physical page in a node. If it is not correct, then perhaps a
"mem_map_end" value could be added to the node_table structure, initialized
by using phys_to_page() to get the page address of the last physical
address in the node. (A rough sketch of that idea follows.)
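[Editorial note: a minimal sketch of the vt->node_table[] walk being
discussed, assuming crash's vm_table/node_table layout; "mem_map_end" is
the hypothetical field proposed just above, not an existing member:]

    /*
     * Sketch of the per-node range test.  The existing code derives the
     * end of each node's mem_map range as mem_map + (size * SIZE(page)),
     * which is only safe if the node's page structures occupy one
     * contiguous virtual range -- the open question in this thread.
     */
    static int
    page_ptr_in_node_tables(ulong addr)
    {
            int n;
            struct node_table *nt;
            ulong ppstart, ppend;

            for (n = 0; n < vt->numnodes; n++) {
                    nt = &vt->node_table[n];
                    ppstart = nt->mem_map;
                    ppend = ppstart + (nt->size * SIZE(page));
                    /* hypothetical: ppend = nt->mem_map_end; */

                    if ((addr >= ppstart) && (addr < ppend) &&
                        !((addr - ppstart) % SIZE(page)))
                            return TRUE;  /* page-aligned, inside node */
            }
            return FALSE;
    }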
And then in that case, the question is whether the mem_map range of virtual
addresses is contiguous -- even if there are holes in the mem_map virtual
address range.

Thanks,
  Dave

> Thanks,
> Kazuhito Hagio
>
> >>
> >> Dave
> >>
> >>>
> >>>> There is really no compelling reason that count_partial() absolutely
> >>>> *must* use is_page_ptr(), and so I'm thinking that perhaps you could
> >>>> come up with a less heavy-handed method for simply testing whether a
> >>>> page.lru entry points to another vmemmap'd page. Something along the
> >>>> lines of adding this for vmemmap-enabled kernels:
> >>>>
> >>>>   #define IN_VMEMMAP_RANGE(page) \
> >>>>       ((page >= VMEMMAP_VADDR) && (page <= VMEMMAP_END))
> >>>>
> >>>> and then have count_partial() replace the is_page_ptr() call with
> >>>> another slub function that does something like this for
> >>>> vmemmap-enabled kernels:
> >>>>
> >>>>   (IN_VMEMMAP_RANGE(next) && accessible(next))
> >>>>
> >>>> Or instead of accessible(), it could read "next" as a list_head with
> >>>> RETURN_ON_ERROR, and verify that next->prev points back to the
> >>>> current list_head.
> >>>>
> >>>> Non-vmemmap-enabled kernels could still use is_page_ptr().
> >>>>
> >>>> What do you think of doing something like that?
> >>>
> >>> Given the possible compatibility issues you mentioned, I think that
> >>> the way you suggested might well be enough for now. I'll try a method
> >>> like the above.
> >>>
> >>> Thanks,
> >>> Kazuhito Hagio
> >>>
> >>>>
> >>>> Dave
> >>>>
> >>>> ----- Original Message -----
> >>>>> Hi,
> >>>>>
> >>>>> The "kmem -[sS]" commands can take several minutes to complete with
> >>>>> the following conditions:
> >>>>>  * The system has a lot of memory sections with CONFIG_SPARSEMEM.
> >>>>>  * The kernel uses SLUB and it has a very long partial slab list.
> >>>>>
> >>>>>   crash> kmem -s dentry | awk '{print strftime("%T"), $0}'
> >>>>>   10:18:34 CACHE            NAME    OBJSIZE  ALLOCATED     TOTAL   SLABS  SSIZE
> >>>>>   10:19:41 ffff88017fc78a00 dentry      192    9038949  10045728  239184     8k
> >>>>>   crash> kmem -S dentry | bash -c 'cat >/dev/null ; echo $SECONDS'
> >>>>>   334
> >>>>>
> >>>>> One of the causes is that is_page_ptr() in count_partial() checks if
> >>>>> a given slub page address is a page struct by searching all memory
> >>>>> sections linearly for the one which includes it.
> >>>>>
> >>>>>   nr_mem_sections = NR_MEM_SECTIONS();
> >>>>>   for (nr = 0; nr < nr_mem_sections ; nr++) {
> >>>>>           if ((sec_addr = valid_section_nr(nr))) {
> >>>>>           ...
> >>>>>
> >>>>> With CONFIG_SPARSEMEM{_VMEMMAP}, we can calculate the memory section
> >>>>> which includes a page struct from its page.flags, or from its
> >>>>> address and VMEMMAP_VADDR. With this patch doing so, the amount of
> >>>>> computation can be significantly reduced in that case.
> >>>>>
> >>>>>   crash> kmem -s dentry | awk '{print strftime("%T"), $0}'
> >>>>>   10:34:55 CACHE            NAME    OBJSIZE  ALLOCATED     TOTAL   SLABS  SSIZE
> >>>>>   10:34:55 ffff88017fc78a00 dentry      192    9038949  10045728  239184     8k
> >>>>>   crash> kmem -S dentry | bash -c 'cat >/dev/null ; echo $SECONDS'
> >>>>>   2
> >>>>>
> >>>>> This patch uses VMEMMAP_VADDR. It is not defined on PPC64, but it
> >>>>> looks like PPC64 supports the VMEMMAP flag and
> >>>>> machdep->machspec->vmemmap_base serves that purpose, so this patch
> >>>>> also defines it for PPC64. This might need some help from PPC folks.
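[Editorial note: unpacked, the section-number calculation the patch
performs looks like the following sketch, restating the diff just below
with comments; only macro and function names that appear in the diff are
used:]

    ulong nr, pfn, flags;

    nr = nr_mem_sections;           /* sentinel: no section found */

    if (machdep->flags & VMEMMAP) {
            /*
             * With vmemmap, the page structs form one flat virtual
             * array indexed by pfn, so the candidate address converts
             * directly to a pfn, and the pfn to a section number.
             */
            pfn = (addr - VMEMMAP_VADDR) / SIZE(page);
            nr = pfn_to_section_nr(pfn);
    } else if (readmem(addr + OFFSET(page_flags), KVADDR, &flags,
               sizeof(ulong), "page.flags", RETURN_ON_ERROR|QUIET)) {
            /*
             * Without vmemmap, the kernel encodes the section number
             * in the topmost SECTIONS_SHIFT bits of page.flags.
             */
            nr = (flags >> (SIZE(page_flags)*8 - SECTIONS_SHIFT()))
                    & ((1UL << SECTIONS_SHIFT()) - 1);
    }

    /* then validate only section "nr" instead of scanning them all */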
> >>>>>
> >>>>> Signed-off-by: Kazuhito Hagio <k-hagio@xxxxxxxxxxxxx>
> >>>>> ---
> >>>>>  defs.h   |  2 ++
> >>>>>  memory.c | 15 +++++++++++++++
> >>>>>  2 files changed, 17 insertions(+)
> >>>>>
> >>>>> diff --git a/defs.h b/defs.h
> >>>>> index aa17792..84e68ca 100644
> >>>>> --- a/defs.h
> >>>>> +++ b/defs.h
> >>>>> @@ -3861,6 +3861,8 @@ struct efi_memory_desc_t {
> >>>>>  #define IS_VMALLOC_ADDR(X)     machdep->machspec->is_vmaddr(X)
> >>>>>  #define KERNELBASE      machdep->pageoffset
> >>>>>
> >>>>> +#define VMEMMAP_VADDR   (machdep->machspec->vmemmap_base)
> >>>>> +
> >>>>>  #define PGDIR_SHIFT     (machdep->pageshift + (machdep->pageshift -3) + (machdep->pageshift - 2))
> >>>>>  #define PMD_SHIFT      (machdep->pageshift + (machdep->pageshift - 3))
> >>>>>
> >>>>> diff --git a/memory.c b/memory.c
> >>>>> index 0df8ecc..0696763 100644
> >>>>> --- a/memory.c
> >>>>> +++ b/memory.c
> >>>>> @@ -13348,10 +13348,25 @@ is_page_ptr(ulong addr, physaddr_t *phys)
> >>>>>  	ulong nr_mem_sections;
> >>>>>  	ulong coded_mem_map, mem_map, end_mem_map;
> >>>>>  	physaddr_t section_paddr;
> >>>>> +#ifdef VMEMMAP_VADDR
> >>>>> +	ulong flags;
> >>>>> +#endif
> >>>>>
> >>>>>  	if (IS_SPARSEMEM()) {
> >>>>>  		nr_mem_sections = NR_MEM_SECTIONS();
> >>>>> +#ifdef VMEMMAP_VADDR
> >>>>> +		nr = nr_mem_sections;
> >>>>> +		if (machdep->flags & VMEMMAP)
> >>>>> +			nr = pfn_to_section_nr((addr - VMEMMAP_VADDR) / SIZE(page));
> >>>>> +		else if (readmem(addr + OFFSET(page_flags), KVADDR, &flags,
> >>>>> +			sizeof(ulong), "page.flags", RETURN_ON_ERROR|QUIET))
> >>>>> +			nr = (flags >> (SIZE(page_flags)*8 - SECTIONS_SHIFT())
> >>>>> +				& ((1UL << SECTIONS_SHIFT()) - 1));
> >>>>> +
> >>>>> +		if (nr < nr_mem_sections) {
> >>>>> +#else
> >>>>>  		for (nr = 0; nr < nr_mem_sections ; nr++) {
> >>>>> +#endif
> >>>>>  			if ((sec_addr = valid_section_nr(nr))) {
> >>>>>  				coded_mem_map = section_mem_map_addr(sec_addr);
> >>>>>  				mem_map = sparse_decode_mem_map(coded_mem_map, nr);
> >>>>> --
> >>>>> 1.8.3.1

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
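[Editorial note: for reference, the lighter-weight check Dave suggests
earlier in the thread -- a vmemmap range test plus a list_head sanity
check -- might look something like the minimal sketch below. VMEMMAP_END
and slub_page_ptr() are illustrative names, not existing crash symbols:]

    /*
     * Sketch of the suggested alternative for count_partial(): accept a
     * page.lru "next" pointer only if it lies in the vmemmap range and,
     * when read back as a list_head, its prev member points at the
     * list_head entry we arrived from.
     */
    #define IN_VMEMMAP_RANGE(page) \
            (((page) >= VMEMMAP_VADDR) && ((page) <= VMEMMAP_END))

    static int
    slub_page_ptr(ulong list_head_addr, ulong next)
    {
            struct kernel_list_head lh;

            if (!IN_VMEMMAP_RANGE(next))
                    return FALSE;

            if (!readmem(next, KVADDR, &lh, sizeof(lh), "list_head",
                RETURN_ON_ERROR|QUIET))
                    return FALSE;

            /* next->prev must point back to the entry we came from */
            return ((ulong)lh.prev == list_head_addr);
    }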