(2013/11/28 16:08), Atsushi Kumagai wrote:
> On 2013/11/22 16:18:20, kexec <kexec-bounces at lists.infradead.org> wrote:
>> (2013/11/07 9:54), HATAYAMA Daisuke wrote:
>>> (2013/11/06 11:21), Atsushi Kumagai wrote:
>>>> (2013/11/06 5:27), Vivek Goyal wrote:
>>>>> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
>>>>>> This patch set intends to exclude unnecessary hugepages from the vmcore dump file.
>>>>>>
>>>>>> This patch requires the kernel patch that exports the necessary data structures into
>>>>>> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
>>>>>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
>>>>>>
>>>>>> This patch introduces two new dump levels, 32 and 64, to exclude all unused and
>>>>>> active hugepages. The level to exclude all unnecessary pages will now be 127.
>>>>>
>>>>> Interesting. Why should hugepages be treated any differently than normal
>>>>> pages?
>>>>>
>>>>> If the user asked to filter out free pages, then they should be filtered, and
>>>>> it should not matter whether a page is a huge page or not.
>>>>
>>>> I'm making an RFC patch of hugepage filtering based on such a policy.
>>>>
>>>> I attach the prototype version.
>>>> It's also able to filter out THPs, and it's suitable for cyclic processing
>>>> because it depends on mem_map, and looking it up can be divided into
>>>> cycles. This is the same idea as page_is_buddy().
>>>>
>>>> So I think it's better.
>>>>
>>>
>>>> @@ -4506,14 +4583,49 @@ __exclude_unnecessary_pages(unsigned long mem_map,
>>>>                     && !isAnon(mapping)) {
>>>>                         if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
>>>>                                 pfn_cache_private++;
>>>> +                       /*
>>>> +                        * NOTE: If THP for cache is introduced, the check for
>>>> +                        * compound pages is needed here.
>>>> +                        */
>>>>                 }
>>>>                 /*
>>>>                  * Exclude the data page of the user process.
>>>>                  */
>>>> -               else if ((info->dump_level & DL_EXCLUDE_USER_DATA)
>>>> -                   && isAnon(mapping)) {
>>>> -                       if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
>>>> -                               pfn_user++;
>>>> +               else if (info->dump_level & DL_EXCLUDE_USER_DATA) {
>>>> +                       /*
>>>> +                        * Exclude the anonymous pages as user pages.
>>>> +                        */
>>>> +                       if (isAnon(mapping)) {
>>>> +                               if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
>>>> +                                       pfn_user++;
>>>> +
>>>> +                               /*
>>>> +                                * Check the compound page
>>>> +                                */
>>>> +                               if (page_is_hugepage(flags) && compound_order > 0) {
>>>> +                                       int i, nr_pages = 1 << compound_order;
>>>> +
>>>> +                                       for (i = 1; i < nr_pages; ++i) {
>>>> +                                               if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i))
>>>> +                                                       pfn_user++;
>>>> +                                       }
>>>> +                                       pfn += nr_pages - 2;
>>>> +                                       mem_map += (nr_pages - 1) * SIZE(page);
>>>> +                               }
>>>> +                       }
>>>> +                       /*
>>>> +                        * Exclude the hugetlbfs pages as user pages.
>>>> +                        */
>>>> +                       else if (hugetlb_dtor == SYMBOL(free_huge_page)) {
>>>> +                               int i, nr_pages = 1 << compound_order;
>>>> +
>>>> +                               for (i = 0; i < nr_pages; ++i) {
>>>> +                                       if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i))
>>>> +                                               pfn_user++;
>>>> +                               }
>>>> +                               pfn += nr_pages - 1;
>>>> +                               mem_map += (nr_pages - 1) * SIZE(page);
>>>> +                       }
>>>>                 }
>>>>                 /*
>>>>                  * Exclude the hwpoison page.
>>>
>>> I'm concerned about the case where filtering is not performed on the part of the
>>> mem_map entries that does not belong to the current cyclic range.
>>>
>>> If the maximum value of compound_order is larger than the maximum value of
>>> CONFIG_FORCE_MAX_ZONEORDER, which makedumpfile obtains via ARRAY_LENGTH(zone.free_area),
>>> it's necessary to align info->bufsize_cyclic with the larger one in
>>> check_cyclic_buffer_overrun().
>>>
>>
>> ping, in case you overlooked this...
>
> Sorry for the delayed response, I'm prioritizing the release of v1.5.5 now.
>
> Thanks for your advice, check_cyclic_buffer_overrun() should be fixed
> as you said.
> In addition, I'm considering another way to address such a case,
> that is, to carry the number of "overflowed pages" over to the next cycle and
> exclude them at the top of __exclude_unnecessary_pages() like below:
>
>         /*
>          * The pages which should be excluded still remain.
>          */
>         if (remainder >= 1) {
>                 int i;
>                 unsigned long tmp = 0;
>
>                 for (i = 0; i < remainder; ++i) {
>                         if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) {
>                                 pfn_user++;
>                                 tmp++;
>                         }
>                 }
>                 pfn += tmp;
>                 remainder -= tmp;
>                 mem_map += (tmp - 1) * SIZE(page);
>                 continue;
>         }
>
> If this way works well, then aligning info->bufsize_cyclic will be
> unnecessary.

I selected the current implementation of changing the cyclic buffer size
because I thought it was simpler than carrying the remaining filtered pages
over to the next cycle, in that there was no need to add extra code to the
filtering processing.

I guess the reason you now think the carry-over way is better is that
detecting the maximum order of a huge page is hard in some way, right?

-- 
Thanks.
HATAYAMA, Daisuke