Re: [PATCH RFC] mm: skip gigantic pages in isolate_single_pageblock() when mem offline

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2024/8/13 22:59, Zi Yan wrote:
On 13 Aug 2024, at 10:46, Kefeng Wang wrote:

On 2024/8/13 22:03, Matthew Wilcox wrote:
On Tue, Aug 13, 2024 at 08:52:26PM +0800, Kefeng Wang wrote:
The gigantic page size may larger than memory block size, so memory
offline always fails in this case after commit b2c9e2fbba32 ("mm: make
alloc_contig_range work at pageblock granularity"),

offline_pages
    start_isolate_page_range
      start_isolate_page_range(isolate_before=true)
        isolate [isolate_start, isolate_start + pageblock_nr_pages)
      start_isolate_page_range(isolate_before=false)
        isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock
         	__alloc_contig_migrate_range
            isolate_migratepages_range
              isolate_migratepages_block
                isolate_or_dissolve_huge_page
                  if (hstate_is_gigantic(h))
                      return -ENOMEM;

[   15.815756] memory offlining [mem 0x3c0000000-0x3c7ffffff] failed due to failure to isolate range

Fix it by skipping the __alloc_contig_migrate_range() if met gigantic
pages when memory offline, which return back to the original logic to
handle the gigantic pages.

This seems like the wrong way to fix this.  The logic in the next
PageHuge() section seems like it's specifically supposed to handle
gigantic pages.  So you've just made that dead code, but instead of
removing it, you've left it there to confuse everyone?

isolate_single_pageblock() in start_isolate_page_range() will be called
from memory offline and contig allocation (alloc_contig_pages()), this
changes only restore the behavior from memory offline code, but we still
fail in contig allocation.

 From memory offline, we has own path to isolate/migrate page or dissolve
free hugetlb folios, so I think we don't depends on the __alloc_contig_migrate_range().

I admit to not understanding this code terribly well.

A quick search from [1], the isolate_single_pageblock() is added for
contig allocation, but it has negative effects on memory hotplug,
Zi Yan, could you give some comments?

[1] https://lore.kernel.org/linux-mm/20220425143118.2850746-1-zi.yan@xxxxxxxx/

Probably we can isolate the hugetlb page and use migrate_page() instead of
__alloc_contig_migrate_range() in the section below, since we are targeting
only hugetlb pages here. It should solve the issue.

For contig allocation, I think we must isolate/migrate page in
__alloc_contig_migrate_range(), but for memory offline,(especially for
gigantic hugepage)as mentioned above, we already have own path to
isolate/migrate used page and dissolve the free pages,the
start_isolate_page_range() only need to mark page range MIGRATE_ISOLATE,
that is what we did before b2c9e2fbba32,

start_isolate_page_range
scan_movable_pages
  do_migrate_range
dissolve_free_hugetlb_folios

Do we really need isolate/migrate the hugetlb page and for memory
offline path?



When I sent the original patchset, I over-thought about the situation and
included PageLRU and __PageMovable, so used __alloc_contig_migrate_range().
That was probably not the right approach.

I am aware of that the current page isolation code is very complicated and
planning to clean it up. My current plan is:
1. turn MIGRATE_ISOLATE a standalone bit instead of a migratetype (I have
a local patch)
2. refactor page isolation code, since after 1, migratetype is preserved
across isolations
3. clean up alloc_contig_range().


OK, this is another issue not related about memory hotplug, but we
should fix the current memory offline issue.

Thanks.

Best Regards,
Yan, Zi




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux