Re: [PATCH RFC] mm: skip gigantic pages in isolate_single_pageblock() when mem offline

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 13 Aug 2024, at 10:46, Kefeng Wang wrote:

> On 2024/8/13 22:03, Matthew Wilcox wrote:
>> On Tue, Aug 13, 2024 at 08:52:26PM +0800, Kefeng Wang wrote:
>>> The gigantic page size may larger than memory block size, so memory
>>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make
>>> alloc_contig_range work at pageblock granularity"),
>>>
>>> offline_pages
>>>    start_isolate_page_range
>>>      start_isolate_page_range(isolate_before=true)
>>>        isolate [isolate_start, isolate_start + pageblock_nr_pages)
>>>      start_isolate_page_range(isolate_before=false)
>>>        isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock
>>>         	__alloc_contig_migrate_range
>>>            isolate_migratepages_range
>>>              isolate_migratepages_block
>>>                isolate_or_dissolve_huge_page
>>>                  if (hstate_is_gigantic(h))
>>>                      return -ENOMEM;
>>>
>>> [   15.815756] memory offlining [mem 0x3c0000000-0x3c7ffffff] failed due to failure to isolate range
>>>
>>> Fix it by skipping the __alloc_contig_migrate_range() if met gigantic
>>> pages when memory offline, which return back to the original logic to
>>> handle the gigantic pages.
>>
>> This seems like the wrong way to fix this.  The logic in the next
>> PageHuge() section seems like it's specifically supposed to handle
>> gigantic pages.  So you've just made that dead code, but instead of
>> removing it, you've left it there to confuse everyone?
>
> isolate_single_pageblock() in start_isolate_page_range() will be called
> from memory offline and contig allocation (alloc_contig_pages()), this
> changes only restore the behavior from memory offline code, but we still
> fail in contig allocation.
>
> From memory offline, we has own path to isolate/migrate page or dissolve
> free hugetlb folios, so I think we don't depends on the __alloc_contig_migrate_range().
>>
>> I admit to not understanding this code terribly well.
>>
> A quick search from [1], the isolate_single_pageblock() is added for
> contig allocation, but it has negative effects on memory hotplug,
> Zi Yan, could you give some comments?
>
> [1] https://lore.kernel.org/linux-mm/20220425143118.2850746-1-zi.yan@xxxxxxxx/

Probably we can isolate the hugetlb page and use migrate_page() instead of
__alloc_contig_migrate_range() in the section below, since we are targeting
only hugetlb pages here. It should solve the issue.

When I sent the original patchset, I over-thought about the situation and
included PageLRU and __PageMovable, so used __alloc_contig_migrate_range().
That was probably not the right approach.

I am aware of that the current page isolation code is very complicated and
planning to clean it up. My current plan is:
1. turn MIGRATE_ISOLATE a standalone bit instead of a migratetype (I have
a local patch)
2. refactor page isolation code, since after 1, migratetype is preserved
across isolations
3. clean up alloc_contig_range().


Best Regards,
Yan, Zi

Attachment: signature.asc
Description: OpenPGP digital signature


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux