On 13 Aug 2024, at 22:01, Kefeng Wang wrote: > On 2024/8/13 22:59, Zi Yan wrote: >> On 13 Aug 2024, at 10:46, Kefeng Wang wrote: >> >>> On 2024/8/13 22:03, Matthew Wilcox wrote: >>>> On Tue, Aug 13, 2024 at 08:52:26PM +0800, Kefeng Wang wrote: >>>>> The gigantic page size may larger than memory block size, so memory >>>>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make >>>>> alloc_contig_range work at pageblock granularity"), >>>>> >>>>> offline_pages >>>>> start_isolate_page_range >>>>> start_isolate_page_range(isolate_before=true) >>>>> isolate [isolate_start, isolate_start + pageblock_nr_pages) >>>>> start_isolate_page_range(isolate_before=false) >>>>> isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock >>>>> __alloc_contig_migrate_range >>>>> isolate_migratepages_range >>>>> isolate_migratepages_block >>>>> isolate_or_dissolve_huge_page >>>>> if (hstate_is_gigantic(h)) >>>>> return -ENOMEM; >>>>> >>>>> [ 15.815756] memory offlining [mem 0x3c0000000-0x3c7ffffff] failed due to failure to isolate range >>>>> >>>>> Fix it by skipping the __alloc_contig_migrate_range() if met gigantic >>>>> pages when memory offline, which return back to the original logic to >>>>> handle the gigantic pages. >>>> >>>> This seems like the wrong way to fix this. The logic in the next >>>> PageHuge() section seems like it's specifically supposed to handle >>>> gigantic pages. So you've just made that dead code, but instead of >>>> removing it, you've left it there to confuse everyone? >>> >>> isolate_single_pageblock() in start_isolate_page_range() will be called >>> from memory offline and contig allocation (alloc_contig_pages()), this >>> changes only restore the behavior from memory offline code, but we still >>> fail in contig allocation. >>> >>> From memory offline, we has own path to isolate/migrate page or dissolve >>> free hugetlb folios, so I think we don't depends on the __alloc_contig_migrate_range(). >>>> >>>> I admit to not understanding this code terribly well. >>>> >>> A quick search from [1], the isolate_single_pageblock() is added for >>> contig allocation, but it has negative effects on memory hotplug, >>> Zi Yan, could you give some comments? >>> >>> [1] https://lore.kernel.org/linux-mm/20220425143118.2850746-1-zi.yan@xxxxxxxx/ >> >> Probably we can isolate the hugetlb page and use migrate_page() instead of >> __alloc_contig_migrate_range() in the section below, since we are targeting >> only hugetlb pages here. It should solve the issue. > > For contig allocation, I think we must isolate/migrate page in > __alloc_contig_migrate_range(), but for memory offline,(especially for > gigantic hugepage)as mentioned above, we already have own path to > isolate/migrate used page and dissolve the free pages,the > start_isolate_page_range() only need to mark page range MIGRATE_ISOLATE, > that is what we did before b2c9e2fbba32, > > start_isolate_page_range > scan_movable_pages > do_migrate_range > dissolve_free_hugetlb_folios > > Do we really need isolate/migrate the hugetlb page and for memory > offline path? For memory offline path, there is do_migrate_range() to move the pages. For contig allocation, there is __alloc_contig_migrate_range() after isolation to migrate the pages. The migration code in isolate_single_pageblock() is not needed. Something like this would be OK, just skip the page and let either do_migrate_range() or __alloc_contig_migrate_range() to handle it: diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 042937d5abe4..587d723711c5 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -402,23 +402,6 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, #if defined CONFIG_COMPACTION || defined CONFIG_CMA if (PageHuge(page)) { - int page_mt = get_pageblock_migratetype(page); - struct compact_control cc = { - .nr_migratepages = 0, - .order = -1, - .zone = page_zone(pfn_to_page(head_pfn)), - .mode = MIGRATE_SYNC, - .ignore_skip_hint = true, - .no_set_skip_hint = true, - .gfp_mask = gfp_flags, - .alloc_contig = true, - }; - INIT_LIST_HEAD(&cc.migratepages); - - ret = __alloc_contig_migrate_range(&cc, head_pfn, - head_pfn + nr_pages, page_mt); - if (ret) - goto failed; pfn = head_pfn + nr_pages; continue; } > > >> >> When I sent the original patchset, I over-thought about the situation and >> included PageLRU and __PageMovable, so used __alloc_contig_migrate_range(). >> That was probably not the right approach. >> >> I am aware of that the current page isolation code is very complicated and >> planning to clean it up. My current plan is: >> 1. turn MIGRATE_ISOLATE a standalone bit instead of a migratetype (I have >> a local patch) >> 2. refactor page isolation code, since after 1, migratetype is preserved >> across isolations >> 3. clean up alloc_contig_range(). >> > > OK, this is another issue not related about memory hotplug, but we > should fix the current memory offline issue. > > Thanks. >> >> Best Regards, >> Yan, Zi -- Best Regards, Yan, Zi
Attachment:
signature.asc
Description: OpenPGP digital signature