On 14 Aug 2024, at 22:58, Kefeng Wang wrote: > On 2024/8/14 22:53, Zi Yan wrote: >> On 13 Aug 2024, at 22:01, Kefeng Wang wrote: >> >>> On 2024/8/13 22:59, Zi Yan wrote: >>>> On 13 Aug 2024, at 10:46, Kefeng Wang wrote: >>>> >>>>> On 2024/8/13 22:03, Matthew Wilcox wrote: >>>>>> On Tue, Aug 13, 2024 at 08:52:26PM +0800, Kefeng Wang wrote: >>>>>>> The gigantic page size may larger than memory block size, so memory >>>>>>> offline always fails in this case after commit b2c9e2fbba32 ("mm: make >>>>>>> alloc_contig_range work at pageblock granularity"), >>>>>>> >>>>>>> offline_pages >>>>>>> start_isolate_page_range >>>>>>> start_isolate_page_range(isolate_before=true) >>>>>>> isolate [isolate_start, isolate_start + pageblock_nr_pages) >>>>>>> start_isolate_page_range(isolate_before=false) >>>>>>> isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock >>>>>>> __alloc_contig_migrate_range >>>>>>> isolate_migratepages_range >>>>>>> isolate_migratepages_block >>>>>>> isolate_or_dissolve_huge_page >>>>>>> if (hstate_is_gigantic(h)) >>>>>>> return -ENOMEM; >>>>>>> >>>>>>> [ 15.815756] memory offlining [mem 0x3c0000000-0x3c7ffffff] failed due to failure to isolate range >>>>>>> >>>>>>> Fix it by skipping the __alloc_contig_migrate_range() if met gigantic >>>>>>> pages when memory offline, which return back to the original logic to >>>>>>> handle the gigantic pages. >>>>>> >>>>>> This seems like the wrong way to fix this. The logic in the next >>>>>> PageHuge() section seems like it's specifically supposed to handle >>>>>> gigantic pages. So you've just made that dead code, but instead of >>>>>> removing it, you've left it there to confuse everyone? >>>>> >>>>> isolate_single_pageblock() in start_isolate_page_range() will be called >>>>> from memory offline and contig allocation (alloc_contig_pages()), this >>>>> changes only restore the behavior from memory offline code, but we still >>>>> fail in contig allocation. >>>>> >>>>> From memory offline, we has own path to isolate/migrate page or dissolve >>>>> free hugetlb folios, so I think we don't depends on the __alloc_contig_migrate_range(). >>>>>> >>>>>> I admit to not understanding this code terribly well. >>>>>> >>>>> A quick search from [1], the isolate_single_pageblock() is added for >>>>> contig allocation, but it has negative effects on memory hotplug, >>>>> Zi Yan, could you give some comments? >>>>> >>>>> [1] https://lore.kernel.org/linux-mm/20220425143118.2850746-1-zi.yan@xxxxxxxx/ >>>> >>>> Probably we can isolate the hugetlb page and use migrate_page() instead of >>>> __alloc_contig_migrate_range() in the section below, since we are targeting >>>> only hugetlb pages here. It should solve the issue. >>> >>> For contig allocation, I think we must isolate/migrate page in >>> __alloc_contig_migrate_range(), but for memory offline,(especially for >>> gigantic hugepage)as mentioned above, we already have own path to >>> isolate/migrate used page and dissolve the free pages,the >>> start_isolate_page_range() only need to mark page range MIGRATE_ISOLATE, >>> that is what we did before b2c9e2fbba32, >>> >>> start_isolate_page_range >>> scan_movable_pages >>> do_migrate_range >>> dissolve_free_hugetlb_folios >>> >>> Do we really need isolate/migrate the hugetlb page and for memory >>> offline path? >> >> For memory offline path, there is do_migrate_range() to move the pages. >> For contig allocation, there is __alloc_contig_migrate_range() after >> isolation to migrate the pages. >> >> The migration code in isolate_single_pageblock() is not needed. >> Something like this would be OK, just skip the page and let either >> do_migrate_range() or __alloc_contig_migrate_range() to handle it: > > Oh, right, for alloc_contig_range(), we do have another __alloc_contig_migrate_range() after start_isolate_page_range(), then we > could drop the following code, > >> >> diff --git a/mm/page_isolation.c b/mm/page_isolation.c >> index 042937d5abe4..587d723711c5 100644 >> --- a/mm/page_isolation.c >> +++ b/mm/page_isolation.c >> @@ -402,23 +402,6 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, >> >> #if defined CONFIG_COMPACTION || defined CONFIG_CMA >> if (PageHuge(page)) { >> - int page_mt = get_pageblock_migratetype(page); >> - struct compact_control cc = { >> - .nr_migratepages = 0, >> - .order = -1, >> - .zone = page_zone(pfn_to_page(head_pfn)), >> - .mode = MIGRATE_SYNC, >> - .ignore_skip_hint = true, >> - .no_set_skip_hint = true, >> - .gfp_mask = gfp_flags, >> - .alloc_contig = true, >> - }; >> - INIT_LIST_HEAD(&cc.migratepages); >> - >> - ret = __alloc_contig_migrate_range(&cc, head_pfn, >> - head_pfn + nr_pages, page_mt); >> - if (ret) >> - goto failed; >> pfn = head_pfn + nr_pages; >> continue; >> } > > > But we need to remove the CONFIG_COMPACTION/CMA too, thought? > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c > index 042937d5abe4..785c2d320631 100644 > --- a/mm/page_isolation.c > +++ b/mm/page_isolation.c > @@ -395,30 +395,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, > unsigned long head_pfn = page_to_pfn(head); > unsigned long nr_pages = compound_nr(head); > > - if (head_pfn + nr_pages <= boundary_pfn) { > - pfn = head_pfn + nr_pages; > - continue; > - } > - > -#if defined CONFIG_COMPACTION || defined CONFIG_CMA > - if (PageHuge(page)) { > - int page_mt = get_pageblock_migratetype(page); > - struct compact_control cc = { > - .nr_migratepages = 0, > - .order = -1, > - .zone = page_zone(pfn_to_page(head_pfn)), > - .mode = MIGRATE_SYNC, > - .ignore_skip_hint = true, > - .no_set_skip_hint = true, > - .gfp_mask = gfp_flags, > - .alloc_contig = true, > - }; > - INIT_LIST_HEAD(&cc.migratepages); > - > - ret = __alloc_contig_migrate_range(&cc, head_pfn, > - head_pfn + nr_pages, page_mt); > - if (ret) > - goto failed; > + if (head_pfn + nr_pages <= boundary_pfn || > + PageHuge(page)) > pfn = head_pfn + nr_pages; > continue; > } > @@ -432,7 +410,6 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, > */ > VM_WARN_ON_ONCE_PAGE(PageLRU(page), page); > VM_WARN_ON_ONCE_PAGE(__PageMovable(page), page); > -#endif > goto failed; > } That looks good to me. Best Regards, Yan, Zi
Attachment:
signature.asc
Description: OpenPGP digital signature