On 2024/8/17 16:49, Kefeng Wang wrote:
> The commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned
> pages to be offlined") don't handle the hugetlb pages, the endless
> loop still occur if offline a hwpoison hugetlb, luckly, with the
> commit e591ef7d96d6 ("mm,hwpoison,hugetlb,memory_hotplug: hotremove
> memory section with hwpoisoned hugepage") section with hwpoisoned
> hugepage"), the HPageMigratable of hugetlb page will be clear, and

This should read: commit e591ef7d96d6 ("mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage"). The trailing 'section with hwpoisoned hugepage")' above is duplicated. Also, s/be clear/be cleared/?

> the hwpoison hugetlb page will be skipped in scan_movable_pages(),
> so the endless loop issue is fixed.
>
> However if the HPageMigratable() check passed(without reference and
> lock), the hugetlb page may be hwpoisoned, it won't cause issue since
> the hwpoisoned page will be handled correctly in the next movable
> pages scan loop, and it will be isolated in do_migrate_range() but
> fails to migrate. In order to avoid the unnecessary isolation and
> unify all hwpoisoned page handling, let's unconditionally check hwpoison
> firstly, and if it is a hwpoisoned hugetlb page, try to unmap it as
> the catch all safety net like normal page does.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx>
> ---
>  mm/memory_hotplug.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index dc19b0e28fbc..02a0d4fbc3fe 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1793,13 +1793,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  		 * but out loop could handle that as it revisits the split
>  		 * folio later.
>  		 */
> -		if (folio_test_large(folio)) {
> +		if (folio_test_large(folio))
>  			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
> -			if (folio_test_hugetlb(folio)) {
> -				isolate_hugetlb(folio, &source);
> -				continue;
> -			}
> -		}
>
>  		/*
>  		 * HWPoison pages have elevated reference counts so the migration would
> @@ -1808,11 +1803,17 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  		 * (e.g. current hwpoison implementation doesn't unmap KSM pages but keep
>  		 * the unmap as the catch all safety net).
>  		 */
> -		if (PageHWPoison(page)) {
> +		if (folio_test_hwpoison(folio) ||
> +		    (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
>  			if (WARN_ON(folio_test_lru(folio)))
>  				folio_isolate_lru(folio);
>  			if (folio_mapped(folio))
> -				try_to_unmap(folio, TTU_IGNORE_MLOCK);
> +				unmap_posioned_folio(folio, TTU_IGNORE_MLOCK);
> +			continue;
> +		}
> +
> +		if (folio_test_hugetlb(folio)) {
> +			isolate_hugetlb(folio, &source);

While you're here, should we also pr_warn "failed to isolate pfn xx" for hugetlb folios, as we already do for raw pages and THP folios?

Thanks.
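
To make the ordering question concrete, here is a rough user-space sketch of the control flow I have in mind. It is not kernel code: struct mock_folio, enum mock_action and mock_handle_folio() are hypothetical stand-ins made up for illustration, and the message text only mirrors the "failed to isolate pfn" wording mentioned above. It assumes the isolation step reports success or failure as a boolean.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for the folio state the loop inspects. */
struct mock_folio {
	unsigned long pfn;
	bool hwpoison;   /* models the hwpoison / has_hwpoisoned checks */
	bool hugetlb;    /* models folio_test_hugetlb() */
	bool isolate_ok; /* result the mocked isolation step will report */
};

/* Hypothetical outcome of handling one folio in the migrate loop. */
enum mock_action {
	MOCK_SKIP_HWPOISON,  /* unmapped (if mapped) and skipped */
	MOCK_ISOLATED,       /* queued for migration */
	MOCK_ISOLATE_FAILED, /* isolation failed and was reported */
};

static enum mock_action mock_handle_folio(const struct mock_folio *f)
{
	/* Check hwpoison first so a poisoned folio is never isolated. */
	if (f->hwpoison)
		return MOCK_SKIP_HWPOISON;

	/*
	 * Report isolation failure the same way for hugetlb and
	 * non-hugetlb folios (assumed message format, illustration only).
	 */
	if (!f->isolate_ok) {
		fprintf(stderr, "failed to isolate %s pfn %lx\n",
			f->hugetlb ? "hugetlb" : "folio", f->pfn);
		return MOCK_ISOLATE_FAILED;
	}

	return MOCK_ISOLATED;
}
```

With this shape, a poisoned hugetlb folio is skipped before any isolation attempt, and a healthy hugetlb folio that fails isolation leaves a trace in the log instead of failing silently.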