On 2024/8/27 9:13, Kefeng Wang wrote:
>
>
> On 2024/8/26 22:46, David Hildenbrand wrote:
>> On 17.08.24 10:49, Kefeng Wang wrote:
>>> The commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned
>>> pages to be offlined") doesn't handle hugetlb pages, so the endless
>>> loop still occurs when offlining a hwpoisoned hugetlb page. Luckily,
>>> with the commit e591ef7d96d6 ("mm,hwpoison,hugetlb,memory_hotplug:
>>> hotremove memory section with hwpoisoned hugepage"), the
>>> HPageMigratable flag of the hugetlb page is cleared and the hwpoisoned
>>> hugetlb page is skipped in scan_movable_pages(), so the endless loop
>>> issue is fixed.
>>>
>>> However, if the HPageMigratable() check passes (without reference and
>>> lock), the hugetlb page may still become hwpoisoned. This won't cause
>>> an issue, since the hwpoisoned page will be handled correctly in the
>>> next movable pages scan loop, but it will be isolated in
>>> do_migrate_range() and then fail to migrate. In order to avoid the
>>> unnecessary isolation and unify all hwpoisoned page handling, let's
>>> unconditionally check hwpoison first, and if it is a hwpoisoned
>>> hugetlb page, try to unmap it as the catch-all safety net, as is done
>>> for normal pages.
>>>
>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx>
>>> ---
>>>  mm/memory_hotplug.c | 17 +++++++++--------
>>>  1 file changed, 9 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index dc19b0e28fbc..02a0d4fbc3fe 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -1793,13 +1793,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>  		 * but out loop could handle that as it revisits the split
>>>  		 * folio later.
>>>  		 */
>>> -		if (folio_test_large(folio)) {
>>> +		if (folio_test_large(folio))
>>>  			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>> -			if (folio_test_hugetlb(folio)) {
>>> -				isolate_hugetlb(folio, &source);
>>> -				continue;
>>> -			}
>>> -		}
>>>  		/*
>>>  		 * HWPoison pages have elevated reference counts so the migration would
>>> @@ -1808,11 +1803,17 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>  		 * (e.g. current hwpoison implementation doesn't unmap KSM pages but keep
>>>  		 * the unmap as the catch all safety net).
>>>  		 */
>>> -		if (PageHWPoison(page)) {
>>> +		if (folio_test_hwpoison(folio) ||
>>> +		    (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
>>
>> We have the exact same check already in mm/shmem.c now.
>>
>> Likely this should be factored out ... but no idea what function name we should use that won't add even more confusion :D
>
> Maybe folio_has_hwpoison(), and Miaohe may have some suggestion,
> but leave it for later.

Would something like folio_contain_hwpoisoned_page() be a suitable name?

Thanks.
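
For illustration only, here is a rough userspace sketch of what such a factored-out helper could look like. The helper name is just the one floated above and is hypothetical; struct folio and the folio_test_*() functions in the sketch are simplified mock stand-ins so it compiles outside the kernel, not the real kernel definitions.

```c
/*
 * Userspace sketch only. struct folio and the three folio_test_*()
 * functions below are simplified mocks (assumptions for illustration),
 * not the kernel's definitions.
 */
#include <assert.h>
#include <stdbool.h>

struct folio {
	bool large;          /* mock: folio spans more than one page */
	bool hwpoison;       /* mock: hwpoison flag on the folio itself */
	bool has_hwpoisoned; /* mock: a subpage of a large folio is poisoned */
};

/* Mock stand-ins named after the kernel helpers used in the patch. */
static bool folio_test_large(const struct folio *folio)
{
	return folio->large;
}

static bool folio_test_hwpoison(const struct folio *folio)
{
	return folio->hwpoison;
}

static bool folio_test_has_hwpoisoned(const struct folio *folio)
{
	return folio->has_hwpoisoned;
}

/*
 * Hypothetical helper (name taken from the suggestion in this thread):
 * the same condition the patch open-codes in do_migrate_range().
 */
static bool folio_contain_hwpoisoned_page(const struct folio *folio)
{
	return folio_test_hwpoison(folio) ||
	       (folio_test_large(folio) && folio_test_has_hwpoisoned(folio));
}

/* Small self-check exercising the helper on mock folios. */
void example_checks(void)
{
	struct folio clean = { .large = true };
	struct folio poisoned_sub = { .large = true, .has_hwpoisoned = true };

	assert(!folio_contain_hwpoisoned_page(&clean));
	assert(folio_contain_hwpoisoned_page(&poisoned_sub));
}
```

With a helper along these lines, both the check in do_migrate_range() and the identical one in mm/shmem.c could call it instead of repeating the condition; the final name is still open.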