On 2022/4/25 21:55, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Fri, Apr 22, 2022 at 02:53:48PM +0800, Miaohe Lin wrote:
>> On 2022/4/22 6:04, Mike Kravetz wrote:
>>> On 4/21/22 07:20, David Hildenbrand wrote:
>>>> On 21.04.22 15:51, Miaohe Lin wrote:
>>>>> When trying to offline pages, HWPoisoned hugepage is migrated without
>>>>> checking PageHWPoison first. So corrupted data could be consumed. Fix
>>>>> it by deferring isolate_huge_page until PageHWPoison is handled.
>>>>>
>>>>
>>>> CCing Oscar, Mike and Naoya
>>>>
>>>>> Signed-off-by: Miaohe Lin <linmiaohe@xxxxxxxxxx>
>>>>> ---
>>>>>  mm/memory_hotplug.c | 11 +++++++----
>>>>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>>> index 4c6065e5d274..093f85ec5c5c 100644
>>>>> --- a/mm/memory_hotplug.c
>>>>> +++ b/mm/memory_hotplug.c
>>>>> @@ -1600,11 +1600,9 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>>>  		folio = page_folio(page);
>>>>>  		head = &folio->page;
>>>>>
>>>>> -		if (PageHuge(page)) {
>>>>> +		if (PageHuge(page))
>>>>>  			pfn = page_to_pfn(head) + compound_nr(head) - 1;
>>>>> -			isolate_huge_page(head, &source);
>>>>> -			continue;
>>>>> -		} else if (PageTransHuge(page))
>>>>> +		else if (PageTransHuge(page))
>>>>>  			pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>>>>>
>>>>>  		/*
>>>>> @@ -1622,6 +1620,11 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>>>  			continue;
>>>>>  		}
>>>>>
>>>>> +		if (PageHuge(page)) {
>>>>> +			isolate_huge_page(head, &source);
>>>>> +			continue;
>>>>> +		}
>>>>> +
>>>>>  		if (!get_page_unless_zero(page))
>>>>>  			continue;
>>>>>  		/*
>>>>
>>>> The problem statement makes sense to me but I am not sure about the
>>>> details if we run into the "PageHWPoison" path with a huge page. I have
>>>> the gut feeling that we have to do more for huge pages in the
>>>> PageHWPoison() path, because we might be dealing with a free huge page
>>>> after unmap succeeds. I might be wrong.
>>>>
>>>
>>> Thinking about memory errors always makes my head hurt :)
>>
>> Me too. ;)
>>
>>>
>>> What 'state' could a poisoned hugetlb page be in here?
>>> - Mapped into a process address space?
>>> - On the hugetlb free lists?
>
> There seem to be at least 2 cases where a hwpoisoned hugetlb page remains undissolved.
> (1) Hwpoisoned hugepage in free hugepage list
> (2) Hwpoisoned hugepage belonging to a hugetlbfs file
>
>>>
>>> IIUC, when poisoning a hugetlb page we try to dissolve those that are
>>> free and unmap those which are mapped. So, this means those operations
>>> must have failed for some reason. Is that correct?
>
> Yes, (1) is made by failure in error handling, but (2) can be made even
> when memory error on in-use hugepage is successfully handled.

Could you please explain (2) in more detail? IIUC, an in-use hugepage belonging to a
hugetlbfs file will be removed from the pagecache via hugetlbfs_error_remove_page.
So when the error is successfully handled, hugepage->mapping will be NULL and the
page will thus no longer belong to a hugetlbfs file. Or am I missing something?

>
>>
>> I think you're right.
>>
>> BTW: Now that PageHWPoison is set under hugetlb_lock and HPageFreed is also checked
>> under that lock, it should be more likely to handle the hugetlb page on the free lists
>> successfully because the possible race with hugetlb page allocation or demotion can be
>> prevented by the PageHWPoison flag. So it is more likely to be a mapped hugetlb page here.
>> But that doesn't matter.
>>
>>>
>>> Now, IF the above is true this implies there is a poisoned page somewhere
>>> within the hugetlb page. But, poison is only marked in the head page.
>>> So, we do not really know where within the page the actual error exists.
>>> Is my reasoning still correct?
>>>
>>
>> IIRC, Naoya once proposed to store the poisoned page info into the subpage field.
>
> Yes, it's important to keep the offset of the raw page. I have a "storing raw
> error info" patch for this, but it's still immature to submit.
>
>>
>>> If my reasoning is correct, then I am not sure what we can do here.
>>> The code to handle poison is:
>>>
>>> 	if (PageHWPoison(page)) {
>>> 		if (WARN_ON(folio_test_lru(folio)))
>>> 			folio_isolate_lru(folio);
>>> 		if (folio_mapped(folio))
>>> 			try_to_unmap(folio, TTU_IGNORE_MLOCK);
>>> 		continue;
>>> 	}
>>>
>>> My first observation is that if a hugetlb page is passed to try_to_unmap
>>> as above we may BUG. This is because we need to hold i_mmap_rwsem in
>>> write mode because of the possibility of calling huge_pmd_unshare. :(
>>
>> I'm sorry. I missed that case. :(
>
> OK, so the above check should not be called for hugetlb pages.
>
>>
>>>
>>> I 'think' try_to_unmap could succeed on a poisoned hugetlb page once we
>>> add correct locking. So, the page would be unmapped. I do not think anyone
>>> is holding a ref, so the last unmap should put the hugetlb page on the
>>> free list. Correct? We will not migrate the page, but ...
>
> That seems to depend on whether the hugepage is anonymous or file-backed.
> If we consider anonymous hugepage, the above statement is correct.
>
> As for what we can do, for (1) we can simply skip the hwpoisoned hugepage
> in do_migrate_range() (i.e. no need to call isolate_huge_page()), because in
> the outer while-loop in offline_pages() we call dissolve_free_huge_pages()
> so the free hugepage can be handled there. The above "storing raw error info"
> patch will also enable dissolve_free_huge_pages() to handle hwpoisoned free
> hugepage (by moving PG_hwpoison flag to the raw error page), so I assume
> to have the patch.

Yes, I think (1) will be fixed this way.

>
> As for (2), I think that making such hugepages unmovable (by clearing
> HPageMigratable) could work. With that change, scan_movable_pages() detects
> unmovable pages so do_migrate_range() is not called for the hugepages.
> Moreover, scan_movable_pages() should return -EBUSY if an unmovable page is found,
> so offline_pages() fails without trying to touch the hwpoisoned hugepage.
Yes, that seems to work, but it will make the memory offline fail. Could this be
overkill?

> Anyway the reported problem (trying to migrate a hwpoisoned hugepage) would be solved.
>
> # Maybe we might optionally try to free the hwpoisoned hugepage by removing
> # the affected hugetlbfs file, but it's not clear to me that that's acceptable
> # (a file is gone when doing memory hotremove?).
>
>>
>> Might the hugetlb page still be in the pagecache? But yes, the last unmap could put
>> the hugetlb page on the free list.
>>
>>>
>>> After the call to do_migrate_range() in offline_pages, we will call
>>> dissolve_free_huge_pages. For each hugetlb page, dissolve_free_huge_pages
>>> will call dissolve_free_huge_page likely passing in the 'head' page.
>>> When dissolve_free_huge_page is called for poisoned hugetlb pages from
>>> the memory error handling code, it passes in the sub page which contains
>>> the memory error. Before freeing the hugetlb page to buddy, there is this
>>> code:
>>>
>>> 	/*
>>> 	 * Move PageHWPoison flag from head page to the raw
>>> 	 * error page, which makes any subpages rather than
>>> 	 * the error page reusable.
>>> 	 */
>>> 	if (PageHWPoison(head) && page != head) {
>>> 		SetPageHWPoison(page);
>>> 		ClearPageHWPoison(head);
>>> 	}
>>> 	update_and_free_page(h, head, false)
>>>
>>> As previously mentioned, outside the memory error handling code we do
>>> not know what page within the hugetlb page contains poison. So, this
>>> will likely result in a page with errors on the free list and an OK
>>> page marked with error.
>>
>> Yes, this is possible. It's really a pity. Could we just refuse to dissolve the
>> hugetlb page and keep the hugetlb page around? This will waste some healthy subpages
>> but would do the right thing?
>
> Could you let me try writing patches in my approach?
> I'll try to submit the fix with the "storing raw error info" patch.

Sure, I will wait for that. Many thanks for your hard work!
:)

>
>>
>>>
>>> In other places, we 'bail' if we encounter a hugetlb page with poison.
>>> It would be unfortunate to prevent memory offline if the range contains
>>> a hugetlb page with poison. After all, offlining a section with poison
>>> makes sense.
>>>
>>
>> IIUC, a (hugetlb) page with poison could be offlined anytime even if it's still in the
>> pagecache, swapcache, i.e. still referenced from somewhere, because memory offline will
>> just offline the poisoned page while ignoring the page refcnt totally. I can't figure
>> out an easy way to fix it yet. :(
>
> Yes, that's the expected behavior in the end, but unlike normal pagecache we need to
> pay extra attention to hugetlbfs files because they are not storage-backed.
>
> Thank you for valuable comments, everyone.

BTW: there is another possible problem about hwpoison. Think about the below scene:

offline_pages
	do {
		scan_movable_pages
			__PageMovable /* Page is also hwpoisoned. */
				goto found; /* So ret is always 0. */
		do_migrate_range
			if (PageHWPoison(page))
				/* PageMovable hwpoisoned page is not handled. */
				continue;
	} while (!ret); /* We will meet that page again and again. */

So we might busy-loop until a signal is pending. Or am I missing something?
And I think this could be fixed by deferring setting the hwpoisoned flag until
the page refcnt is successfully incremented for normal pages in memory_failure(),
which is already on your to-do list. So could this possible issue be left alone
for now? Thanks!

>
> - Naoya Horiguchi
>