On Tue, Apr 26, 2022 at 07:41:52PM +0800, Miaohe Lin wrote:
> On 2022/4/25 21:55, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Fri, Apr 22, 2022 at 02:53:48PM +0800, Miaohe Lin wrote:
> >> On 2022/4/22 6:04, Mike Kravetz wrote:
> >>> On 4/21/22 07:20, David Hildenbrand wrote:
> >>>> On 21.04.22 15:51, Miaohe Lin wrote:
> >>>>> When trying to offline pages, a HWPoisoned hugepage is migrated without
> >>>>> checking PageHWPoison first, so corrupted data could be consumed. Fix
> >>>>> it by deferring isolate_huge_page() until PageHWPoison is handled.
> >>>>>
> >>>>
> >>>> CCing Oscar, Mike and Naoya
> >>>>
> >>>>> Signed-off-by: Miaohe Lin <linmiaohe@xxxxxxxxxx>
> >>>>> ---
> >>>>>  mm/memory_hotplug.c | 11 +++++++----
> >>>>>  1 file changed, 7 insertions(+), 4 deletions(-)
> >>>>>
> >>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> >>>>> index 4c6065e5d274..093f85ec5c5c 100644
> >>>>> --- a/mm/memory_hotplug.c
> >>>>> +++ b/mm/memory_hotplug.c
> >>>>> @@ -1600,11 +1600,9 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> >>>>>  		folio = page_folio(page);
> >>>>>  		head = &folio->page;
> >>>>>
> >>>>> -		if (PageHuge(page)) {
> >>>>> +		if (PageHuge(page))
> >>>>>  			pfn = page_to_pfn(head) + compound_nr(head) - 1;
> >>>>> -			isolate_huge_page(head, &source);
> >>>>> -			continue;
> >>>>> -		} else if (PageTransHuge(page))
> >>>>> +		else if (PageTransHuge(page))
> >>>>>  			pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
> >>>>>
> >>>>>  		/*
> >>>>> @@ -1622,6 +1620,11 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> >>>>>  			continue;
> >>>>>  		}
> >>>>>
> >>>>> +		if (PageHuge(page)) {
> >>>>> +			isolate_huge_page(head, &source);
> >>>>> +			continue;
> >>>>> +		}
> >>>>> +
> >>>>>  		if (!get_page_unless_zero(page))
> >>>>>  			continue;
> >>>>>  		/*
> >>>>
> >>>> The problem statement makes sense to me, but I am not sure about the
> >>>> details if we run into the "PageHWPoison" path with a huge page. I have
> >>>> the gut feeling that we have to do more for huge pages in the
> >>>> PageHWPoison() path, because we might be dealing with a free huge page
> >>>> after unmap succeeds. I might be wrong.
> >>>>
> >>>
> >>> Thinking about memory errors always makes my head hurt :)
> >>
> >> Me too. ;)
> >>
> >>>
> >>> What 'state' could a poisoned hugetlb page be in here?
> >>> - Mapped into a process address space?
> >>> - On the hugetlb free lists?
> >
> > There seem to be at least two cases where a hwpoisoned hugetlb page remains undissolved:
> > (1) a hwpoisoned hugepage in the free hugepage list,
> > (2) a hwpoisoned hugepage belonging to a hugetlbfs file.
> >
> >>>
> >>> IIUC, when poisoning a hugetlb page we try to dissolve those that are
> >>> free and unmap those which are mapped. So, this means those operations
> >>> must have failed for some reason. Is that correct?
> >
> > Yes, (1) results from a failure in error handling, but (2) can occur even
> > when a memory error on an in-use hugepage is successfully handled.
>
> Could you please explain (2) in more detail? IIUC, an in-use hugepage belonging to a
> hugetlbfs file will be removed from the pagecache via hugetlbfs_error_remove_page().
> So when successfully handled, hugepage->mapping will be NULL and thus the page no
> longer belongs to a hugetlbfs file. Or am I missing something?

No, you're right. Sorry for my misunderstanding about who takes the last
refcount in case (2).

> >
> >>
> >> I think you're right.
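(To keep the flow being discussed in one place: with the above patch applied,
the per-pfn loop body in do_migrate_range() would read roughly like the sketch
below. It is simplified, with the surrounding locking and the non-hugetlb
isolation path omitted, and combines the patch hunks with the existing
PageHWPoison block quoted later in this thread.)

	if (PageHuge(page))
		pfn = page_to_pfn(head) + compound_nr(head) - 1;
	else if (PageTransHuge(page))
		pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;

	if (PageHWPoison(page)) {
		if (WARN_ON(folio_test_lru(folio)))
			folio_isolate_lru(folio);
		if (folio_mapped(folio))
			try_to_unmap(folio, TTU_IGNORE_MLOCK);
		continue;	/* never hand a poisoned page to migration */
	}

	/* Only reached when the hugetlb page is not hwpoisoned. */
	if (PageHuge(page)) {
		isolate_huge_page(head, &source);
		continue;
	}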
> >>
> >> BTW: Now that PageHWPoison is set under hugetlb_lock and HPageFreed is also checked
> >> under that lock, it should be more likely that we handle a hugetlb page on the free
> >> lists successfully, because the possible race with hugetlb page allocation or demotion
> >> can be prevented by the PageHWPoison flag. So it is more likely to be a mapped hugetlb
> >> page here. But that doesn't matter.
> >>
> >>>
> >>> Now, IF the above is true this implies there is a poisoned page somewhere
> >>> within the hugetlb page. But, poison is only marked in the head page.
> >>> So, we do not really know where within the page the actual error exists.
> >>> Is my reasoning still correct?
> >>>
> >>
> >> IIRC, Naoya once proposed to store the poisoned page info in the subpage field.
> >
> > Yes, it's important to keep the offset of the raw page. I have a "storing raw
> > error info" patch for this, but it's still too immature to submit.
> >
> >>
> >>> If my reasoning is correct, then I am not sure what we can do here.
> >>> The code to handle poison is:
> >>>
> >>> 	if (PageHWPoison(page)) {
> >>> 		if (WARN_ON(folio_test_lru(folio)))
> >>> 			folio_isolate_lru(folio);
> >>> 		if (folio_mapped(folio))
> >>> 			try_to_unmap(folio, TTU_IGNORE_MLOCK);
> >>> 		continue;
> >>> 	}
> >>>
> >>> My first observation is that if a hugetlb page is passed to try_to_unmap
> >>> as above we may BUG. This is because we need to hold i_mmap_rwsem in
> >>> write mode because of the possibility of calling huge_pmd_unshare. :(
> >>
> >> I'm sorry. I missed that case. :(
> >
> > OK, so the above check should not be called for hugetlb pages.
> >
> >>
> >>>
> >>> I 'think' try_to_unmap could succeed on a poisoned hugetlb page once we
> >>> add correct locking. So, the page would be unmapped. I do not think anyone
> >>> is holding a ref, so the last unmap should put the hugetlb page on the
> >>> free list. Correct? We will not migrate the page, but ...
> >
> > That seems to depend on whether the hugepage is anonymous or file-backed.
> > If we consider an anonymous hugepage, the above statement is correct.
> >
> > As for what we can do, for (1) we can simply skip the hwpoisoned hugepage
> > in do_migrate_range() (i.e. no need to call isolate_huge_page()), because the
> > outer while-loop in offline_pages() calls dissolve_free_huge_pages(), so the
> > free hugepage can be handled there. The above "storing raw error info" patch
> > will also enable dissolve_free_huge_pages() to handle a hwpoisoned free
> > hugepage (by moving the PG_hwpoison flag to the raw error page), so I assume
> > that patch is in place.
>
> Yes, I think (1) will be fixed this way.
>
> >
> > As for (2), I think that making such hugepages unmovable (by clearing
> > HPageMigratable) could work. With that change, scan_movable_pages() detects
> > unmovable pages, so do_migrate_range() is not called for the hugepages.
> > Moreover, scan_movable_pages() should return -EBUSY if an unmovable page is
> > found, so offline_pages() fails without trying to touch the hwpoisoned hugepage.
>
> Yes, it seems to work, but this will make memory offline fail. Could this be overkill?

I think so. And I found (in writing/testing patches) that releasing the
last refcount in dissolve_free_huge_pages() in offline_pages() could work,
so offline_pages() does not have to fail. So that's better than what I
wrote above.

>
> Anyway the reported problem (trying to migrate a hwpoisoned hugepage) would be solved.
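(For concreteness, the HPageMigratable idea for (2) sketched above would
amount to something like the following in the memory-failure path. This is
a hypothetical sketch only, not an actual patch; it assumes hugetlb_lock
still serializes the per-hugepage flags as in the current hugetlb code.)

	/*
	 * Hypothetical: mark a hwpoisoned in-use hugepage as unmovable so
	 * that scan_movable_pages() stops reporting it as movable and
	 * do_migrate_range() is never called for it.
	 */
	spin_lock_irq(&hugetlb_lock);
	if (PageHuge(page) && HPageMigratable(head))
		ClearHPageMigratable(head);
	spin_unlock_irq(&hugetlb_lock);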
> >
> > # Maybe we might optionally try to free the hwpoisoned hugepage by removing
> > # the affected hugetlbfs file, but it's not clear to me that that's acceptable
> > # (a file is gone when doing memory hotremove?).
> >
> >>
> >> Might the hugetlb page still be in the pagecache? But yes, the last unmap could
> >> put the hugetlb page on the free list.
> >>
> >>>
> >>> After the call to do_migrate_range() in offline_pages(), we will call
> >>> dissolve_free_huge_pages(). For each hugetlb page, dissolve_free_huge_pages()
> >>> will call dissolve_free_huge_page(), likely passing in the 'head' page.
> >>> When dissolve_free_huge_page() is called for poisoned hugetlb pages from
> >>> the memory error handling code, it passes in the subpage which contains
> >>> the memory error. Before freeing the hugetlb page to buddy, there is this
> >>> code:
> >>>
> >>> 	/*
> >>> 	 * Move PageHWPoison flag from head page to the raw
> >>> 	 * error page, which makes any subpages rather than
> >>> 	 * the error page reusable.
> >>> 	 */
> >>> 	if (PageHWPoison(head) && page != head) {
> >>> 		SetPageHWPoison(page);
> >>> 		ClearPageHWPoison(head);
> >>> 	}
> >>> 	update_and_free_page(h, head, false)
> >>>
> >>> As previously mentioned, outside the memory error handling code we do
> >>> not know what page within the hugetlb page contains poison. So, this
> >>> will likely result in a page with errors on the free list and an OK
> >>> page marked with error.
> >>
> >> Yes, this is possible. It's really a pity. Could we just refuse to dissolve the
> >> hugetlb page and keep the hugetlb page around? This would waste some healthy
> >> subpages but would do the right thing?
> >
> > Could you let me try writing patches in my approach?
> > I'll try to submit the fix with the "storing raw error info" patch.
>
> Sure, I will wait for that. Many thanks for your hard work! :)

I have almost prepared this, so will send it this afternoon.

> >
> >>
> >>>
> >>> In other places, we 'bail' if we encounter a hugetlb page with poison.
> >>> It would be unfortunate to prevent memory offline if the range contains
> >>> a hugetlb page with poison. After all, offlining a section with poison
> >>> makes sense.
> >>>
> >>
> >> IIUC, a (hugetlb) page with poison could be offlined anytime, even if it's still
> >> in the pagecache or swapcache, i.e. still referenced from somewhere, because memory
> >> offline just offlines the poisoned page while ignoring the page refcount entirely.
> >> I can't figure out an easy way to fix it yet. :(
> >
> > Yes, that's ultimately the expected behavior, but unlike the normal pagecache we
> > need to pay extra attention to hugetlbfs files because they are not storage-backed.
> >
> > Thank you for valuable comments, everyone.
>
> BTW: There is another possible problem with hwpoison. Consider the scenario below:
>
> offline_pages
>   do {
>     scan_movable_pages
>       __PageMovable			/* Page is also hwpoisoned. */
>         goto found;			/* So ret is always 0. */
>     do_migrate_range
>       if (PageHWPoison(page))		/* PageMovable hwpoisoned page is not handled. */
>         continue;
>   } while (!ret);			/* We will meet that page again and again. */
>
> So we might busy-loop until a signal is pending. Or am I missing something? I think
> this could be fixed by deferring setting the hwpoison flag until the page refcount is
> successfully incremented for a normal page in memory_failure(), which is already on
> your to-do list. So could this possible issue be left alone for now?

Thank you for finding this. Maybe I saw a similar issue (an infinite loop)
while testing patches, but I'm not sure about the details.
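(If the deferred-flag idea goes in the direction Miaohe describes, the
ordering in memory_failure() would change roughly as follows. This is an
illustrative sketch only: error paths are simplified and the reordering is
hypothetical, not what the code does today.)

	/*
	 * Sketch: pin the page *before* marking it, so a page that cannot
	 * be pinned is never left PageHWPoison for offline_pages() to
	 * spin on.  get_hwpoison_page() is the existing refcount helper
	 * in mm/memory-failure.c; only the ordering here is new, and the
	 * handling of its free-page return value is glossed over.
	 */
	if (get_hwpoison_page(p, flags) <= 0)
		return -EBUSY;		/* bail without setting the flag */
	SetPageHWPoison(p);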
I'll look into it more.

Thanks,
Naoya Horiguchi