On 2022/6/24 7:51, Naoya Horiguchi wrote: > From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> > > Currently if memory_failure() (modified to remove blocking code with > subsequent patch) is called on a page in some 1GB hugepage, memory error > handling fails and the raw error page gets into leaked state. The impact > is small in production systems (just leaked single 4kB page), but this > limits the testability because unpoison doesn't work for it. > We can no longer create 1GB hugepage on the 1GB physical address range > with such leaked pages, that's not useful when testing on small systems. > > When a hwpoison page in a 1GB hugepage is handled, it's caught by the > PageHWPoison check in free_pages_prepare() because the 1GB hugepage is > broken down into raw error pages before coming to this point: > > if (unlikely(PageHWPoison(page)) && !order) { > ... > return false; > } > > Then, the page is not sent to buddy and the page refcount is left 0. > > Originally this check is supposed to work when the error page is freed from > page_handle_poison() (that is called from soft-offline), but now we are > opening another path to call it, so the callers of __page_handle_poison() > need to handle the case by considering the return value 0 as success. Then > page refcount for hwpoison is properly incremented so unpoison works. > > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> It seems I misunderstand the commit log in [1]. But I hope I get the point this time. :) Reviewed-by: Miaohe Lin <linmiaohe@xxxxxxxxxx> Thanks! [1]https://lore.kernel.org/linux-mm/19981830-a5e6-bdba-4a1c-1cdcea61b93b@xxxxxxxxxx/ > --- > mm/memory-failure.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index db85f644a1e3..fc7b83cb6468 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1046,7 +1046,6 @@ static int me_huge_page(struct page_state *ps, struct page *p) > res = truncate_error_page(hpage, page_to_pfn(p), mapping); > unlock_page(hpage); > } else { > - res = MF_FAILED; > unlock_page(hpage); > /* > * migration entry prevents later access on error hugepage, > @@ -1054,9 +1053,11 @@ static int me_huge_page(struct page_state *ps, struct page *p) > * subpages. > */ > put_page(hpage); > - if (__page_handle_poison(p) > 0) { > + if (__page_handle_poison(p) >= 0) { > page_ref_inc(p); > res = MF_RECOVERED; > + } else { > + res = MF_FAILED; > } > } > > @@ -1704,9 +1705,11 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb > */ > if (res == 0) { > unlock_page(head); > - if (__page_handle_poison(p) > 0) { > + if (__page_handle_poison(p) >= 0) { > page_ref_inc(p); > res = MF_RECOVERED; > + } else { > + res = MF_FAILED; > } > action_result(pfn, MF_MSG_FREE_HUGE, res); > return res == MF_RECOVERED ? 0 : -EBUSY; >