Re: [PATCH v2 07/11] mm/memory-failure: Convert memory_failure() to use a folio

Miaohe Lin <linmiaohe@xxxxxxxxxx> · Thu, 11 Apr 2024 17:00:33 +0800

On 2024/4/11 9:51, Matthew Wilcox wrote:
> On Wed, Apr 10, 2024 at 06:27:38PM -0700, Jane Chu wrote:
>> On 4/10/2024 4:15 PM, Jane Chu wrote:
>>
>>> On 4/10/2024 3:21 AM, Oscar Salvador wrote:
>>>
>>>> On Mon, Apr 08, 2024 at 05:34:29PM -0700, Jane Chu wrote:
>>>>> At this stage,  'p' is not expected to be a large page since a
>>>>> _refcount has
>>>>> been taken, right?  So is it reasonable to replace the "if
>>>>> (retry)" block
>>>>> with a VM_BUG_ON warning?  otherwise, if 'p' became part of a different
>>>>> large folio,  how to recover from here ?
>>>> We might have split the THP (if it was part of), but AFAICS
>>>> nothing stops the kernel to coallesce this page again into a new THP
>>>> via e.g:
>>>> madvise(MADV_HUGEPAGE) ?
>>>
>>> Good point, thanks!
>>
>> Come to think of it a bit more, maybe the check could be
>>
>>     if (folio_test_large(folio) || (folio != page_folio(p))) {
>>
>> ? Because, 'p' could have become part of a THP after shake_folio() and
>> before folio_lock(), right?

This check is originally introduced via commit:

"""
commit f37d4298aa7f8b74395aa13c728677e2ed86fdaf
Author: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Date:   Wed Aug 6 16:06:49 2014 -0700

    hwpoison: fix race with changing page during offlining

    When a hwpoison page is locked it could change state due to parallel
    modifications.  The original compound page can be torn down and then
    this 4k page becomes part of a differently-size compound page is is a
    standalone regular page.

    Check after the lock if the page is still the same compound page.

    We could go back, grab the new head page and try again but it should be
    quite rare, so I thought this was safest.  A retry loop would be more
    difficult to test and may have more side effects.

    The hwpoison code by design only tries to handle cases that are
    reasonably common in workloads, as visible in page-flags.

    I'm not really that concerned about handling this (likely rare case),
    just not crashing on it.

    Signed-off-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
    Acked-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
    Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
    Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a013bc94ebbe..44c6bd201d3a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1172,6 +1172,16 @@ int memory_failure(unsigned long pfn, int trapno, int flags)

 	lock_page(hpage);

+	/*
+	 * The page could have changed compound pages during the locking.
+	 * If this happens just bail out.
+	 */
+	if (compound_head(p) != hpage) {
+		action_result(pfn, "different compound page after locking", IGNORED);
+		res = -EBUSY;
+		goto out;
+	}
+
 	/*
 	 * We use page flags to determine what action should be taken, but
 	 * the flags can be modified by the error containment action.  One

"""

And the discussion can be found at [1]. The root cause for the issue is indeterminate, but most probably:

[1] https://lore.kernel.org/linux-mm/20140626195036.GA5311@xxxxxxxxxxxxxxxx/

"""
And I think that the problem you report is caused by another part of hwpoison
code, because we have PageSlab check at the beginning of hwpoison_user_mappings(),
so if LRU page truned into slab page just before locking the page, we never
reach try_to_unmap().
I think this was caused by the code around lock migration after thp split
in hwpoison_user_mappings(), which was introduced in commit 54b9dd14d09f
("mm/memory-failure.c: shift page lock from head page to tail page after thp split").
I guess the tail page (raw error page) was freed and turned into Slab page
just after thp split and before locking the error page.
So possible solution is to do page status check again after thp split.
"""

IIUC, it's because the page lock is shifted from @hpage to @p. So there's a window
that p is freed and turned into Slab page.

@@ -942,11 +942,14 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
                         * We pinned the head page for hwpoison handling,
                         * now we split the thp and we are interested in
                         * the hwpoisoned raw page, so move the refcount
-                        * to it.
+                        * to it. Similarly, page lock is shifted.
                         */
                        if (hpage != p) {
                                put_page(hpage);
                                get_page(p);
+                               lock_page(p);
+                               unlock_page(hpage);
+                               *hpagep = p;
                        }
                        /* THP is split, so ppage should be the real poisoned page. */
                        ppage = p;

> 
> What is the mechanism for reassembling a large folio?  I don't see that
> code anywhere.

But as code changes, the above page lock shift is gone. And I think below logic can't
trigger now. As we hold extra page refcnt so page can't be coallesced into a new THP or Slab page.

	/*
	 * We're only intended to deal with the non-Compound page here.
	 * However, the page could have changed compound pages due to
	 * race window. If this happens, we could try again to hopefully
	 * handle the page next round.
	 */
	if (PageCompound(p)) {
		if (retry) {
			ClearPageHWPoison(p);
			unlock_page(p);
			put_page(p);
			flags &= ~MF_COUNT_INCREASED;
			retry = false;
			goto try_again;
		}
		res = action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
		goto unlock_page;
	}

So it might be better to replace above code block as WARN_ON(PageCompound(p)) and remove MF_MSG_DIFFERENT_COMPOUND case.
Any thoughts?

Thanks.
.

> 
> .
>