Re: [PATCH v1 4/4] mm/hwpoison: Fix incorrect "not recovered" report for recovered clean pages

Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx> · Thu, 13 Feb 2025 14:59:57 +0800

在 2025/2/13 11:20, Miaohe Lin 写道:
On 2025/2/12 21:55, Shuai Xue wrote:


在 2025/2/12 16:09, Miaohe Lin 写道:
On 2025/2/11 14:02, Shuai Xue wrote:
When an uncorrected memory error is consumed there is a race between
the CMCI from the memory controller reporting an uncorrected error
with a UCNA signature, and the core reporting and SRAR signature
machine check when the data is about to be consumed.

If the CMCI wins that race, the page is marked poisoned when
uc_decode_notifier() calls memory_failure(). For dirty pages,
memory_failure() invokes try_to_unmap() with the TTU_HWPOISON flag,
converting the PTE to a hwpoison entry. However, for clean pages, the
TTU_HWPOISON flag is cleared, leaving the PTE unchanged and not converted
to a hwpoison entry. Consequently, for an unmapped dirty page, the PTE is
marked as a hwpoison entry allowing kill_accessing_process() to:

- call walk_page_range() and return 1
- call kill_proc() to make sure a SIGBUS is sent
- return -EHWPOISON to indicate that SIGBUS is already sent to the process
    and kill_me_maybe() doesn't have to send it again.

Conversely, for clean pages where PTE entries are not marked as hwpoison,
kill_accessing_process() returns -EFAULT, causing kill_me_maybe() to send a
SIGBUS.

Console log looks like this:

      Memory failure: 0x827ca68: corrupted page was clean: dropped without side effects
      Memory failure: 0x827ca68: recovery action for clean LRU page: Recovered
      Memory failure: 0x827ca68: already hardware poisoned
      mce: Memory error not recovered

To fix it, return -EHWPOISON if no hwpoison PTE entry is found, preventing
an unnecessary SIGBUS.

Thanks for your patch.


Fixes: 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"")
Signed-off-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>
---
   mm/memory-failure.c | 5 ++---
   1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 995a15eb67e2..f9a6b136a6f0 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -883,10 +883,9 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
                     (void *)&priv);
       if (ret == 1 && priv.tk.addr)
           kill_proc(&priv.tk, pfn, flags);
-    else
-        ret = 0;
       mmap_read_unlock(p->mm);
-    return ret > 0 ? -EHWPOISON : -EFAULT;
+
+    return ret >= 0 ? -EHWPOISON : -EFAULT;

IIUC, kill_accessing_process() is supposed to return -EHWPOISON to notify that SIGBUS is already
sent to the process and kill_me_maybe() doesn't have to send it again. But with your change,
kill_accessing_process() will return -EHWPOISON even if SIGBUS is not sent. Does this break
the semantics of -EHWPOISON?

Yes, from the comment of kill_me_maybe(),

      * -EHWPOISON from memory_failure() means that it already sent SIGBUS
      * to the current process with the proper error info,
      * -EOPNOTSUPP means hwpoison_filter() filtered the error event,

this patch break the comment.

But the defination of EHWPOISON is quite different from the comment.

  #define EHWPOISON    133    /* Memory page has hardware error */

As for this issue, returning 0 or EHWPOISON can both prevent a SIGBUS signal
from being sent in kill_me_maybe().

Which way do you prefer?


BTW I scanned the code of walk_page_range(). It seems with implementation of hwpoison_walk_ops
walk_page_range() will only return 0 or 1, i.e. always >= 0. So kill_accessing_process() will always
return -EHWPOISON if this patch is applied.

Correct me if I miss something.

Yes, you are right. Let's count the cases one by one:

1. clean page: try_to_remap(!TTU_HWPOISON), walk_page_range() will return 0 and

Do you mean try_to_unmap?

Yes, sorry for the typo.

we should not send sigbus in kill_me_maybe().

2. dirty page:
2.1 MCE wins race
           CMCI:w/o Action Require         MCE: w/ Action Require
                                       TestSetPageHWPoison
       TestSetPageHWPoison
       return -EHWPOISON
                                       try_to_unmap(TTU_HWPOISON)
                                       kill_proc in hwpoison_user_mappings()

If MCE wins the race, because the flag of memory_fialure() called by CMCI is
not set as MF_ACTION_REQUIRED, everything goes well, kill_proc() will send
SIGBUS in hwpoison_user_mappings().

2.2 CMCI win
           CMCI:w/o Action Require         MCE: w/ Action Require
     TestSetPageHWPoison
     try_to_unmap(TTU_HWPOISON)
                                        walk_page_range() return 1 due to hwpoison PTE entry
                                        kill_proc in kill_accessing_process()

If the CMCI wins the race, we need to kill the process in
kill_accessing_process(). And if try_to_remap() success, everything goes well,
kill_proc() will send SIGBUS in kill_accessing_process().

But if try_to_remap() fails, the PTE entry will not be marked as hwpoison, and
walk_page_range() return 0 as case 1 clean page, NO SIGBUS will be sent.

If try_to_unmap() fails, the PTE entry will still point to the dirty page. Then in
check_hwpoisoned_entry(), we will have pfn == poisoned_pfn. So walk_page_range()
will return 1 in this case. Or am I miss something?


You’re right; I overlooked the pte_present() branch.

Therefore, in the walk_page_range() function:
- It returns 0 when the poison page is a clean page.
- It returns 1 when CMCI wins, regardless of whether try_to_unmap succeeds
  or fails.

Then the patch will be like:

@@ -883,10 +883,9 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
 			      (void *)&priv);
 	if (ret == 1 && priv.tk.addr)
 		kill_proc(&priv.tk, pfn, flags);
-	else
-		ret = 0;
 	mmap_read_unlock(p->mm);
-	return ret > 0 ? -EHWPOISON : -EFAULT;
+
+	return ret > 0 ? -EHWPOISON : 0;

Here, returning 0 indicates that memory_failure() successfully handled the
error by dropping the clean page.



In summary, hwpoison_walk_ops cannot distinguish between try_to_unmap failing
and causing the PTE entry not to be set to hwpoison, and a clean page that
originally does not have the PTE entry set to hwpoison.

Is it possible current process is not the one accessing the hwpoisoned page? E.g. memory_failure
is deferred and called from kworker context or something like that. If it's possible, this is
another scene needs to be considered.

Yes, it possibale.

But kill_accessing_process() will only be called with MF_ACTION_REQUIRED.
MF_ACTION_REQUIRED indates that current process is exactly the one accesing the
poison data.

Fox x86 platform, GHES driver may queue a kwoker to defer memory_failure() with
flag=0.  So kill_accessing_process() will not be called in such case.

Thanks.
Shuai