On 5/7/2024 2:02 AM, Oscar Salvador wrote:
> On Wed, May 01, 2024 at 05:24:56PM -0600, Jane Chu wrote:
>> For years, when it comes to killing a process due to hwpoison,
>> a SIGBUS is delivered only if unmap has been successful.
>> Otherwise, a SIGKILL is delivered. And the reason for that is
>> to prevent the involved process from accessing the hwpoisoned
>> page again.
>> Since then a lot has changed: a hwpoisoned page is marked and,
>> upon being re-accessed, the process will be killed immediately.
>> So let's take out the '!unmap_success' factor and try to deliver
>> SIGBUS if possible.
>
> I am missing some details here.
> An unmapped hwpoison page will trigger a fault and will return
> VM_FAULT_HWPOISON all the way down and then deliver SIGBUS,
> but if the page was not unmapped, how will this be caught upon
> re-accessing? Will the system deliver an MCE event?

I actually managed to hit the re-access case with an older version of
Linux: an MCE occurred, but unmap failed and no SIGBUS was delivered,
so the test process re-accessed the same address over and over (hence
MCE after MCE), as the CPU was unable to make forward progress. In
reality, this issue is fixed with kill_accessing_process(). The comment
for this patch refers to a comment made about '!unmap_success' a long
time ago.
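
To make the difference between the two policies concrete, here is a
rough user-space sketch in C. It is only a model of the decision
described above, not code from mm/memory-failure.c: the function names
and the 'addr_known' flag are hypothetical, and 'addr_known' is just my
stand-in for the "if possible" condition (assumed here to mean that a
fault address can be reported to the process).

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/*
 * Hypothetical model of the signal choice; the names below are
 * illustrative and do not exist in the kernel.
 */

/* Old policy: SIGBUS only if the poisoned page was unmapped. */
static int pick_signal_old(bool unmap_success, bool addr_known)
{
	if (!unmap_success || !addr_known)
		return SIGKILL;
	return SIGBUS;
}

/*
 * Proposed policy: the '!unmap_success' factor is dropped, so SIGBUS
 * is chosen whenever it is possible to deliver it (modelled here by
 * the assumed 'addr_known' flag).
 */
static int pick_signal_new(bool unmap_success, bool addr_known)
{
	(void)unmap_success;	/* no longer part of the decision */
	if (!addr_known)
		return SIGKILL;
	return SIGBUS;
}

/* Small self-check covering the case where the policies differ. */
static void policy_self_check(void)
{
	/* Unmap failed but the address is known: the only changed case. */
	assert(pick_signal_old(false, true) == SIGKILL);
	assert(pick_signal_new(false, true) == SIGBUS);

	/* Unmap succeeded: both policies agree on SIGBUS. */
	assert(pick_signal_old(true, true) == SIGBUS);
	assert(pick_signal_new(true, true) == SIGBUS);
}
```

The only input where the two functions disagree is a failed unmap with
a reportable address, which is exactly the case the patch is about.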
thanks,
-jane