Re: [PATCH] mm,hwpoison: non-current task should be checked early_kill for force_early

Oscar Salvador <osalvador@xxxxxxx> · Mon, 18 Jan 2021 10:24:23 +0100

On Mon, Jan 18, 2021 at 08:57:47AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> I'm not sure what you mean by "non current process error case" and "we
> should mark it AO", so could you explain more specifically about your error
> scenario?  Especially I'd like to know about who triggers hard offline on
> what hardware events and what "wrong action" could happen.  Maybe just
> "calling memory_failure() with MF_ACTION_REQUIRED" is not enough, because
> it's not enough for us to see that your scenario is possible. Current
> implementation implicitly assumes some hardware behavior, and does not work
> for the case which never happens under the assumption.

So, the scenario is a multithreaded application whose threads all have the
same page mapped, and where the PF_MCE_EARLY flag was set.

IIUC, Aili Yao's concern is that when the MCE machinery calls memory_failure()
with MF_ACTION_REQUIRED, only the process that triggered the MCE exception
will receive a SIGBUS, and not the other threads that might have PF_MCE_EARLY.
Aili Yao would like memory_failure() to also signal those threads that might
have the flag set, in case they want to do something with that information.

But reading the code, I do not think that is what the code expects.
Looking at the comment above find_early_kill_thread:

"/*
 * Find a dedicated thread which is supposed to handle SIGBUS(BUS_MCEERR_AO)
 * on behalf of the thread group. Return task_struct of the (first found)
 * dedicated thread if found, and return NULL otherwise.
 *
 * We already hold read_lock(&tasklist_lock) in the caller, so we don't
 * have to call rcu_read_lock/unlock() in this function.
 */"

What I understand from that is:

"
 If memory_failure() was not triggered by any concrete process (aka: no one was
 trying to manipulate the corrupted area), we need to find the main thread that
 might have set the MCE policy via prctl and see if it wants to be signaled
 __before__ it accesses the corrupted area.

"

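For reference, the userspace side of "setting the MCE policy via prctl" could
look roughly like the sketch below. The function and variable names
(request_early_kill, sigbus_handler, got_early_notice) are made up for
illustration; only the prctl(PR_MCE_KILL, ...) call, sigaction() and the
BUS_MCEERR_AO si_code are the real interfaces. It only reacts to the
action-optional notification and leaves the action-required case out.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical flag: set when an early (action optional) notice arrives. */
static volatile sig_atomic_t got_early_notice;

/* Hypothetical handler: only records BUS_MCEERR_AO, nothing else. */
static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	if (info->si_code == BUS_MCEERR_AO)
		got_early_notice = 1;
}

/*
 * Install the SIGBUS handler and ask the kernel for the early-kill
 * policy for the calling thread. Returns 0 on success, -1 on failure.
 */
int request_early_kill(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);

	if (sigaction(SIGBUS, &sa, NULL) != 0)
		return -1;

	return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}
```

Afterwards, prctl(PR_MCE_KILL_GET, 0, 0, 0, 0) should report
PR_MCE_KILL_EARLY for that thread.
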
Note that if the PF_MCE policy was not set, we check the global knob
sysctl_memory_failure_early_kill.
And if that is not set either, we defer the signaling until later, when a
process actually tries to access the corrupted area.
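
To make that ordering concrete, here is a small userspace model of the
lookup as I describe it above. It is not the kernel code: the types and
names (mce_policy, model_thread, model_find_early_kill_thread) are invented
for the sketch, and the global_early_kill argument merely stands in for the
global knob.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread policy, standing in for the PF_MCE_* flags. */
enum mce_policy { MCE_POLICY_UNSET, MCE_POLICY_EARLY, MCE_POLICY_LATE };

struct model_thread {
	int tid;
	enum mce_policy policy;
};

/*
 * Return the first thread that should be signaled early, or NULL if the
 * signal should be deferred until someone touches the corrupted area.
 * A thread with an explicit policy follows that policy; a thread without
 * one falls back to the global knob.
 */
const struct model_thread *
model_find_early_kill_thread(const struct model_thread *threads, size_t n,
			     bool global_early_kill)
{
	assert(threads != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (threads[i].policy != MCE_POLICY_UNSET) {
			if (threads[i].policy == MCE_POLICY_EARLY)
				return &threads[i];
		} else if (global_early_kill) {
			return &threads[i];
		}
	}
	return NULL;
}
```

With the knob off, only a thread that explicitly asked for early kill is
returned; with the knob on, the first thread without an explicit policy
qualifies as well.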

Does that make sense?

Actually, unless I am mistaken, if a multithreaded process receives a signal,
all threads belonging to the process will receive the signal as well:

"The signal disposition is a per-process attribute: in a
multithreaded application, the disposition of a particular signal
is the same for all threads."

-- 
Oscar Salvador
SUSE L3