Hi Andrew,

This patch is worth merging into mainline in the next merge window.
Could you queue it in your tree?

Thanks,
Naoya Horiguchi

On Fri, Jan 22, 2021 at 01:24:24PM +0800, Aili Yao wrote:
> When an uncorrected memory error is triggered by a process accessing
> the faulty address, it is an Action Required case only for the current
> process that triggered it; the same error is Action Optional for other
> processes that share the page. Usually killing the current process is
> sufficient, and other processes sharing the page will be signaled when
> they actually touch the poisoned page.
>
> But there is another scenario: other processes sharing the same page
> may want to be signaled early, with PF_MCE_EARLY set. In this case, we
> should add them to the kill list and signal BUS_MCEERR_AO to them.
>
> So in this patch, task_early_kill() checks the current process if
> force_early is set, and if the task is not current, the code falls back
> to find_early_kill_thread() to check whether there is a PF_MCE_EARLY
> process that cares about the error.
>
> In kill_proc(), BUS_MCEERR_AR is only sent to current; other processes
> in the kill list are signaled with BUS_MCEERR_AO.
>
> Reviewed-by: Oscar Salvador <osalvador@xxxxxxx>
> Acked-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> Signed-off-by: Aili Yao <yaoaili@xxxxxxxxxxxx>
> ---
>  mm/memory-failure.c | 34 +++++++++++++++++++---------------
>  1 file changed, 19 insertions(+), 15 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 5a38e9eade94..3fd483e6c2fb 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -243,9 +243,13 @@ static int kill_proc(struct to_kill *tk, unsigned long pfn, int flags)
>  			pfn, t->comm, t->pid);
>
>  	if (flags & MF_ACTION_REQUIRED) {
> -		WARN_ON_ONCE(t != current);
> -		ret = force_sig_mceerr(BUS_MCEERR_AR,
> +		if (t == current)
> +			ret = force_sig_mceerr(BUS_MCEERR_AR,
>  					 (void __user *)tk->addr, addr_lsb);
> +		else
> +			/* Signal other processes sharing the page if they have PF_MCE_EARLY set. */
> +			ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)tk->addr,
> +					addr_lsb, t);
>  	} else {
>  		/*
>  		 * Don't use force here, it's convenient if the signal
> @@ -440,26 +444,26 @@ static struct task_struct *find_early_kill_thread(struct task_struct *tsk)
>   * Determine whether a given process is "early kill" process which expects
>   * to be signaled when some page under the process is hwpoisoned.
>   * Return task_struct of the dedicated thread (main thread unless explicitly
> - * specified) if the process is "early kill," and otherwise returns NULL.
> + * specified) if the process is "early kill" and otherwise returns NULL.
>   *
> - * Note that the above is true for Action Optional case, but not for Action
> - * Required case where SIGBUS should sent only to the current thread.
> + * Note that the above is true for Action Optional case. For Action Required
> + * case, it's only meaningful to the current thread which need to be signaled
> + * with SIGBUS, this error is Action Optional for other non current
> + * processes sharing the same error page,if the process is "early kill", the
> + * task_struct of the dedicated thread will also be returned.
>   */
>  static struct task_struct *task_early_kill(struct task_struct *tsk,
>  					   int force_early)
>  {
>  	if (!tsk->mm)
>  		return NULL;
> -	if (force_early) {
> -		/*
> -		 * Comparing ->mm here because current task might represent
> -		 * a subthread, while tsk always points to the main thread.
> -		 */
> -		if (tsk->mm == current->mm)
> -			return current;
> -		else
> -			return NULL;
> -	}
> +	/*
> +	 * Comparing ->mm here because current task might represent
> +	 * a subthread, while tsk always points to the main thread.
> +	 */
> +	if (force_early && tsk->mm == current->mm)
> +		return current;
> +
>  	return find_early_kill_thread(tsk);
>  }
>
> --
> 2.25.1
>
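
For anyone following the thread, here is a rough userspace sketch of the
selection and signaling rules described in the patch. It is not kernel
code: struct sketch_task, the sketch_* functions and the SKETCH_* codes are
hypothetical stand-ins, and the fallback is a simplification that only
looks at a per-task flag (the real find_early_kill_thread() does more than
that).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for BUS_MCEERR_AR / BUS_MCEERR_AO. */
enum sketch_code { SKETCH_MCEERR_AR, SKETCH_MCEERR_AO };

/* Hypothetical, heavily reduced model of a task. */
struct sketch_task {
	int mm;         /* address-space id; 0 models a task without an mm */
	bool mce_early; /* models PF_MCE_EARLY being set */
};

/*
 * Models the patched task_early_kill(): returns the task that should be
 * put on the kill list, or NULL if none.
 */
const struct sketch_task *sketch_early_kill(const struct sketch_task *tsk,
					    const struct sketch_task *cur,
					    bool force_early)
{
	assert(tsk != NULL && cur != NULL);

	if (!tsk->mm)
		return NULL;
	/* Same address space as the faulting task: signal current. */
	if (force_early && tsk->mm == cur->mm)
		return cur;
	/* Assumed simplification of find_early_kill_thread(). */
	return tsk->mce_early ? tsk : NULL;
}

/*
 * Models the patched kill_proc(): only the faulting task gets the
 * Action Required code, everyone else on the list gets Action Optional.
 */
enum sketch_code sketch_signal_code(const struct sketch_task *t,
				    const struct sketch_task *cur,
				    bool action_required)
{
	assert(t != NULL && cur != NULL);

	if (action_required && t == cur)
		return SKETCH_MCEERR_AR;
	return SKETCH_MCEERR_AO;
}
```

With a faulting task and a second task in a different address space that
has the early-kill flag set, sketch_early_kill() returns the second task
even when force_early is true, and sketch_signal_code() picks the Action
Optional code for it. Before the patch, that second task would not have
been placed on the kill list in the Action Required case.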