On Wed, Jan 20, 2021 at 02:15:09PM +0800, Aili Yao wrote:
> When a memory uncorrected error is triggered by process A who accessed
> the address with error; It's Action Required Case for only current
> process which triggered this.this Action Required case means Action
> optional to other process who share the same page. Usually, kill current
> process will be sufficient, other process sharing the same page will
> get be signaled when they really touch the poisoned page.
>
> But there is another scenario that other processes
> sharing the same page want to be signaled early with PF_MCE_EARLY set,
> In this case, we should get them into kill list and signal
> BUS_MCEERR_AO to them.
>
> So in this patch, task_early_kill will check current process if
> force_early is set, and if not current,check find_early_kill_thread
> to see if there is PF_MCE_EARLY process which cares the error.
>
> In kill_proc, BUS_MCEERR_AR is only send to current, other process in
> kill list will be signaled BUS_MCEERR_AO.
>
> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> Reviewed-by: Oscar Salvador <osalvador@xxxxxxx>

The "Reviewed-by" tag has a special meaning in Linux kernel development,
and it should be given explicitly by the reviewers themselves.
So please wait a little longer before adding it :)
See the document for more details:
https://www.kernel.org/doc/html/v5.10/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes

> Signed-off-by: Aili Yao <yaoaili@xxxxxxxxxxxx>
> ---
>  mm/memory-failure.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 5a38e9eade94..2d6047780466 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -243,9 +243,12 @@ static int kill_proc(struct to_kill *tk, unsigned long pfn, int flags)
>  			pfn, t->comm, t->pid);
>
>  	if (flags & MF_ACTION_REQUIRED) {
> -		WARN_ON_ONCE(t != current);
> -		ret = force_sig_mceerr(BUS_MCEERR_AR,
> +		if (tk->tsk == current)
> +			ret = force_sig_mceerr(BUS_MCEERR_AR,
>  					 (void __user *)tk->addr, addr_lsb);
> +		else
> +			ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)tk->addr,
> +					addr_lsb, t);
>  	} else {
>  		/*
>  		 * Don't use force here, it's convenient if the signal
> @@ -454,11 +457,12 @@ static struct task_struct *task_early_kill(struct task_struct *tsk,
>  		/*
>  		 * Comparing ->mm here because current task might represent
>  		 * a subthread, while tsk always points to the main thread.
> +		 * If tsk is not current, we need to fallback to
> +		 * find_early_kill_thread checking whether other processes with
> +		 * PF_MCE_EARLY set still care the error.
>  		 */

Sorry for being unclear. I meant to ask you to update the comment below
(especially the "Note that ..." part), which your patch makes obsolete.

> /*
>  * Determine whether a given process is "early kill" process which expects
>  * to be signaled when some page under the process is hwpoisoned.
>  * Return task_struct of the dedicated thread (main thread unless explicitly
>  * specified) if the process is "early kill," and otherwise returns NULL.
>  *
>  * Note that the above is true for Action Optional case, but not for Action
>  * Required case where SIGBUS should sent only to the current thread.
>  */
> static struct task_struct *task_early_kill(struct task_struct *tsk,
> 					   int force_early)
> {

Thanks,
Naoya Horiguchi

>  		if (tsk->mm == current->mm)
>  			return current;
> -		else
> -			return NULL;
>  	}
>  	return find_early_kill_thread(tsk);
>  }
> --
> 2.25.1
>
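
For readers following the AR/AO distinction from the userspace side, here
is a minimal sketch of how a process might opt in to early kill and tell
the two SIGBUS codes apart. It is not part of the patch. The helper names
(mce_kind, sigbus_handler, setup_early_kill) are hypothetical; the
prctl(PR_MCE_KILL, ...) request and the BUS_MCEERR_AR / BUS_MCEERR_AO
si_code values are the existing Linux interfaces, assuming glibc headers.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>

/* Last SIGBUS si_code seen by the handler (0 if none yet). */
static volatile sig_atomic_t last_mce_code;

/* Hypothetical helper: map a SIGBUS si_code to a short description. */
const char *mce_kind(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:	/* this thread consumed the poisoned data */
		return "action required";
	case BUS_MCEERR_AO:	/* a page we map is poisoned, not yet touched */
		return "action optional";
	default:
		return "not a machine check";
	}
}

/*
 * Only records the code. A real handler for the AR case should not
 * simply return, since the faulting access would likely be retried.
 */
void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	last_mce_code = info->si_code;
}

/* Install the handler and request early kill; returns 0 on success. */
int setup_early_kill(void)
{
	struct sigaction sa = { 0 };

	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, NULL) != 0)
		return -1;

	/* Sets the per-process early-kill policy (PF_MCE_EARLY in the kernel). */
	return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}
```

With the patch as posted, a process set up this way that shares the
poisoned page but is not the faulting task would be expected to see
BUS_MCEERR_AO, while only the faulting task sees BUS_MCEERR_AR.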