On Fri, May 21, 2021 at 12:01:56PM +0900, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
>
> Now an action required MCE in already hwpoisoned address surely sends a
> SIGBUS to current process, but the SIGBUS doesn't convey error virtual
> address. That's not optimal for hwpoison-aware applications.
>
> To fix the issue, make memory_failure() call kill_accessing_process(),
> that does pagetable walk to find the error virtual address. It could
> find multiple virtual addresses for the same error page, and it seems
> hard to tell which virtual address is correct one. But that's rare
> and sending incorrect virtual address could be better than no address.
> So let's report the first found virtual address for now.
>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> ---
> change log v4 -> v5:
> - switched to first found approach,
> - introduced check_hwpoisoned_pmd_entry() to fix build failure on arch
>   without thp support.
>
> change log v3 -> v4:
> - refactored hwpoison_pte_range to save indentation,
> - updated patch description
>
> change log v1 -> v2:
> - initialize local variables in check_hwpoisoned_entry() and
>   hwpoison_pte_range()
> - fix and improve logic to calculate error address offset.
> ---
...
> +static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
> +				  int flags)
> +{
> +	int ret;
> +	struct hwp_walk priv = {
> +		.pfn = pfn,
> +	};
> +	priv.tk.tsk = p;
> +
> +	mmap_read_lock(p->mm);
> +	ret = walk_page_range(p->mm, 0, TASK_SIZE, &hwp_walk_ops,
> +			      (void *)&priv);
> +	if (!ret && priv.tk.addr)

Sorry, I found a silly mistake: since v5, walk_page_range() returns 1
when it finds at least one error virtual address, so this if-condition
should be changed as follows.

@@ -691,7 +691,8 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
 	mmap_read_lock(p->mm);
 	ret = walk_page_range(p->mm, 0, TASK_SIZE, &hwp_walk_ops,
 			      (void *)&priv);
-	if (!ret && priv.tk.addr)
+	if (ret == 1 && priv.tk.addr)
 		kill_proc(&priv.tk, pfn, flags);
 	mmap_read_unlock(p->mm);
 	return ret ? -EFAULT : -EHWPOISON;

Andrew, this patch is now in linux-mm, so could you apply this fix onto
mmhwpoison-send-sigbus-with-error-virutal-address.patch?
Or if it's better to resend a whole patch, please let me know.

Thanks,
Naoya Horiguchi
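
For readers who want to see why the condition has to test for 1, here is a
rough userspace sketch of the "first found" convention. It is not the kernel
code: every demo_* name and struct is hypothetical, and the flat array merely
stands in for a page table. The only behaviour it tries to mirror is that a
positive return from the entry callback stops the walk and is handed back to
the caller, so a successful lookup yields ret == 1 rather than ret == 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the walk's private data (not struct hwp_walk). */
struct demo_walk {
	unsigned long target_pfn;	/* pfn of the poisoned page */
	unsigned long found_addr;	/* first matching virtual address, 0 if none */
};

/* Hypothetical mapping entry: one virtual address mapped to one pfn. */
struct demo_mapping {
	unsigned long vaddr;
	unsigned long pfn;
};

/* Entry callback: return 1 to stop at the first match, 0 to keep walking. */
static int demo_check_entry(const struct demo_mapping *m, struct demo_walk *w)
{
	if (m->pfn != w->target_pfn)
		return 0;
	w->found_addr = m->vaddr;
	return 1;
}

/* Walker: a nonzero callback return aborts the walk and goes to the caller. */
static int demo_walk_range(const struct demo_mapping *maps, size_t n,
			   struct demo_walk *w)
{
	assert(maps != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int ret = demo_check_entry(&maps[i], w);

		if (ret)
			return ret;
	}
	return 0;
}

/* Returns 1 when the fixed condition would send the signal, else 0. */
static int demo_should_kill(const struct demo_mapping *maps, size_t n,
			    unsigned long pfn, unsigned long *addr_out)
{
	struct demo_walk w = { .target_pfn = pfn, .found_addr = 0 };
	int ret = demo_walk_range(maps, n, &w);

	/* With the old "!ret" test this branch was skipped on a match. */
	if (ret == 1 && w.found_addr) {
		*addr_out = w.found_addr;
		return 1;
	}
	return 0;
}
```

In this sketch, a pfn that is mapped at two addresses reports only the first
one, and a pfn that is not mapped at all leaves the walk result at 0 so no
signal would be sent.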