Hi Aili,

On Mon, Feb 01, 2021 at 04:17:49PM +0800, Aili Yao wrote:
> When one page is already hwpoisoned by AO action, process may not be
> killed, the process mapping this page may make a syscall include this
> page and result to trigger a VM_FAULT_HWPOISON fault, if it's in kernel
> mode it may be fixed by fixup_exception. Current code will just return
> error code to user process.
>
> This is not sufficient, we should send a SIGBUS to the process and log
> the info to console, as we can't trust the process will handle the error
> correctly.
>
> Suggested-by: Feng Yang <yangfeng1@xxxxxxxxxxxx>
> Signed-off-by: Aili Yao <yaoaili@xxxxxxxxxxxx>
> ---
...
> @@ -662,12 +662,32 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>  	 * In this case we need to make sure we're not recursively
>  	 * faulting through the emulate_vsyscall() logic.
>  	 */
> +
> +	if (IS_ENABLED(CONFIG_MEMORY_FAILURE) &&
> +	    fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
> +		unsigned int lsb = 0;
> +
> +		pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
> +			current->comm, current->pid, address);
> +
> +		sanitize_error_code(address, &error_code);
> +		set_signal_archinfo(address, error_code);
> +
> +		if (fault & VM_FAULT_HWPOISON_LARGE)
> +			lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
> +		if (fault & VM_FAULT_HWPOISON)
> +			lsb = PAGE_SHIFT;
> +
> +		force_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb);

This part duplicates code from do_sigbus(), so some refactoring (like
adding a common function) would be helpful.

Thanks,
Naoya Horiguchi
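
To make the refactoring suggestion concrete, here is a rough user-space
sketch of what a shared helper might look like. It is untested against
the kernel tree and only illustrates the shape of the change. The names
hwpoison_fault_lsb() and report_hwpoison_fault() are hypothetical, the
mock_hstate_order table is invented for the example, and the constants
are stand-ins assumed to resemble the kernel definitions so that the
snippet builds outside the kernel.

```c
#include <stdio.h>

/*
 * Stand-ins so the sketch compiles in user space. The values are
 * assumptions for illustration; in the kernel these come from the
 * mm headers and must not be redefined.
 */
#define PAGE_SHIFT              12
#define VM_FAULT_HWPOISON       0x000010u
#define VM_FAULT_HWPOISON_LARGE 0x000020u
#define VM_FAULT_SET_HINDEX(x)  ((unsigned int)(x) << 16)
#define VM_FAULT_GET_HINDEX(x)  (((x) >> 16) & 0xf)

/* Hypothetical hstate table: index 0 is a 2MB page, index 1 a 1GB page. */
static const unsigned int mock_hstate_order[] = { 9, 18 };

/* Mock of hstate_index_to_shift(); returns 0 for an unknown index. */
static unsigned int hstate_index_to_shift(unsigned int index)
{
	if (index >= sizeof(mock_hstate_order) / sizeof(mock_hstate_order[0]))
		return 0;
	return mock_hstate_order[index] + PAGE_SHIFT;
}

/*
 * Hypothetical common helper: derive the si_addr_lsb value from the
 * fault flags, using the same two checks as the quoted patch hunk.
 */
static unsigned int hwpoison_fault_lsb(unsigned int fault)
{
	unsigned int lsb = 0;

	if (fault & VM_FAULT_HWPOISON_LARGE)
		lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
	if (fault & VM_FAULT_HWPOISON)
		lsb = PAGE_SHIFT;

	return lsb;
}

/*
 * Hypothetical wrapper that both no_context() and do_sigbus() could
 * call. In the kernel, the fprintf() would be pr_err() and the return
 * would be replaced by
 *   force_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb);
 */
static unsigned int report_hwpoison_fault(const char *comm, int pid,
					  unsigned long address,
					  unsigned int fault)
{
	unsigned int lsb = hwpoison_fault_lsb(fault);

	fprintf(stderr,
		"MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
		comm, pid, address);

	return lsb;
}
```

With one helper of this kind, the hwpoison branch in each caller would
shrink to a single call, and the message text and lsb calculation would
live in one place.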