Uncorrected memory errors are signaled to processes using SIGBUS or an error retval from a syscall. But there's a corner cases in execve where a SIGSEGV will be delivered. Specifically this will happen if the binary loader triggers a memory error reading user pages. The architecture's handler (MCE handler on x86) may queue a call to memory_failure but that won't run until the execve() returns. The binary loader is called after bprm->point_of_no_return is set meaning that any error is handled by bprm_execve() with a SIGSEGV to the process. To ensure it is terminated with a SIGBUS we 1. let pending work run in the bprm_execve error case. And 2. ensure memory_failure() is passed MF_ACTION_REQUIRED so that the SIGBUS is queued. Normally when the MCE is in a syscall, a fixup of return IP and a call to kill_me_never are enough. But in this case it's necessary to queue kill_me_maybe() which will set MF_ACTION_REQUIRED. Reuse current->in_execve to make the decision. Signed-off-by: Andrew Zaborowski <andrew.zaborowski@xxxxxxxxx> --- arch/x86/kernel/cpu/mce/core.c | 14 ++++++++++++++ fs/exec.c | 12 +++++++++--- include/linux/sched.h | 2 +- 3 files changed, 24 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 84d41be6d06b..11effdff942c 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1593,6 +1593,20 @@ noinstr void do_machine_check(struct pt_regs *regs) else queue_task_work(&m, msg, kill_me_maybe); + } else if (current->in_execve) { + /* + * Cannot recover a task in execve() beyond point of no + * return but stop further user memory accesses. + */ + if (m.kflags & MCE_IN_KERNEL_RECOV) { + if (!fixup_exception(regs, X86_TRAP_MC, 0, 0)) + mce_panic("Failed kernel mode recovery", &m, msg); + } + + if (!mce_usable_address(&m)) + queue_task_work(&m, msg, kill_me_now); + else + queue_task_work(&m, msg, kill_me_maybe); } else { /* * Handle an MCE which has happened in kernel space but from diff --git a/fs/exec.c b/fs/exec.c index cf1df7f16e55..1bea9c252a11 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -67,6 +67,7 @@ #include <linux/time_namespace.h> #include <linux/user_events.h> #include <linux/rseq.h> +#include <linux/task_work.h> #include <linux/uaccess.h> #include <asm/mmu_context.h> @@ -1888,10 +1889,15 @@ static int bprm_execve(struct linux_binprm *bprm) * If past the point of no return ensure the code never * returns to the userspace process. Use an existing fatal * signal if present otherwise terminate the process with - * SIGSEGV. + * SIGSEGV. Run pending work before that in case it is + * terminating the process with a different signal. */ - if (bprm->point_of_no_return && !fatal_signal_pending(current)) - force_fatal_sig(SIGSEGV); + if (bprm->point_of_no_return) { + task_work_run(); + + if (!fatal_signal_pending(current)) + force_fatal_sig(SIGSEGV); + } sched_mm_cid_after_execve(current); current->fs->in_exec = 0; diff --git a/include/linux/sched.h b/include/linux/sched.h index 3c2abbc587b4..8970a191d8fe 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -922,7 +922,7 @@ struct task_struct { unsigned sched_rt_mutex:1; #endif - /* Bit to tell TOMOYO we're in execve(): */ + /* Bit to tell TOMOYO and x86 MCE code we're in execve(): */ unsigned in_execve:1; unsigned in_iowait:1; #ifndef TIF_RESTORE_SIGMASK -- 2.39.3