"Andy Lutomirski" <luto@xxxxxxxxxx> writes: > On Wed, Oct 20, 2021, at 10:43 AM, Eric W. Biederman wrote: >> Instead of pretending to send SIGSEGV by calling do_exit(SIGSEGV) >> call force_sigsegv(SIGSEGV) to force the process to take a SIGSEGV >> and terminate. > > Why? I realize it's more polite, but is this useful enough to justify > the need for testing and potential security impacts? The why is that do_exit as an interface needs to be refactored. As it exists right now "do_exit" is bad enough that on a couple of older architectures do_exit in a random location results in being able to read/write the kernel stack using ptrace. So to addresses the issues I need to get everything that really shouldn't be using do_exit to use something else. >> Update handle_signal to return immediately when save_v86_state fails >> and kills the process. Returning immediately without doing anything >> except killing the process with SIGSEGV is also what signal_setup_done >> does when setup_rt_frame fails. Plus it is always ok to return >> immediately without delivering a signal to a userspace handler when a >> fatal signal has killed the current process. >> > > I can mostly understand the individual sentences, but I don't > understand what you're getting it. If a fatal signal has killed the > current process and we are guaranteed not to hit the exit-to-usermode > path, then, sure, it's safe to return unless we're worried that the > core dump code will explode. > > But, unless it's fixed elsewhere in your series, force_sigsegv() is > itself quite racy, or at least looks racy -- it can race against > another thread calling sigaction() and changing the action to > something other than SIG_DFL. So it does not appear to actually > reliably kill the caller, especially if exposed to a malicious user > program. You are correct about the races. I have changes in the works to make the races go away but that is not an excuse for push a change that is buggy without them. 
>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>> Cc: Borislav Petkov <bp@xxxxxxxxx>
>> Cc: x86@xxxxxxxxxx
>> Cc: H Peter Anvin <hpa@xxxxxxxxx>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>> ---
>>  arch/x86/kernel/signal.c  | 6 +++++-
>>  arch/x86/kernel/vm86_32.c | 2 +-
>>  2 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
>> index f4d21e470083..25a230f705c1 100644
>> --- a/arch/x86/kernel/signal.c
>> +++ b/arch/x86/kernel/signal.c
>> @@ -785,8 +785,12 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
>>  	bool stepping, failed;
>>  	struct fpu *fpu = &current->thread.fpu;
>>
>> -	if (v8086_mode(regs))
>> +	if (v8086_mode(regs)) {
>>  		save_v86_state((struct kernel_vm86_regs *) regs, VM86_SIGNAL);
>> +		/* Has save_v86_state failed and killed the process? */
>> +		if (fatal_signal_pending(current))
>> +			return;
>
> This might be an ABI break, or at least it could be if anyone cared
> about vm86.  Imagine this wasn't guarded by if (v8086_mode) and was
> just if (fatal_signal_pending(current)) return;  Then all the other
> processing gets skipped if a fatal signal is pending (e.g. from a
> concurrent kill), which could cause visible oddities in a core dump, I
> think.  Maybe it's minor.

I believe it is minor, because the test happens before anything is
written to userspace.  The worst case is a signal gets dequeued and then
not written to userspace.

On second thought I am not certain this test is even necessary.
Especially if the change you suggest be made to save_v86_state is made
so that the kernel is out of v86 state and kernel things can safely
happen.

>> +	}
>>
>>  	/* Are we from a system call? */
>>  	if (syscall_get_nr(current, regs) != -1) {
>> diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
>> index 63486da77272..040fd01be8b3 100644
>> --- a/arch/x86/kernel/vm86_32.c
>> +++ b/arch/x86/kernel/vm86_32.c
>> @@ -159,7 +159,7 @@ void save_v86_state(struct kernel_vm86_regs *regs,
>> int retval)
>>  	user_access_end();
>>  Efault:
>>  	pr_alert("could not access userspace vm86 info\n");
>> -	do_exit(SIGSEGV);
>> +	force_sigsegv(SIGSEGV);
>
> This causes us to run unwitting kernel code with the vm86 garbage
> still loaded into the relevant architectural areas (see the chunk if
> save_v86_state that's inside preempt_disable()).  So NAK, especially
> since the aforementioned race might cause the exit-to-usermode path to
> actually run with who-knows-what consequences.

Fair.  I suspect it might even make the current do_exit call run
with who-knows-what consequence.

> If you really want to make this change, please arrange for
> save_v86_state() to switch out of vm86 mode *before* anything that
> might fail so that it's guaranteed to at least put the task in a sane
> state.  And write an explicit test case that tests it.  I could help
> with the latter if you do the former.

I do really want to remove this do_exit.  If the error was caused by a
kernel malfunction we could do something like die.  As it is the code is
effectively hand rolling die/oops for a userspace caused condition.
Which is quite nasty from a maintenance point of view.

I think your suggested changes to save_v86_state are much more robust
than my idea of simply calling force_sig... and expecting the kernel to
exit immediately.  Having to go another pass through the
exit_to_usermode_loop does not look like it is very hard to make it
robust against a kernel in a random state.

I could close the race today by replacing the force_sigsegv(SIGSEGV)
with force_sig(SIGKILL).  And that removes the coredump path from the
equation so is a bit interesting, but it really is unsatisfactory.

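
As a userspace illustration of why SIGKILL would close the race (again
a sketch only, not part of the patch): sigaction() refuses to change the
action for SIGKILL, so no other thread can swap in a handler for it the
way it can for SIGSEGV.  The helper try_install_handler() is a
hypothetical name used only in this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

static void noop_handler(int sig)
{
	(void)sig;
}

/*
 * Hypothetical helper (not from the patch): try to install a handler
 * for signo.  Returns 0 on success, after restoring the previous
 * action, or the errno value reported by sigaction() on failure.
 */
int try_install_handler(int signo)
{
	struct sigaction sa, old;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = noop_handler;
	sigemptyset(&sa.sa_mask);

	if (sigaction(signo, &sa, &old) != 0)
		return errno;

	/* Put the original disposition back so the caller is unaffected. */
	sigaction(signo, &old, NULL);
	return 0;
}
```

On Linux, try_install_handler(SIGKILL) returns EINVAL while
try_install_handler(SIGSEGV) returns 0.
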
I will dig in and see what can be done, including writing a test, so
that this code path gracefully handles -EFAULT rather than trying to
walk through the rest of the kernel in a problematic state.

This change as proposed does not get this save_v86_state case to using
ordinary mechanisms to handle the problem, so as written it does not
solve the problem it set out to solve.

Eric