On Mon, Jun 15, 2020 at 02:08:16PM -0700, Andy Lutomirski wrote: > > All !user exceptions really should be NMI-like. If you want to go > > overboard, I suppose you can look at IF and have them behave interrupt > > like when set, but why make things complicated. > > This entire rabbit hole opened because of #PF. So we at least need the > set of exceptions that are permitted to schedule if they came from > kernel mode to remain schedulable. What exception, other than #PF, actually needs to schedule from kernel? > Prior to the giant changes, all the non-IST *exceptions*, but not the > interrupts, were schedulable from kernel mode, assuming the original > context could schedule. Right now, interrupts can schedule, too, which > is nice if we ever want to fully clean up the Xen abomination. I > suppose we could make it so #PF opts in to special treatment again, > but we should decide that the result is simpler or otherwise better > before we do this. > > One possible justification would be that the schedulable entry variant > is more complicated, and most kernel exceptions except the ones with > fixups are bad news, and we want the oopses to succeed. But page > faults are probably the most common source of oopses, so this is a bit > weak, and we really want page faults to work even from nasty contexts. I think I'd prefer the argument of consistent failure. Do we ever want #UD to schedule? If not, then why allow it to sometimes schedule and sometimes fail, better to always fail. #DB is still a giant trainwreck in this regard as well. Something like this... --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -216,10 +216,25 @@ static inline void handle_invalid_op(str ILL_ILLOPN, error_get_trap_addr(regs)); } -DEFINE_IDTENTRY_RAW(exc_invalid_op) +static void handle_invalid_op_kernel(struct pt_regs *regs) +{ + if (is_valid_bugaddr(regs->ip) && + report_bug(regs->ip, regs) == BUG_TRAP_TYPE_WARN) { + /* Skip the ud2. */ + regs->ip += LEN_UD2; + return; + } + + handle_invalid_op(regs); +} + +static void handle_invalid_op_user(struct pt_regs *regs) { - bool rcu_exit; + handle_invalid_op(regs); +} +DEFINE_IDTENTRY_RAW(exc_invalid_op) +{ /* * Handle BUG/WARN like NMIs instead of like normal idtentries: * if we bugged/warned in a bad RCU context, for example, the last @@ -227,38 +242,25 @@ DEFINE_IDTENTRY_RAW(exc_invalid_op) * infinitum. */ if (!user_mode(regs)) { - enum bug_trap_type type = BUG_TRAP_TYPE_NONE; - nmi_enter(); instrumentation_begin(); trace_hardirqs_off_finish(); - if (is_valid_bugaddr(regs->ip)) - type = report_bug(regs->ip, regs); + handle_invalid_op_kernel(regs); if (regs->flags & X86_EFLAGS_IF) trace_hardirqs_on_prepare(); instrumentation_end(); nmi_exit(); + } else { + bool rcu_exit; - if (type == BUG_TRAP_TYPE_WARN) { - /* Skip the ud2. */ - regs->ip += LEN_UD2; - return; - } - - /* - * Else, if this was a BUG and report_bug returns or if this - * was just a normal #UD, we want to continue onward and - * crash. - */ + rcu_exit = idtentry_enter_cond_rcu(regs); + instrumentation_begin(); + handle_invalid_op_user(regs); + instrumentation_end(); + idtentry_exit_cond_rcu(regs, rcu_exit); } - - rcu_exit = idtentry_enter_cond_rcu(regs); - instrumentation_begin(); - handle_invalid_op(regs); - instrumentation_end(); - idtentry_exit_cond_rcu(regs, rcu_exit); } DEFINE_IDTENTRY(exc_coproc_segment_overrun)