Re: [tip: x86/entry] x86/entry: Treat BUG/WARN as NMI-like entries

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Tue, 16 Jun 2020 00:23:30 +0200

On Mon, Jun 15, 2020 at 02:08:16PM -0700, Andy Lutomirski wrote:

> > All !user exceptions really should be NMI-like. If you want to go
> > overboard, I suppose you can look at IF and have them behave interrupt
> > like when set, but why make things complicated.
> 
> This entire rabbit hole opened because of #PF. So we at least need the
> set of exceptions that are permitted to schedule if they came from
> kernel mode to remain schedulable.

What exception, other than #PF, actually needs to schedule from kernel?

> Prior to the giant changes, all the non-IST *exceptions*, but not the
> interrupts, were schedulable from kernel mode, assuming the original
> context could schedule. Right now, interrupts can schedule, too, which
> is nice if we ever want to fully clean up the Xen abomination. I
> suppose we could make it so #PF opts in to special treatment again,
> but we should decide that the result is simpler or otherwise better
> before we do this.
> 
> One possible justification would be that the schedulable entry variant
> is more complicated, and most kernel exceptions except the ones with
> fixups are bad news, and we want the oopses to succeed. But page
> faults are probably the most common source of oopses, so this is a bit
> weak, and we really want page faults to work even from nasty contexts.

I think I'd prefer the argument of consistent failure.

Do we ever want #UD to schedule? If not, then why allow it to sometimes
schedule and sometimes fail, better to always fail.

#DB is still a giant trainwreck in this regard as well.

Something like this...

--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -216,10 +216,25 @@ static inline void handle_invalid_op(str
 		      ILL_ILLOPN, error_get_trap_addr(regs));
 }
 
-DEFINE_IDTENTRY_RAW(exc_invalid_op)
+static void handle_invalid_op_kernel(struct pt_regs *regs)
+{
+	if (is_valid_bugaddr(regs->ip) &&
+	    report_bug(regs->ip, regs) == BUG_TRAP_TYPE_WARN) {
+		/* Skip the ud2. */
+		regs->ip += LEN_UD2;
+		return;
+	}
+
+	handle_invalid_op(regs);
+}
+
+static void handle_invalid_op_user(struct pt_regs *regs)
 {
-	bool rcu_exit;
+	handle_invalid_op(regs);
+}
 
+DEFINE_IDTENTRY_RAW(exc_invalid_op)
+{
 	/*
 	 * Handle BUG/WARN like NMIs instead of like normal idtentries:
 	 * if we bugged/warned in a bad RCU context, for example, the last
@@ -227,38 +242,25 @@ DEFINE_IDTENTRY_RAW(exc_invalid_op)
 	 * infinitum.
 	 */
 	if (!user_mode(regs)) {
-		enum bug_trap_type type = BUG_TRAP_TYPE_NONE;
-
 		nmi_enter();
 		instrumentation_begin();
 		trace_hardirqs_off_finish();
 
-		if (is_valid_bugaddr(regs->ip))
-			type = report_bug(regs->ip, regs);
+		handle_invalid_op_kernel(regs);
 
 		if (regs->flags & X86_EFLAGS_IF)
 			trace_hardirqs_on_prepare();
 		instrumentation_end();
 		nmi_exit();
+	} else {
+		bool rcu_exit;
 
-		if (type == BUG_TRAP_TYPE_WARN) {
-			/* Skip the ud2. */
-			regs->ip += LEN_UD2;
-			return;
-		}
-
-		/*
-		 * Else, if this was a BUG and report_bug returns or if this
-		 * was just a normal #UD, we want to continue onward and
-		 * crash.
-		 */
+		rcu_exit = idtentry_enter_cond_rcu(regs);
+		instrumentation_begin();
+		handle_invalid_op_user(regs);
+		instrumentation_end();
+		idtentry_exit_cond_rcu(regs, rcu_exit);
 	}
-
-	rcu_exit = idtentry_enter_cond_rcu(regs);
-	instrumentation_begin();
-	handle_invalid_op(regs);
-	instrumentation_end();
-	idtentry_exit_cond_rcu(regs, rcu_exit);
 }
 
 DEFINE_IDTENTRY(exc_coproc_segment_overrun)