Re: [PATCH v2 0/2] Introduce the pkill_on_warn parameter

Kees Cook <keescook@xxxxxxxxxxxx> · Mon, 15 Nov 2021 14:06:41 -0800

On Mon, Nov 15, 2021 at 11:06:49AM -0500, Steven Rostedt wrote:
> On Mon, 15 Nov 2021 14:59:57 +0100
> Lukas Bulwahn <lukas.bulwahn@xxxxxxxxx> wrote:
> 
> > 1. Allow a reasonably configured kernel to boot and run with
> > panic_on_warn set. Warnings should only be raised when something is
> > not configured as the developers expect it or the kernel is put into a
> > state that generally is _unexpected_ and has been exposed little to
> > the critical thought of the developer, to testing efforts and use in
> > other systems in the wild. Warnings should not be used for something
> > informative, which still allows the kernel to continue running in a
> > proper way in a generally expected environment. Up to my knowledge,
> > there are some kernels in production that run with panic_on_warn; so,
> > IMHO, this requirement is generally accepted (we might of course
> 
> To me, WARN*() is the same as BUG*(). If it gets hit, it's a bug in the
> kernel and needs to be fixed. I have several WARN*() calls in my code, and
> it's all because the algorithms used is expected to prevent the condition
> in the warning from happening. If the warning triggers, it means either that
> the algorithm is wrong or my assumption about the algorithm is wrong. In
> either case, the kernel needs to be updated. All my tests fail if a WARN*()
> gets hit (anywhere in the kernel, not just my own).
> 
> After reading all the replies and thinking about this more, I find the
> pkill_on_warning actually worse than not doing anything. If you are
> concerned about exploits from warnings, the only real solution is a
> panic_on_warning. Yes, it brings down the system, but really, it has to be
> brought down anyway, because it is in need of a kernel update.

Hmm, yes. What it originally boiled down to, which is why Linus first
objected to BUG(), was that we don't know what other parts of the system
have been disrupted. The best example is just that of locking: if we
BUG() or do_exit() in the middle of holding a lock, we'll wreck whatever
subsystem that was attached to. Without a deterministic system state
unwinder, there really isn't a "safe" way to just stop a kernel thread.

With this pkill_on_warn, we avoid the BUG problem (since the thread of
execution continues and stops at an 'expected' place: the signal
handler).

However, now we have the newer objection from Linus, which is one of
attribution: the WARN might be hit during an "unrelated" thread of
execution and "current" gets blamed, etc. And beyond that, if we take
down a portion of userspace, what in userspace may be destabilized? In
theory, we get a case where any required daemons would be restarted by
init, but that's not "known".

The safest version of this I can think of is for processes to opt into
this mitigation. That would also cover the "special cases" we've seen
exposed too. i.e. init and kthreads would not opt in.

However, that's a lot to implement when Marco's tracing suggestion might
be sufficient and policy could be entirely implemented in userspace. It
could be as simple as this (totally untested):

diff --git a/include/trace/events/error_report.h b/include/trace/events/error_report.h
index 96f64bf218b2..129d22eb8b6e 100644
--- a/include/trace/events/error_report.h
+++ b/include/trace/events/error_report.h
@@ -16,6 +16,8 @@
 #define __ERROR_REPORT_DECLARE_TRACE_ENUMS_ONCE_ONLY
 
 enum error_detector {
+	ERROR_DETECTOR_WARN,
+	ERROR_DETECTOR_BUG,
 	ERROR_DETECTOR_KFENCE,
 	ERROR_DETECTOR_KASAN
 };
@@ -23,6 +25,8 @@ enum error_detector {
 #endif /* __ERROR_REPORT_DECLARE_TRACE_ENUMS_ONCE_ONLY */
 
 #define error_detector_list	\
+	EM(ERROR_DETECTOR_WARN, "warn")	\
+	EM(ERROR_DETECTOR_BUG, "bug")	\
 	EM(ERROR_DETECTOR_KFENCE, "kfence")	\
 	EMe(ERROR_DETECTOR_KASAN, "kasan")
 /* Always end the list with an EMe. */
diff --git a/lib/bug.c b/lib/bug.c
index 45a0584f6541..201b4070bbbc 100644
--- a/lib/bug.c
+++ b/lib/bug.c
@@ -48,6 +48,7 @@
 #include <linux/sched.h>
 #include <linux/rculist.h>
 #include <linux/ftrace.h>
+#include <trace/events/error_report.h>
 
 extern struct bug_entry __start___bug_table[], __stop___bug_table[];
 
@@ -198,6 +199,7 @@ enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs)
 		/* this is a WARN_ON rather than BUG/BUG_ON */
 		__warn(file, line, (void *)bugaddr, BUG_GET_TAINT(bug), regs,
 		       NULL);
+		trace_error_report_end(ERROR_DETECTOR_WARN, bugaddr);
 		return BUG_TRAP_TYPE_WARN;
 	}
 
@@ -206,6 +208,7 @@ enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs)
 	else
 		pr_crit("Kernel BUG at %pB [verbose debug info unavailable]\n",
 			(void *)bugaddr);
+	trace_error_report_end(ERROR_DETECTOR_BUG, bugaddr);
 
 	return BUG_TRAP_TYPE_BUG;
 }


Marco, is this the full version of monitoring this from the userspace
side?

	perf record -e error_report:error_report_end

-- 
Kees Cook