Re: [PATCH v2 0/2] Introduce the pkill_on_warn parameter

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Tue, 16 Nov 2021 08:07:19 -0500

On Mon, 2021-11-15 at 14:06 -0800, Kees Cook wrote:
> On Mon, Nov 15, 2021 at 11:06:49AM -0500, Steven Rostedt wrote:
> > On Mon, 15 Nov 2021 14:59:57 +0100
> > Lukas Bulwahn <lukas.bulwahn@xxxxxxxxx> wrote:
> > 
> > > 1. Allow a reasonably configured kernel to boot and run with
> > > panic_on_warn set. Warnings should only be raised when something
> > > is not configured as the developers expect it or the kernel is
> > > put into a state that generally is _unexpected_ and has been
> > > exposed little to the critical thought of the developer, to
> > > testing efforts and use in other systems in the wild. Warnings
> > > should not be used for something informative, which still allows
> > > the kernel to continue running in a proper way in a generally
> > > expected environment. Up to my knowledge, there are some kernels
> > > in production that run with panic_on_warn; so, IMHO, this
> > > requirement is generally accepted (we might of course
> > 
> > To me, WARN*() is the same as BUG*(). If it gets hit, it's a bug in
> > the kernel and needs to be fixed. I have several WARN*() calls in
> > my code, and it's all because the algorithms used is expected to
> > prevent the condition in the warning from happening. If the warning
> > triggers, it means either that the algorithm is wrong or my
> > assumption about the algorithm is wrong. In either case, the kernel
> > needs to be updated. All my tests fail if a WARN*() gets hit
> > (anywhere in the kernel, not just my own).
> > 
> > After reading all the replies and thinking about this more, I find
> > the pkill_on_warning actually worse than not doing anything. If you
> > are concerned about exploits from warnings, the only real solution
> > is a panic_on_warning. Yes, it brings down the system, but really,
> > it has to be brought down anyway, because it is in need of a kernel
> > update.
> 
> Hmm, yes. What it originally boiled down to, which is why Linus first
> objected to BUG(), was that we don't know what other parts of the
> system have been disrupted. The best example is just that of locking:
> if we BUG() or do_exit() in the middle of holding a lock, we'll wreck
> whatever subsystem that was attached to. Without a deterministic
> system state unwinder, there really isn't a "safe" way to just stop a
> kernel thread.

But this misses the real point: the majority of WARN conditions are in
device drivers checking expected device state against an internal state
model.  If this triggers it's a problem with the device not the thread,
so killing the thread is blaming the wrong party and making the
situation worse because it didn't do anything to address the actual
problem.

> With this pkill_on_warn, we avoid the BUG problem (since the thread
> of execution continues and stops at an 'expected' place: the signal
> handler).

And what about the unexpected state?

> However, now we have the newer objection from Linus, which is one of
> attribution: the WARN might be hit during an "unrelated" thread of
> execution and "current" gets blamed, etc. And beyond that, if we take
> down a portion of userspace, what in userspace may be destabilized?
> In theory, we get a case where any required daemons would be
> restarted by init, but that's not "known".
> 
> The safest version of this I can think of is for processes to opt
> into this mitigation. That would also cover the "special cases" we've
> seen exposed too. i.e. init and kthreads would not opt in.
> 
> However, that's a lot to implement when Marco's tracing suggestion
> might be sufficient and policy could be entirely implemented in
> userspace. It could be as simple as this (totally untested):

Really, no, this is precisely wrong thinking.  If the condition were
recoverable it wouldn't result in a WARN.  There are some WARNs where
we think the condition is unexpected enough not to bother adding error
handling (we need these reporting so we know that the assumption was
wrong), but for most if there were a way to handle it we'd have built
it into the usual error flow.  What WARN means is that an unexpected
condition occurred which means the kernel itself is in an unknown
state.  You can't recover from that by killing and restarting random
stuff, you have to reinitialize to a known state (i.e. reset the
system).  Some of the reason we do WARN instead of BUG is that we
believe the state contamination is limited and if you're careful the
system can continue in a degraded state if the user wants to accept the
risk.  Thinking the user can handle the state reset locally by some
preset policy is pure fantasy: if we didn't know how to fix it at the
point it occurred, why would something far away from the action when
most of the information has been lost have a better chance?

Your only policy choices when hitting WARN are

   1. Accept the risk and continue degraded operation, or
   2. reset the system to a known good state.

James