Re: [PATCH v2 0/2] Introduce the pkill_on_warn parameter

Alexander Popov <alex.popov@xxxxxxxxx> · Tue, 16 Nov 2021 11:34:17 +0300

On 16.11.2021 09:37, Christophe Leroy wrote:
Le 15/11/2021 à 17:06, Steven Rostedt a écrit :
On Mon, 15 Nov 2021 14:59:57 +0100
Lukas Bulwahn <lukas.bulwahn@xxxxxxxxx> wrote:

1. Allow a reasonably configured kernel to boot and run with
panic_on_warn set. Warnings should only be raised when something is
not configured as the developers expect it or the kernel is put into a
state that generally is _unexpected_ and has been exposed little to
the critical thought of the developer, to testing efforts and use in
other systems in the wild. Warnings should not be used for something
informative, which still allows the kernel to continue running in a
proper way in a generally expected environment. Up to my knowledge,
there are some kernels in production that run with panic_on_warn; so,
IMHO, this requirement is generally accepted (we might of course

To me, WARN*() is the same as BUG*(). If it gets hit, it's a bug in the
kernel and needs to be fixed. I have several WARN*() calls in my code, and
it's all because the algorithms used is expected to prevent the condition
in the warning from happening. If the warning triggers, it means either that
the algorithm is wrong or my assumption about the algorithm is wrong. In
either case, the kernel needs to be updated. All my tests fail if a WARN*()
gets hit (anywhere in the kernel, not just my own).

After reading all the replies and thinking about this more, I find the
pkill_on_warning actually worse than not doing anything. If you are
concerned about exploits from warnings, the only real solution is a
panic_on_warning. Yes, it brings down the system, but really, it has to be
brought down anyway, because it is in need of a kernel update.

We also have LIVEPATCH to avoid bringing down the system for a kernel
update, don't we ? So I wouldn't expect bringing down a vital system
just for a WARN.

Hello Christophe,

I would say that different systems have different requirements.
Not every Linux-based system needs live patching (it also has own limitations).

That's why I proposed a sysctl and didn't change the default kernel behavior.

As far as I understand from
https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on,
WARN() and WARN_ON() are meant to deal with those situations as
gracefull as possible, allowing the system to continue running the best
it can until a human controled action is taken.

I can't agree here. There is a very strong push against adding BUG*() to the 
kernel source code. So there are a lot of cases when WARN*() is used for severe 
problems because kernel developers just don't have other options.

Currently, it looks like there is no consistent error handling policy in the kernel.

So I'd expect the WARN/WARN_ON to be handled and I agree that that
pkill_on_warning seems dangerous and unrelevant, probably more dangerous
than doing nothing, especially as the WARN may trigger for a reason
which has nothing to do with the running thread.

Sorry, I see a contradiction.
If killing a process hitting a kernel warning is "dangerous and unrelevant",
why killing a process on a kernel oops is fine? That's strange.

Linus calls that behavior "fairly benign" here: 
http://lkml.iu.edu/hypermail/linux/kernel/1610.0/01217.html

Best regards,
Alexander