On 13.11.2021 00:26, Linus Torvalds wrote:
> On Fri, Nov 12, 2021 at 10:52 AM Alexander Popov <alex.popov@xxxxxxxxx> wrote:
>> Hello everyone!
>> Friendly ping for your feedback.
> I still haven't heard a compelling _reason_ for this all, and why
> anybody should ever use this or care?
Ok, to sum up:
Killing the process that hit a kernel warning complies with the Fail-Fast
principle [1]. The pkill_on_warn sysctl allows the kernel to stop the process when
the **first signs** of wrong behavior are detected.
By default, the Linux kernel ignores a warning and continues execution from
the flawed state. That is the opposite of the Fail-Fast principle.
A kernel warning may be followed by memory corruption or other negative effects,
like in CVE-2019-18683 exploit [2] or many other cases detected by the SyzScope
project [3]. pkill_on_warn would protect the system from the errors that follow
a warning in process context.
At the same time, pkill_on_warn does not bring down the entire system like
panic_on_warn. It is a middle ground for handling kernel warnings.
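To make the three behaviors concrete, here is a minimal userspace sketch of the decision, not the actual kernel code. The names warn_action, warn_policy and decide_on_warning are hypothetical and exist only for this illustration. Checking panic_on_warn before pkill_on_warn is an assumption about precedence.

```c
#include <stdbool.h>

/* Illustrative outcomes after a kernel warning fires (hypothetical names). */
enum warn_action {
	WARN_CONTINUE,		/* default: report the warning and keep running */
	WARN_KILL_PROCESS,	/* pkill_on_warn: kill only the offending process */
	WARN_PANIC		/* panic_on_warn: stop the whole system */
};

/* Model of the two sysctl knobs discussed in this thread. */
struct warn_policy {
	bool panic_on_warn;
	bool pkill_on_warn;
};

/*
 * Decide what happens after a warning.
 * Assumption: panic_on_warn takes precedence over pkill_on_warn.
 * pkill_on_warn applies only when the warning is hit in process context,
 * because otherwise there is no process to blame.
 */
enum warn_action decide_on_warning(const struct warn_policy *policy,
				   bool in_process_context)
{
	if (policy->panic_on_warn)
		return WARN_PANIC;
	if (policy->pkill_on_warn && in_process_context)
		return WARN_KILL_PROCESS;
	return WARN_CONTINUE;
}
```

In this model, a warning outside process context (for example in an interrupt handler) falls through to the default behavior even when pkill_on_warn is set.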
Linus, it's similar to your BUG_ON() policy [4]. The process hitting BUG_ON() is
killed, and the system continues to work. pkill_on_warn just brings a similar
policy to WARN_ON() handling.
I believe that many Linux distros (which don't hit WARN_ON() here and there)
will enable pkill_on_warn because it's reasonable from the safety and security
points of view.
And I'm sure that the ELISA project by the Linux Foundation (Enabling Linux In
Safety Applications [5]) would support the pkill_on_warn sysctl.
[Adding people from this project to CC]
I hope that I managed to show the rationale.
Best regards,
Alexander
[1]: https://en.wikipedia.org/wiki/Fail-fast
[2]: https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
[3]: https://www.usenix.org/system/files/sec22summer_zou.pdf
[4]: http://lkml.iu.edu/hypermail/linux/kernel/1610.0/01217.html
[5]: https://elisa.tech/