Hello! This is the v2 of pkill_on_warn.
Changes from v1 and tricks for testing are described below.

Rationale
=========

Currently, the Linux kernel provides two types of reaction to kernel
warnings:
 1. Do nothing (by default),
 2. Call panic() if panic_on_warn is set. That's a very strong reaction,
    so panic_on_warn is usually disabled on production systems.

From a safety point of view, the Linux kernel misses a middle way of
handling kernel warnings:
 - The kernel should stop the activity that provokes a warning,
 - But the kernel should avoid complete denial of service.

From a security point of view, kernel warning messages provide a lot of
useful information for attackers. Many GNU/Linux distributions allow
unprivileged users to read the kernel log, so attackers use kernel
warning infoleak in vulnerability exploits. See the examples:
  https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html
  https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
  https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html

Let's introduce the pkill_on_warn sysctl.
If this parameter is set, the kernel kills all threads in a process that
provoked a kernel warning. This behavior is reasonable from a safety point
of view described above. It is also useful for kernel security hardening
because the system kills an exploit process that hits a kernel warning.

Moreover, bugs usually don't come alone, and a kernel warning may be
followed by memory corruption or other bad effects. So pkill_on_warn allows
the kernel to stop the process when the first signs of wrong behavior
are detected.

Changes from v1
===============

1) Introduce do_pkill_on_warn() and call it in all warning handling paths.

2) Do refactoring without functional changes in a separate patch.

3) Avoid killing init and kthreads.

4) Use do_send_sig_info() instead of do_group_exit().

5) Introduce sysctl instead of using core_param().

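The decision behind changes 3) and 4) above can be modelled in user space
roughly as follows. This is only an illustrative sketch, not the code from
the patches: every name prefixed with model_ is invented here, the struct
is a stand-in for the few task properties that matter, and the use of
SIGKILL is an assumption about which signal is delivered to the process.

```c
#include <stdbool.h>
#include <signal.h>

/* Hypothetical stand-in for the task properties relevant to the decision. */
struct model_task {
	int  tgid;        /* process (thread group) id */
	bool is_kthread;  /* kernel thread, no user-space process behind it */
	bool is_init;     /* the global init process */
};

/* Models the sysctl value: 0 keeps the default "do nothing" behaviour. */
static int model_pkill_on_warn;

/* True if a warning provoked in the context of t should kill its process. */
static bool model_should_pkill(const struct model_task *t)
{
	if (!model_pkill_on_warn)
		return false;
	/* Killing init or a kthread would hurt the system more than help it. */
	if (t->is_kthread || t->is_init)
		return false;
	return true;
}

/*
 * Returns the signal to deliver to all threads of the process,
 * or 0 if the warning should only be reported.
 * SIGKILL is an assumption made for this sketch.
 */
static int model_warn_action(const struct model_task *t)
{
	return model_should_pkill(t) ? SIGKILL : 0;
}
```

With the model sysctl left at 0, nothing is killed. With it set to 1, only
an ordinary user-space process gets the signal; init and kthreads are left
alone.
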
Tricks for testing
==================

1) This patch series was tested on x86_64 using CONFIG_LKDTM.
The kernel kills a process that performs this:
  echo WARNING > /sys/kernel/debug/provoke-crash/DIRECT

2) The warn_slowpath_fmt() path was tested using this trick:

  diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
  index 84b87538a15d..3106c203ebb6 100644
  --- a/arch/x86/include/asm/bug.h
  +++ b/arch/x86/include/asm/bug.h
  @@ -73,7 +73,7 @@ do {							\
    * were to trigger, we'd rather wreck the machine in an attempt to get the
    * message out than not know about it.
    */
  -#define __WARN_FLAGS(flags)					\
  +#define ___WARN_FLAGS(flags)					\
   do {								\
   	instrumentation_begin();				\
   	_BUG_FLAGS(ASM_UD2, BUGFLAG_WARNING|(flags));		\

3) Testing pkill_on_warn with kthreads was done using this trick:

  diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
  index bce848e50512..13c56f472681 100644
  --- a/kernel/rcu/tree.c
  +++ b/kernel/rcu/tree.c
  @@ -2133,6 +2133,8 @@ static int __noreturn rcu_gp_kthread(void *unused)
   		WRITE_ONCE(rcu_state.gp_state, RCU_GP_CLEANUP);
   		rcu_gp_cleanup();
   		WRITE_ONCE(rcu_state.gp_state, RCU_GP_CLEANED);
  +
  +		WARN_ONCE(1, "hello from kthread\n");
   	}
   }

4) Changing drivers/misc/lkdtm/bugs.c:lkdtm_WARNING() allowed me
to test all warning flavours:
 - WARN_ON()
 - WARN()
 - WARN_TAINT()
 - WARN_ON_ONCE()
 - WARN_ONCE()
 - WARN_TAINT_ONCE()

Thanks!

Alexander Popov (2):
  bug: do refactoring allowing to add a warning handling action
  sysctl: introduce kernel.pkill_on_warn

 Documentation/admin-guide/sysctl/kernel.rst | 14 ++++++++
 include/asm-generic/bug.h                   | 37 +++++++++++++++------
 include/linux/panic.h                       |  3 ++
 kernel/panic.c                              | 22 +++++++++++-
 kernel/sysctl.c                             |  9 +++++
 lib/bug.c                                   | 22 ++++++++----
 6 files changed, 90 insertions(+), 17 deletions(-)

-- 
2.31.1