Re: [PATCH v2 0/2] Introduce the pkill_on_warn parameter

Alexander Popov <alex.popov@xxxxxxxxx> · Fri, 12 Nov 2021 21:52:42 +0300

On 28.10.2021 02:32, Alexander Popov wrote:
Hello! This is the v2 of pkill_on_warn.
Changes from v1 and tricks for testing are described below.

Hello everyone!
Friendly ping for your feedback.

Thanks.
Alexander

Rationale
=========

Currently, the Linux kernel provides two types of reaction to kernel
warnings:
  1. Do nothing beyond printing the warning (by default),
  2. Call panic() if panic_on_warn is set. That's a very strong reaction,
     so panic_on_warn is usually disabled on production systems.

From a safety point of view, the Linux kernel lacks a middle ground for
handling kernel warnings:
  - The kernel should stop the activity that provokes a warning,
  - But the kernel should avoid complete denial of service.

From a security point of view, kernel warning messages provide a lot of
useful information for attackers. Many GNU/Linux distributions allow
unprivileged users to read the kernel log, so attackers use the information
leaked by kernel warnings in vulnerability exploits. See the examples:
https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html
https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html

Let's introduce the pkill_on_warn sysctl.
If this parameter is set, the kernel kills all threads in a process that
provoked a kernel warning. This behavior is reasonable from the safety point
of view described above. It is also useful for kernel security hardening
because the system kills an exploit process that hits a kernel warning.

Moreover, bugs usually don't come alone, and a kernel warning may be
followed by memory corruption or other bad effects. So pkill_on_warn allows
the kernel to stop the process when the first signs of wrong behavior
are detected.


Changes from v1
===============

1) Introduce do_pkill_on_warn() and call it in all warning handling paths.

2) Move the refactoring, which makes no functional changes, into a separate patch.

3) Avoid killing init and kthreads.

4) Use do_send_sig_info() instead of do_group_exit().

5) Introduce sysctl instead of using core_param().
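
To make the intended policy easier to discuss, here is a minimal user-space
model of the decision described in the rationale together with change 3).
It is only a sketch, not the code from the patches: the types and the
react_to_warning() helper are hypothetical, init is modelled simply as
tgid 1, and the assumption that panic_on_warn takes precedence over
pkill_on_warn is mine.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these types exist in the kernel. */
enum warn_reaction {
	WARN_REACTION_NONE,	/* log the warning and carry on */
	WARN_REACTION_PKILL,	/* kill all threads of the offending process */
	WARN_REACTION_PANIC,	/* panic_on_warn: bring the whole system down */
};

struct task_model {
	int tgid;		/* thread group (process) id */
	bool is_kthread;	/* kernel thread, not a user-space process */
};

struct warn_config {
	bool panic_on_warn;
	bool pkill_on_warn;
};

/*
 * Pick the reaction to a warning provoked by `task`.
 * Assumption: panic_on_warn wins when both knobs are set.
 * Init (modelled as tgid 1) and kthreads are never killed.
 */
static enum warn_reaction react_to_warning(const struct warn_config *cfg,
					   const struct task_model *task)
{
	if (cfg->panic_on_warn)
		return WARN_REACTION_PANIC;

	if (cfg->pkill_on_warn && task->tgid != 1 && !task->is_kthread)
		return WARN_REACTION_PKILL;

	return WARN_REACTION_NONE;
}
```

In this model a warning hit by init or by a kthread with only pkill_on_warn
set falls back to the default reaction, which matches the intent of change 3).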


Tricks for testing
==================

1) This patch series was tested on x86_64 using CONFIG_LKDTM.
The kernel kills a process that performs this:
   echo WARNING > /sys/kernel/debug/provoke-crash/DIRECT

2) The warn_slowpath_fmt() path was tested using this trick:

diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
index 84b87538a15d..3106c203ebb6 100644
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -73,7 +73,7 @@ do {                                                          \
   * were to trigger, we'd rather wreck the machine in an attempt to get the
   * message out than not know about it.
   */
-#define __WARN_FLAGS(flags)                                    \
+#define ___WARN_FLAGS(flags)                                   \
  do {                                                           \
         instrumentation_begin();                                \
         _BUG_FLAGS(ASM_UD2, BUGFLAG_WARNING|(flags));           \

3) Testing pkill_on_warn with kthreads was done using this trick:
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index bce848e50512..13c56f472681 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2133,6 +2133,8 @@ static int __noreturn rcu_gp_kthread(void *unused)
                 WRITE_ONCE(rcu_state.gp_state, RCU_GP_CLEANUP);
                 rcu_gp_cleanup();
                 WRITE_ONCE(rcu_state.gp_state, RCU_GP_CLEANED);
+
+               WARN_ONCE(1, "hello from kthread\n");
         }
  }

4) Changing drivers/misc/lkdtm/bugs.c:lkdtm_WARNING() allowed me
to test all warning flavours:
  - WARN_ON()
  - WARN()
  - WARN_TAINT()
  - WARN_ON_ONCE()
  - WARN_ONCE()
  - WARN_TAINT_ONCE()
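
For repeated runs, the LKDTM trick from step 1 could be wrapped in a small
user-space harness like the sketch below. It forks a child that performs the
write and then checks whether that child was terminated by SIGKILL. The
helper name child_killed_after_write() is hypothetical, and the sketch
assumes root privileges, a mounted debugfs and CONFIG_LKDTM.

```c
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that writes `text` to `path`, then report how it ended.
 * Returns 1 if the child was terminated by SIGKILL, 0 if it ended in any
 * other way, and -1 if fork() or waitpid() failed.
 */
static int child_killed_after_write(const char *path, const char *text)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: the write is the step expected to provoke the warning. */
		int fd = open(path, O_WRONLY);
		ssize_t n;

		if (fd < 0)
			_exit(2);
		n = write(fd, text, strlen(text));
		close(fd);
		_exit(n < 0 ? 3 : 0);
	}

	/* Parent: wait for the child and inspect its termination status. */
	if (waitpid(pid, &status, 0) < 0)
		return -1;

	return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```

With pkill_on_warn set, I would expect
child_killed_after_write("/sys/kernel/debug/provoke-crash/DIRECT", "WARNING")
to return 1, and 0 with the parameter cleared.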

Thanks!

Alexander Popov (2):
   bug: do refactoring allowing to add a warning handling action
   sysctl: introduce kernel.pkill_on_warn

  Documentation/admin-guide/sysctl/kernel.rst | 14 ++++++++
  include/asm-generic/bug.h                   | 37 +++++++++++++++------
  include/linux/panic.h                       |  3 ++
  kernel/panic.c                              | 22 +++++++++++-
  kernel/sysctl.c                             |  9 +++++
  lib/bug.c                                   | 22 ++++++++----
  6 files changed, 90 insertions(+), 17 deletions(-)