Re: [PATCH] Introduce the pkill_on_warn boot parameter

Petr Mladek <pmladek@xxxxxxxx> · Thu, 30 Sep 2021 11:15:41 +0200

On Wed 2021-09-29 12:49:24, Paul E. McKenney wrote:
> On Wed, Sep 29, 2021 at 10:01:33PM +0300, Alexander Popov wrote:
> > On 29.09.2021 21:58, Alexander Popov wrote:
> > > Currently, the Linux kernel provides two types of reaction to kernel
> > > warnings:
> > >  1. Do nothing (by default),
> > >  2. Call panic() if panic_on_warn is set. That's a very strong reaction,
> > >     so panic_on_warn is usually disabled on production systems.

Honestly, I am not sure whether panic_on_warn or the new pkill_on_warn
works as expected. I wonder who uses them in practice and what their
experience is.

The problem is that many developers do not know about this behavior.
They use WARN() when they are too lazy to write a more useful message or
when they want to see all the provided details: task, registers, backtrace.

Also, it is inconsistent with the pr_warn() behavior. Why would a
single-line warning be innocent while a full-info WARN() causes panic/pkill?

What about pr_err(), pr_crit(), pr_alert(), pr_emerg()? They inform
about even more serious problems. Why should a warning cause panic/pkill
while an alert message is just printed?
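
To make the inconsistency concrete, here is a userspace toy model. It
is not kernel code: the enum names and the react() helper are made up
for illustration, and giving panic_on_warn precedence over
pkill_on_warn is my assumption about how the two knobs would combine.

```c
/* Userspace toy model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

enum report_kind {		/* ordered roughly by severity */
	REPORT_PR_WARN,		/* pr_warn(): single-line message */
	REPORT_WARN,		/* WARN(): message + task, registers, backtrace */
	REPORT_PR_ERR,		/* pr_err() */
	REPORT_PR_CRIT,		/* pr_crit() */
	REPORT_PR_ALERT,	/* pr_alert() */
	REPORT_PR_EMERG,	/* pr_emerg() */
	REPORT_KIND_COUNT
};

enum reaction { REACT_PRINT_ONLY, REACT_PKILL, REACT_PANIC };

/*
 * Only WARN() is affected by the two knobs. Every pr_*() level is
 * just printed, no matter how severe it is.
 */
enum reaction react(enum report_kind kind, bool panic_on_warn,
		    bool pkill_on_warn)
{
	assert((int)kind >= 0 && kind < REPORT_KIND_COUNT);

	if (kind != REPORT_WARN)
		return REACT_PRINT_ONLY;
	if (panic_on_warn)	/* assumed to take precedence */
		return REACT_PANIC;
	if (pkill_on_warn)
		return REACT_PKILL;
	return REACT_PRINT_ONLY;
}
```

With pkill_on_warn set, react(REPORT_WARN, ...) kills the process
while react(REPORT_PR_EMERG, ...) only prints, which is exactly the
ordering that looks backwards to me.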

It somehow reminds me of the saga with %pK. We were not able to teach
developers to use it correctly for years and ended up with hashed
pointers.

Well, this might be different. Developers might learn this the hard
way from bug reports. But there will be bug reports only if
someone really enables this behavior. And people will enable it only
if it works the right way most of the time.

> > > From a safety point of view, the Linux kernel misses a middle way of
> > > handling kernel warnings:
> > >  - The kernel should stop the activity that provokes a warning,
> > >  - But the kernel should avoid complete denial of service.
> > > 
> > > From a security point of view, kernel warning messages provide a lot of
> > > useful information for attackers. Many GNU/Linux distributions allow
> > > unprivileged users to read the kernel log, so attackers use kernel
> > > warning infoleak in vulnerability exploits. See the examples:
> > >   https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
> > >   https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html
> > > 
> > > Let's introduce the pkill_on_warn boot parameter.
> > > If this parameter is set, the kernel kills all threads in a process
> > > that provoked a kernel warning. This behavior is reasonable from a safety
> > > point of view described above. It is also useful for kernel security
> > > hardening because the system kills an exploit process that hits a
> > > kernel warning.
> > > 
> > > Signed-off-by: Alexander Popov <alex.popov@xxxxxxxxx>
> > 
> > This patch was tested using CONFIG_LKDTM.
> > The kernel kills a process that performs this:
> >   echo WARNING > /sys/kernel/debug/provoke-crash/DIRECT
> > 
> > If you are fine with this approach, I will prepare a patch adding the
> > pkill_on_warn sysctl.
> 
> I suspect that you need a list of kthreads for which you are better
> off just invoking panic().  RCU's various kthreads, for but one set
> of examples.

I wonder if the kernel could survive the killing of any kthread. I have
never seen code that would check whether a kthread was killed and
restart it.

Best Regards,
Petr