Re: [PATCH v5 7/8] Documentation: Add documentation for the Brute LSM

John Wood <john.wood@xxxxxxx> · Tue, 9 Mar 2021 19:40:54 +0100

Hi,

On Sun, Mar 07, 2021 at 02:49:27PM -0800, Andi Kleen wrote:
> On Sun, Mar 07, 2021 at 07:05:41PM +0100, John Wood wrote:
> > On Sun, Mar 07, 2021 at 09:25:40AM -0800, Andi Kleen wrote:
> > > > processes created from it will be killed. If the systemd restart the network
> > > > daemon and it will crash again, then the systemd will be killed. I think this
> > > > way the attack is fully mitigated.
> > >
> > > Wouldn't that panic the system? Killing init is usually a panic.
> >
> > The mitigation acts only over the process that crashes (network daemon) and the
> > process that exec() it (systemd). This mitigation don't go up in the processes
> > tree until reach the init process.
>
> Most daemons have some supervisor that respawns them when they crash.
> (maybe read up on "supervisor trees" if you haven't, it's a standard concept)
>
> That's usually (but not) always init, as in systemd. There might be something
> inbetween it and init, but likely init would respawn the something in between
> it it. One of the main tasks of init is to respawn things under it.
>
> If you have a supervisor tree starting from init the kill should eventually
> travel up to init.

I will try to demonstrate that the mitigation doesn't travel up to init. To do
so, I will use the following scenario (a brute force attack through the execve
system call):

init -------exec()-------> supervisor -------exec()-----> network daemon
faults = 0                 faults = 0                     faults = 0
period = ---               period = ---                   period = ---

Now the network daemon crashes (its stats are updated, and so are the
supervisor's stats):

init --------------------> supervisor ------------------> network daemon
faults = 0                 faults = 1                     faults = 1
period = ---               period = 10ms                  period = 10ms

Then the network daemon is freed and its stats are removed:

init --------------------> supervisor
faults = 0                 faults = 1
period = ---               period = 10ms

Now the supervisor respawns the daemon (the daemon's stats are initialized):

init --------------------> supervisor ------------------> network daemon
faults = 0                 faults = 1                     faults = 0
period = ---               period = 10ms                  period = ---

The network daemon crashes again:

init --------------------> supervisor ------------------> network daemon
faults = 0                 faults = 2                     faults = 1
period = ---               period = 11ms                  period = 12ms

The network daemon is freed again:

init --------------------> supervisor
faults = 0                 faults = 2
period = ---               period = 11ms

The supervisor respawns the daemon again:

init --------------------> supervisor ------------------> network daemon
faults = 0                 faults = 2                     faults = 0
period = ---               period = 11ms                  period = ---

These steps are repeated a number of times, until a minimum number of faults
triggers the brute force attack mitigation. At this moment:

init --------------------> supervisor ------------------> network daemon
faults = 0                 faults = 5                     faults = 1
period = ---               period = 13ms                  period = 15ms

Now the network daemon is freed and the supervisor is killed by the mitigation.
At this point it is important to note that the supervisor's stats are disabled
before the kill signal is sent to it. This means that when the supervisor is
killed, its stats are not updated, so the init stats are not updated either.

init
faults = 0
period = ---

From the point of view of the init process, nothing has happened.

> At least that's the theory. Do you have some experiments that show
> this doesn't happen?

Yes. The kernel selftests try to emulate some scenarios: basically, brute force
attacks through the execve system call (like the case described above) and also
brute force attacks through the fork system call, in both cases playing with
the crossing of some privilege boundaries.

For example:

In the tests, an application execs() another application that crashes. The
first application then respawns the one that crashed, which crashes again. The
respawn is repeated until the brute force attack through the execve system call
is detected, and then the application that execs() is killed. No other
applications are killed, only the tasks involved in the attack.
>
> >
> > Note: I am a kernel newbie and I don't know if the systemd is init. Sorry if it
> > is a stupid question. AFAIK systemd is not the init process (the first process
> > that is executed) but I am not sure.
>
> At least the part of systemd that respawns is often (but not always) init.

Thanks for the clarification.

> > So, you suggest that the mitigation method for the brute force attack through
> > the execve system call should be different (not kill the process that exec).
> > Any suggestions would be welcome to improve this feature.
>
> If the system is part of some cluster, then panicing on attack or failure
> could be a reasonable reaction. Some other system in the cluster should
> take over. There's also a risk that all the systems get taken
> out quickly one by one, in this case you might still need something
> like the below.
>
> But it's something that would need to be very carefully considered
> for the environment.
>
> The other case is when there isn't some fallback, as in a standalone
> machine.
>
> It could be only used when the supervisor daemons are aware of it.
> Often they already have respawn limits, but would need to make sure they
> trigger before your algorithm trigger. Or maybe some way to opt-out
> per process.  Then the DoS would be only against that process, but
> not everything on the machine.

Thanks for the suggestions.

> So I think it needs more work on the user space side for most usages.
>

Anyway, if the supervisor is init, then the system will panic. So I think we
can add a prctl to avoid killing the parent task (the task that execs) and
instead only block new fork system calls from this task. When this boolean is
set, a parent task involved in the attack will not be killed; instead, any
following forks from it will be blocked. This way the system will not crash.

What do you think?

> -Andi

Thanks for your time and patience.
John Wood