Re: [RFC PATCH bpf-next 0/3] bpf: freeze a task cgroup from bpf

Yonghong Song <yonghong.song@xxxxxxxxx> · Wed, 10 Apr 2024 17:26:18 -0700

On 4/9/24 8:32 AM, Michal Koutný wrote:
Hi.

On Tue, Apr 02, 2024 at 07:20:45PM +0100, Djalal Harouni <tixxdz@xxxxxxxxx> wrote:
Thanks, yes. I would expect freeze to behave like a signal, and if one
wants to block immediately there is the LSM override return. The
selftest attached tries to do exactly that.
Are you referring to this part:

	int BPF_PROG(lsm_freeze_cgroup, int cmd, union bpf_attr *attr, unsigned int size)
		...
		ret = bpf_task_freeze_cgroup(task, 1);
		if (!ret) {
			ret = -EPERM;
			/* reset for next call */
?

Could be security signals, reading sensitive files, or anything related
to operation management; for X reasons this user session should be frozen
or killed.
What can be done with a frozen cgroup after any of that happens?
Anything besides killing anyway?

Killing of an offending process could be caught by its supervisor (like
container runtime or systemd) and propagated accordingly to the whole
cgroup.

The kill is an effective defense against fork-bombs as an example.
There are already several ways to prevent fork-bombs in the kernel, so
it looks like a contrived example.

Today some container/pod operations are performed at the bpf level, so
having freeze and kill available there makes this straightforward.
It seems to me like an extra step when the same operation can be done from
(the managing) userspace already.
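
For reference, the userspace route mentioned above can be sketched as follows.
This is a minimal sketch, assuming a cgroup v2 hierarchy where the managing
process has write access to the cgroup directory; the helper names
cgroup_write_control(), cgroup_freeze() and cgroup_kill() are hypothetical and
not part of the patch set. It only writes to the existing cgroup.freeze and
cgroup.kill control files.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `value` to the control file `name` inside
 * the cgroup directory `cgroup_dir`.
 * Returns 0 on success or a negative errno value on failure.
 */
static int cgroup_write_control(const char *cgroup_dir, const char *name,
				const char *value)
{
	char path[4096];
	size_t len = strlen(value);
	ssize_t written;
	int fd, n, err = 0;

	n = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	written = write(fd, value, len);
	if (written < 0)
		err = -errno;	/* save errno before close() can change it */
	else if ((size_t)written != len)
		err = -EIO;	/* short write to a control file */

	close(fd);
	return err;
}

/* Freeze (freeze != 0) or thaw (freeze == 0) every task in the cgroup. */
static int cgroup_freeze(const char *cgroup_dir, int freeze)
{
	return cgroup_write_control(cgroup_dir, "cgroup.freeze",
				    freeze ? "1" : "0");
}

/* Send SIGKILL to every task in the cgroup and its descendants. */
static int cgroup_kill(const char *cgroup_dir)
{
	return cgroup_write_control(cgroup_dir, "cgroup.kill", "1");
}
```

A supervisor would call, for example, cgroup_freeze("/sys/fs/cgroup/<group>", 1)
with the path of the cgroup it manages. The write only requests the freeze; the
"frozen" field in cgroup.events reports when the cgroup has actually reached
that state.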

For generalizing this, haven't thought about it that much. First use
case is to try to get freeze and possibly kill support, and use a common
interface as requested.
BTW, I notice that there is bpf_sys_bpf() helper that allows calling an
arbitrary syscall. Wouldn't that be sufficient for everything?

This is not true. Currently, only the 'bpf' and 'close' syscalls are
supported, as the helper list for BPF_PROG_TYPE_SYSCALL programs shows:

static const struct bpf_func_proto *
syscall_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
        switch (func_id) {
        case BPF_FUNC_sys_bpf:
                return !bpf_token_capable(prog->aux->token, CAP_PERFMON)
                       ? NULL : &bpf_sys_bpf_proto;
        case BPF_FUNC_btf_find_by_name_kind:
                return &bpf_btf_find_by_name_kind_proto;
        case BPF_FUNC_sys_close:
                return &bpf_sys_close_proto;
        case BPF_FUNC_kallsyms_lookup_name:
                return &bpf_kallsyms_lookup_name_proto;
        default:
                return tracing_prog_func_proto(func_id, prog);
        }
}

More syscalls can be added (through kfunc) if there is a use case for that.

(Based on how I still understand the problem: either you must respond
immediately and then the direct return from LSM is appropriate or timing
is not sensitive but you want to act on the whole cgroup.)

Thanks,
Michal