Re: [RFC bpf-next 1/4] bpf: cgroup_sock lsm flavor

On 02/16, Alexei Starovoitov wrote:
On Tue, Feb 15, 2022 at 04:12:38PM -0800, Stanislav Fomichev wrote:
>  {
> @@ -1767,14 +1769,23 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
>
>  	/* arg1: lea rdi, [rbp - stack_size] */
>  	EMIT4(0x48, 0x8D, 0x7D, -stack_size);
> -	/* arg2: progs[i]->insnsi for interpreter */
> -	if (!p->jited)
> -		emit_mov_imm64(&prog, BPF_REG_2,
> -			       (long) p->insnsi >> 32,
> -			       (u32) (long) p->insnsi);
> -	/* call JITed bpf program or interpreter */
> -	if (emit_call(&prog, p->bpf_func, prog))
> -		return -EINVAL;
> +
> +	if (p->expected_attach_type == BPF_LSM_CGROUP_SOCK) {
> +		/* arg2: progs[i] */
> +		emit_mov_imm64(&prog, BPF_REG_2, (long) p >> 32, (u32) (long) p);
> +		if (emit_call(&prog, __cgroup_bpf_run_lsm_sock, prog))
> +			return -EINVAL;
> +	} else {
> +		/* arg2: progs[i]->insnsi for interpreter */
> +		if (!p->jited)
> +			emit_mov_imm64(&prog, BPF_REG_2,
> +				       (long) p->insnsi >> 32,
> +				       (u32) (long) p->insnsi);
> +
> +		/* call JITed bpf program or interpreter */
> +		if (emit_call(&prog, p->bpf_func, prog))
> +			return -EINVAL;

Overall I think it's a workable solution.
As far as the mechanism goes, I think it would be better
to allocate a single dummy bpf_prog and use the normal fmod_ret
registration mechanism instead of hacking the arch trampoline bits.
Set dummy_bpf_prog->bpf_func = __cgroup_bpf_run_lsm_sock;
and keep dummy_bpf_prog->jited = false;
From the p->insnsi pointer in arg2 it's easy to go back to the struct bpf_prog.
Such a dummy prog might even be statically defined, like dummy_bpf_prog,
or allocated dynamically once.
It can be added as fmod_ret to multiple trampolines.
Just gut the func_model check.

Oooh, that's much cleaner, thanks!
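
If I'm reading it right, roughly something like this? (just a sketch to
check my understanding; the runner would switch from the RFC's
(u64 *regs, const struct bpf_prog *prog) signature to the standard
bpf_func_t one, and I'm hand-waving over aux/refcount setup and the
fmod_ret registration itself)

/* runner with the usual bpf_func_t signature; it recovers the dummy
 * prog from the insnsi pointer the trampoline passes in arg2
 */
static unsigned int __cgroup_bpf_run_lsm_sock(const void *ctx,
					      const struct bpf_insn *insn)
{
	const struct bpf_prog *prog;

	prog = (const struct bpf_prog *)
		((void *)insn - offsetof(struct bpf_prog, insnsi));
	/* ... find sk/cgrp and run the effective array as in the RFC ... */
	return 0;
}

/* single dummy prog; jited stays false so the trampoline keeps doing
 * 'arg2 = p->insnsi; call p->bpf_func' without any special casing
 */
static struct bpf_prog dummy_bpf_prog = {
	.bpf_func = __cgroup_bpf_run_lsm_sock,
};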

As far as the API goes, the attach should probably be to a cgroup+lsm_hook pair.
link_create.target_fd will be the cgroup_fd.
At prog load time attach_btf_id should probably be one
of the existing bpf_lsm_* hooks.
Feels wrong to duplicate the whole set into an lsm_cgroup_sock set.
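I.e. roughly the following flow (only the relevant bpf_attr fields shown;
BPF_LSM_CGROUP_SOCK is the new attach type from this series, and
btf_id_of() is just a stand-in for resolving the hook's btf id):

	/* load: a normal lsm prog, but with the new expected_attach_type */
	attr.prog_type = BPF_PROG_TYPE_LSM;
	attr.expected_attach_type = BPF_LSM_CGROUP_SOCK;
	attr.attach_btf_id = btf_id_of("bpf_lsm_socket_post_create");
	prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));

	/* attach: the link target is the cgroup, not a btf object */
	memset(&attr, 0, sizeof(attr));
	attr.link_create.prog_fd = prog_fd;
	attr.link_create.target_fd = cgroup_fd;
	attr.link_create.attach_type = BPF_LSM_CGROUP_SOCK;
	link_fd = syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));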

lsm_cgroup_sock is there to further limit which particular lsm
hooks BPF_LSM_CGROUP_SOCK can use. I guess I can look at the
BTF's first argument to verify that it's 'struct socket'? Let
me try to see whether that's a good alternative.
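Something like this on the load/attach path, maybe (very rough sketch,
it skips typedef/modifier chasing and any error handling):

	/* attach_btf_id -> FUNC -> FUNC_PROTO, then check arg #0 */
	const struct btf_type *t = btf_type_by_id(btf, attach_btf_id);
	const struct btf_param *args;
	const struct btf_type *arg;

	t = btf_type_by_id(btf, t->type);		/* FUNC_PROTO */
	args = btf_params(t);
	arg = btf_type_by_id(btf, args[0].type);	/* PTR */
	arg = btf_type_by_id(btf, arg->type);		/* struct socket? */

	if (btf_kind(arg) != BTF_KIND_STRUCT ||
	    strcmp(btf_name_by_offset(btf, arg->name_off), "socket"))
		return -EINVAL;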

It's fine to have prog->expected_attach_type == BPF_LSM_CGROUP_SOCK
to disambiguate. Will we probably have only two:
BPF_LSM_CGROUP_SOCK and BPF_LSM_CGROUP_TASK?

I hope so. Unless objects other than socket and task can have cgroup
association.

> +int __cgroup_bpf_run_lsm_sock(u64 *regs, const struct bpf_prog *prog)
> +{
> +	struct socket *sock = (void *)regs[BPF_REG_0];
> +	struct cgroup *cgrp;
> +	struct sock *sk;
> +
> +	sk = sock->sk;
> +	if (!sk)
> +		return 0;
> +
> +	cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> +
> +	return BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> +				     regs, bpf_prog_run, 0);
> +}

Would it be fast enough?
We went through a bunch of optimizations for normal cgroup hooks and ended up with:
         if (cgroup_bpf_enabled(CGROUP_INET_INGRESS) &&
             cgroup_bpf_sock_enabled(sk, CGROUP_INET_INGRESS))
Here the trampoline code plus the call into __cgroup_bpf_run_lsm_sock
will be there for all cgroups, since the cgroup-specific check will be
inside BPF_PROG_RUN_ARRAY_CG.
I suspect it's ok, since link_create will be for a few specific lsm hooks
which are typically not in the fast path,
unlike a traditional cgroup hook like ingress, which is hot.

Right, cgroup_bpf_enabled() is not needed because lsm is by definition
off/unattached by default. Seems like we can add cgroup_bpf_sock_enabled() to
__cgroup_bpf_run_lsm_sock.
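I.e. something like this at the top of __cgroup_bpf_run_lsm_sock (sketch,
reusing the cgroup_atype index this RFC already stores in prog->aux):

	/* early out when this socket's cgroup has nothing attached
	 * for this lsm attach type
	 */
	if (!cgroup_bpf_sock_enabled(sk, prog->aux->cgroup_atype))
		return 0;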

For BPF_LSM_CGROUP_TASK it will take the cgroup from current instead of the sock, right?

Right. Seems like the only difference is where we get the cgroup pointer
from: current vs sock->cgroup (rough sketch below). Although I'm a bit
unsure whether to allow hooks that are clearly sock-cgroup-based to use
BPF_LSM_CGROUP_TASK. For example, should we allow BPF_LSM_CGROUP_TASK
to attach to socket_post_create? I'd prohibit that at least initially
to avoid subtle 'why do my programs sometimes trigger on the wrong
cgroup' types of issues.
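
For the task flavor I'd expect essentially the same runner with the
cgroup lookup swapped, something like (sketch; the task-side runner
itself is hypothetical at this point):

	/* same as __cgroup_bpf_run_lsm_sock, but the cgroup comes from
	 * the current task instead of the socket
	 */
	cgrp = task_dfl_cgroup(current);
	return BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
				     regs, bpf_prog_run, 0);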

Args access should magically work. 'regs' above should be fine for
all lsm hooks.

The typical prog:
+SEC("lsm_cgroup_sock/socket_post_create")
+int BPF_PROG(socket_post_create, struct socket *sock, int family,
+            int type, int protocol, int kern)
looks good too.
Feels natural.
I guess they can be sleepable too?

Haven't gone into the sleepable world, but I don't see any reason why
there couldn't be a sleepable variation.
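If/when we do, I'd guess the section name would just mirror the existing
lsm.s/ convention, e.g.:

	SEC("lsm_cgroup_sock.s/socket_post_create")

but that naming is only a guess at this point.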

Thank you for the review!


