Re: [PATCH v2 bpf-next 3/5] bpf: Allow kfuncs to be used in LSM programs

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Tue, 21 Jun 2022 13:11:57 -0700

On Tue, Jun 21, 2022 at 11:05:45PM +0530, Kumar Kartikeya Dwivedi wrote:
> On Tue, Jun 21, 2022 at 09:45:15PM IST, KP Singh wrote:
> > On Tue, Jun 21, 2022 at 6:01 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Tue, Jun 21, 2022 at 5:36 AM Kumar Kartikeya Dwivedi
> > > <memxor@xxxxxxxxx> wrote:
> > > >
> > > > On Tue, Jun 21, 2022 at 06:58:09AM IST, KP Singh wrote:
> > > > > In preparation for the addition of bpf_getxattr kfunc.
> > > > >
> > > > > Signed-off-by: KP Singh <kpsingh@xxxxxxxxxx>
> >
> > [...]
> >
> > > > > +++ b/kernel/bpf/btf.c
> > > > > @@ -7264,6 +7264,7 @@ static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type)
> > > > >       case BPF_PROG_TYPE_STRUCT_OPS:
> > > > >               return BTF_KFUNC_HOOK_STRUCT_OPS;
> > > > >       case BPF_PROG_TYPE_TRACING:
> > > > > +     case BPF_PROG_TYPE_LSM:
> > > > >               return BTF_KFUNC_HOOK_TRACING;
> > > >
> > > > Should we define another BTF_KFUNC_HOOK_LSM instead? Otherwise when you register
> > > > for tracing or lsm progs, you write to the same hook instead, so kfunc enabled
> > > > for tracing progs also gets enabled for lsm, I guess that is not what user
> > > > intends when registering kfunc set.
> > >
> > > It's probably ok for this case.
> > > We might combine BTF_KFUNC_HOOK* into fewer hooks and
> >
> > I did this intentionally. We do this similarly for helpers too and we
> > unfortunately, we do realize that LSM hooks are not there in all
> > places (esp. where one is auditing stuff).
> >
> > So in reality one does use a combination of LSM and tracing hooks
> > with the policy decision coming from the LSM hook but the state
> > gets accumulated from different hooks.
> >
> >
> > > scope them by attach_btf_id.
> > > Everything is 'tracing' like at the end.
> > > Upcoming hid-bpf is 'tracing'. lsm is 'tracing'.
> > > but we may need to reduce the scope of kfuncs
> > > based on attach point or based on argument type.
> > > tc vs xdp don't need to be separate sets.
> > > Their types (skb vs xdp_md) are different, so only
> > > right kfuncs can be used, but bpf_ct_release() is needed
> > > in both tc and xdp.
> > > So we could combine tc and xdp into 'btf_kfunc_hook_networking'
> > > without compromising the safety.
> > > acquire vs release ideally would be indicated with btf_tag,
> > > but gcc still doesn't have support for btf_tag and we cannot
> > > require the kernel to be built with clang.
> > > acquire, release, sleepable, ret_null should be flags on a kfunc
> > > instead of a set. It would be easier to manage and less boilerplate
> >
> > +1 This would be awesome, I gave it a shot to use btf_tag but hit this
> > feature gap in GCC.
> >
> > - KP
> >
> > > code. Not clear how to do this cleanly.
> > > export_symbol approach is a bit heavy and requires name being unique
> > > across kernel and modules.
> > > func name mangling or typedef-ing comes to mind. not so clean either.
> 
> How does this approach look to 'tag' a kfunc?
> 
> https://godbolt.org/z/jGeK6Y49K
> 
> Then we find the function whose symbol should be emitted and available in BTF
> with the prefix (__kfunc_name_) and read off the tag values. Due to the extern
> inline it should still inline the definition, so there is no overhead at
> runtime.

No run-time overhead, but code size duplication is prohibitive.
Both functions will have full body.
We can hack around with another noinline function that both will call,
but it's getting ugly.

> As you see in godbolt, __foobar_1_2 symbol is visible, with 1 and 2 being tag
> values. Substituting tag name with underlying value makes parsing easier. We
> can have fixed number of slots and keep others as 0 (ignore).
> 
> It can be extended to also tag arguments, which I already did before at [0], but
> never sent out. Then we don't need the suffixes or special naming of arguments
> to indicate some tag.
> 
> e.g.
> 
> bpf_ct_release will be:
> 
> KFUNC_CALL_0(void, bpf_ct_release, __kfunc_release,
> 	     KARG(struct nf_conn *, nfct, __ptr_ref))
> {
> 	...
> }

-struct nf_conn *
-bpf_skb_ct_lookup(struct __sk_buff *skb_ctx, struct bpf_sock_tuple *bpf_tuple,
-		  u32 tuple__sz, struct bpf_ct_opts *opts, u32 opts__sz)
+KFUNC_CALL_5(struct nf_conn *, bpf_skb_ct_lookup,
+	     KARG(struct __sk_buff *, skb_ctx, PTR_TO_CTX),
+	     KARG(struct bpf_sock_tuple *, bpf_tuple, PTR_TO_MEM),
+	     KARG(u32, tuple__sz, SIZE),
+	     KARG(struct bpf_ct_opts *, opts, PTR_TO_MEM),
+	     KARG(u32, opts__sz, SIZE))

I think it's too ugly for majority of kernel folks to accept.
We will have a lot more kfuncs than we have now. Above code will be
present in many different subsystems. The trade-off is not worth it.
__sz suffix convention works. Similar approach can be used.
This is orthogonal discussion anyway.

Back to kfunc tagging.
How about we use
  BTF_SET_START8(list)
  BTF_ID_FLAGS(func, bpf_xdp_ct_lookup, ACQUIRE | RET_NULL)
  BTF_ID_FLAGS(func, bpf_ct_release, RELEASE)
  BTF_SET_END8(list)

where BTF_SET_START8 macro will do "__BTF_ID__set8__" #name
so that resolve_btfid side knows that it needs to ignore 4 bytes after btf_id
when it's doing set sorting.
The kernel side btf_id_set_contains() would need to call bsearch()
with sizeof(u64) instead of sizeof(u32).

The verifier side will be cleaner and faster too.
Instead of two bsearch calls (that are about to become 3 calls with sleepable set):
                rel = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog),
                                                BTF_KFUNC_TYPE_RELEASE, func_id);
                kptr_get = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog),
                                                     BTF_KFUNC_TYPE_KPTR_ACQUIRE, func_id);
It will be a single call plus flag check.