Re: [PATCH] bpf: verifier: allow arrays of progs to be used in sleepable context

Benjamin Tissoires <bentiss@xxxxxxxxxx> · Tue, 7 May 2024 15:32:15 +0200

On May 06 2024, Alexei Starovoitov wrote:
> On Tue, Apr 30, 2024 at 3:03 AM Benjamin Tissoires <bentiss@xxxxxxxxxx> wrote:
> >
> >
> > Right now, what I am doing is (in simplified pseudo code):
> > - in a bpf program, the user calls hid_bpf_attach_prog(hid_device, program_fd)
> >   where program fd is a tracing program on a never executed function
> >   but this allows to know the type of program to run
> > - the kernel stores that program into a dedicated prog array bpf_map
> >   pre-loaded at boot time
> > - when a event comes in, the kernel walks through the list of attached
> >   programs, calls __hid_bpf_tail_call() and there is a tracing program
> >   attached to it that just do the bpf_tail_call.
> >
> > This works and is simple enough from the user point of view, but is
> > rather inefficient and clunky from the kernel point of view IMO.
> >
> > The freplace mechnism would definitely work if I had a tracing-like
> > function to call, where I need to run the program any time the function
> > gets called. But given that I want per-device filtering, I'm not sure
> > how I could make this work. But given that I need to enable or not the
> > bpf_program, I'm not sure how I could make it work from the kernel point
> > of view.
> >
> > I tried using a simple bpf_prog_run() (which is exactly what I need in
> > the end) but I couldn't really convince the bpf verifier that the
> > provided context is a struct hid_bpf_ctx kernel pointer, and it felt not
> > quite right.
> >
> > So after seeing how the bpf_wq worked internally, and how simple it is
> > now to call a bpf program from the kernel as a simple function call, I
> > played around with allowing kfunc to declare async callback functions.
> >
> > I have a working prototype (probably not fully functional for all of the
> > cases), but I would like to know if you think it would be interesting to
> > have 3 new suffixes:
> > - "__async" for declaring an static bpf program that can be stored in
> >   the kernel and which would be non sleepable
> > - "__s_async" same as before, but for sleepable operations
> > - "__aux" (or "__prog_aux") for that extra parameter to
> >   bpf_wq_set_callback_impl() which contains the struct bpf_prog*.
> 
> Sorry for the delay. I've been traveling.

no worries. I'll be off myself the next few days too :)

> 
> I don't quite understand how these suffixes will work.
> You mean arguments of kfuncs to tell kfunc that an argument
> is a pointer to async callback?
> Sort-of generalization of is_async_callback_calling_kfunc() ?
> I cannot connect the dots.

Yes, exactly that. See [0] for my current WIP. I've just sent it, not
for reviews, but so you see what I meant here.

> 
> Feels dangerous to open up bpf prog calling to arbitrary kfuncs.
> wq/timer/others are calling progs with:
>         callback_fn((u64)(long)map, (u64)(long)key, (u64)(long)value, 0, 0);
> I feel we'll struggle to code review kfuncs that do such things.
> Plus all prog life time management concerns.
> wq/timer do careful bpf_prog_put/get dance.

That part is indeed painful.

> 
> > (I still don't have the __aux yet FWIW)
> >
> > The way I'm doing it is looking at the btf information to fetch the
> > signature of the parameters of the callback, this way we can declare any
> > callback without having to teach the verifier of is arguments (5 max).
> >
> > Is this something you would be comfortable with or is there a simpler
> > mechanism already in place to call the bpf programs from the kernel
> > without the ctx limitations?
> 
> "ctx limitations" you mean 5 args?

No, I was meaning the special context argument for bpf_prog_run. But
anyway, this is relatively moot if struct_ops that you mention below is
working :)

> Have you looked at struct_ops ?
> It can have any number of args.
> Maybe declare struct_ops in hid and let user space provide struct_ops progs?

Last time I checked, I thought struct_ops were only for defining one set
of operations. And you could overwrite them exactly once.
But after reading more carefully how it was used in tcp_cong.c, it seems
we can have multiple programs which define the same struct_ops, and then
it's the kernel which will choose which one needs to be run.

AFAICT there is no sleepable context concerns as well, it depends on how
the kernel starts the various ops.

Last, I'm not entirely sure how I can specify which struct_ops needs to be
attached to which device, but it's worth a shot. I've already realized
that I would probably have to drop the current way of HID-BPF is running,
so now it's just technical bits to assemble :)

> 
> > I can also easily switch the bpf_wq specific cases in the verifier with
> > those suffixes. There are still one or two wq specifics I haven't
> > implemented through __s_async, but that would still makes things more
> > generic.
> >

Cheers,
Benjamin

[0] https://lore.kernel.org/bpf/20240507-bpf_async-v1-0-b4df966096d8@xxxxxxxxxx/T/#t