Re: [PATCH RFC bpf-next v6 1/6] bpf/helpers: introduce sleepable bpf_timers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Apr 8, 2024 at 10:20 AM Benjamin Tissoires
<benjamin.tissoires@xxxxxxxxxx> wrote:
>
> On Mon, Apr 8, 2024 at 7:08 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> >
> > On Mon, 2024-04-08 at 10:09 +0200, Benjamin Tissoires wrote:
> >
> > [...]
> >
> > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > index 9234174ccb21..fd05d4358b31 100644
> > > --- a/kernel/bpf/helpers.c
> > > +++ b/kernel/bpf/helpers.c
> > > @@ -1096,12 +1096,19 @@ const struct bpf_func_proto bpf_snprintf_proto = {
> > >   * freeing the timers when inner map is replaced or deleted by user space.
> > >   */
> > >  struct bpf_hrtimer {
> > > -     struct hrtimer timer;
> > > +     union {
> > > +             struct hrtimer timer;
> > > +             struct work_struct work;
> > > +     };
> > >       struct bpf_map *map;
> > >       struct bpf_prog *prog;
> > >       void __rcu *callback_fn;
> > >       void *value;
> > > -     struct rcu_head rcu;
> > > +     union {
> > > +             struct rcu_head rcu;
> > > +             struct work_struct sync_work;
> >
> > Nit:
> > I find this name very confusing, the field is used to cancel timer
> > execution, is it a convention to call such things '...sync...'?
> >
> > > +     };
> > > +     u64 flags;
> > >  };
> > >
> >
> > [...]
> >
> > > +static void bpf_timer_sync_work_cb(struct work_struct *work)
> > > +{
> > > +     struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, sync_work);
> > > +
> > > +     cancel_work_sync(&t->work);
> > > +
> > > +     kfree_rcu(t, rcu);
> >
> > Sorry, I might be wrong, but this looks suspicious.
> > The 'rcu' field of 'bpf_hrtimer' is defined as follows:
> >
> > struct bpf_hrtimer {
> >         ...
> >         union {
> >                 struct rcu_head rcu;
> >                 struct work_struct sync_work;
> >         };
> >         ...
> > };
> >
> > And for sleepable timers the 'sync_work' field is set as follows:
> >
> > BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map,
> >            u64, flags)
> > {
> >         ...
> >         INIT_WORK(&t->sync_work, bpf_timer_sync_work_cb);
> >         ...
> > }
> >
> > So, it looks like 'kfree_rcu' would be called for a non-rcu pointer.
>
> That was my initial assumption too, but Alexei told me it was fine.
> And I think he is correct because kfree_rcu doesn't need the rcu_head
> to be initialized.
>
> So in the end, we initialize the memory as a work_struct, and when
> that work kicks in, we reuse that exact same memory as the rcu_head.
> This is fine because that work will never be reused.
>
> If I understand correctly, this is to save a few bytes as this is a
> critical struct used in programs with a high rate usage, and every
> byte counts.

Yes. All correct.
Probably makes sense to add a comment before kfree_rcu()
line in bpf_timer_sync_work_cb() that
kfree_rcu will wait for bpf_timer_cancel() to finish
as was done in
commit 0281b919e175 ("bpf: Fix racing between
bpf_timer_cancel_and_free and bpf_timer_cancel").

I suspect that's what confused Eduard.

The patch 1 looks great overall.

If we're going to keep this combined bpf_timer_* api for both wq
and timer we'd need to add flags compatibility check
to bpf_timer_start() too.
We can disallow this flag in 'flags' argument and use one from t->flags.
Which kinda makes sense to make bpf_timer_start() less verbose.
Alternatively we can allow bpf_timer_start() to have it,
but then we'd need to check that it is actually passed.
Either way the patch needs an extra check in bpf_timer_start().
Just ignoring BPF_F_TIMER_SLEEPABLE in bpf_timer_start() doesn't look right.





[Index of Archives]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]

  Powered by Linux