On Mon, Apr 8, 2024 at 10:20 AM Benjamin Tissoires <benjamin.tissoires@xxxxxxxxxx> wrote: > > On Mon, Apr 8, 2024 at 7:08 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > > > On Mon, 2024-04-08 at 10:09 +0200, Benjamin Tissoires wrote: > > > > [...] > > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > > > index 9234174ccb21..fd05d4358b31 100644 > > > --- a/kernel/bpf/helpers.c > > > +++ b/kernel/bpf/helpers.c > > > @@ -1096,12 +1096,19 @@ const struct bpf_func_proto bpf_snprintf_proto = { > > > * freeing the timers when inner map is replaced or deleted by user space. > > > */ > > > struct bpf_hrtimer { > > > - struct hrtimer timer; > > > + union { > > > + struct hrtimer timer; > > > + struct work_struct work; > > > + }; > > > struct bpf_map *map; > > > struct bpf_prog *prog; > > > void __rcu *callback_fn; > > > void *value; > > > - struct rcu_head rcu; > > > + union { > > > + struct rcu_head rcu; > > > + struct work_struct sync_work; > > > > Nit: > > I find this name very confusing, the field is used to cancel timer > > execution, is it a convention to call such things '...sync...'? > > > > > + }; > > > + u64 flags; > > > }; > > > > > > > [...] > > > > > +static void bpf_timer_sync_work_cb(struct work_struct *work) > > > +{ > > > + struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, sync_work); > > > + > > > + cancel_work_sync(&t->work); > > > + > > > + kfree_rcu(t, rcu); > > > > Sorry, I might be wrong, but this looks suspicious. > > The 'rcu' field of 'bpf_hrtimer' is defined as follows: > > > > struct bpf_hrtimer { > > ... > > union { > > struct rcu_head rcu; > > struct work_struct sync_work; > > }; > > ... > > }; > > > > And for sleepable timers the 'sync_work' field is set as follows: > > > > BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map, > > u64, flags) > > { > > ... > > INIT_WORK(&t->sync_work, bpf_timer_sync_work_cb); > > ... > > } > > > > So, it looks like 'kfree_rcu' would be called for a non-rcu pointer. > > That was my initial assumption too, but Alexei told me it was fine. > And I think he is correct because kfree_rcu doesn't need the rcu_head > to be initialized. > > So in the end, we initialize the memory as a work_struct, and when > that work kicks in, we reuse that exact same memory as the rcu_head. > This is fine because that work will never be reused. > > If I understand correctly, this is to save a few bytes as this is a > critical struct used in programs with a high rate usage, and every > byte counts. Yes. All correct. Probably makes sense to add a comment before kfree_rcu() line in bpf_timer_sync_work_cb() that kfree_rcu will wait for bpf_timer_cancel() to finish as was done in commit 0281b919e175 ("bpf: Fix racing between bpf_timer_cancel_and_free and bpf_timer_cancel"). I suspect that's what confused Eduard. The patch 1 looks great overall. If we're going to keep this combined bpf_timer_* api for both wq and timer we'd need to add flags compatibility check to bpf_timer_start() too. We can disallow this flag in 'flags' argument and use one from t->flags. Which kinda makes sense to make bpf_timer_start() less verbose. Alternatively we can allow bpf_timer_start() to have it, but then we'd need to check that it is actually passed. Either way the patch needs an extra check in bpf_timer_start(). Just ignoring BPF_F_TIMER_SLEEPABLE in bpf_timer_start() doesn't look right.