On Wed, Apr 3, 2024 at 11:50 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Wed, Mar 27, 2024 at 10:02 AM Benjamin Tissoires
> <benjamin.tissoires@xxxxxxxxxx> wrote:
> > > >                 goto out;
> > > >         }
> > > > +       spin_lock(&t->sleepable_lock);
> > > >         drop_prog_refcnt(t);
> > > > +       spin_unlock(&t->sleepable_lock);
> > >
> > > this also looks odd.
> >
> > I basically need to protect "t->prog = NULL;" from happening while
> > bpf_timer_work_cb is setting up the bpf program to be run.
>
> Ok. I think I understand the race you're trying to fix.
> The bpf_timer_cancel_and_free() is doing
> cancel_work()
> and proceeds with
> kfree_rcu(t, rcu);
>
> That's the only race and these extra locks don't help.
>
> The t->prog = NULL is nothing to worry about.
> The bpf_timer_work_cb() might still see callback_fn == NULL
> "when it's being setup" and it's ok.
> These locks don't help that.
>
> I suggest to drop sleepable_lock everywhere.
> READ_ONCE of callback_fn in bpf_timer_work_cb() is enough.
> Add rcu_read_lock_trace() before calling bpf prog.
>
> The race to fix is above 'cancel_work + kfree_rcu'
> since kfree_rcu might free 'struct bpf_hrtimer *t'
> while the work is pending and work_queue internal
> logic might UAF struct work_struct work.
> By the time it may luckily enter bpf_timer_work_cb() it's too late.
> The argument 'struct work_struct *work' might already be freed.
>
> To fix this problem, how about the following:
> don't call kfree_rcu and instead queue the work to free it.
> After cancel_work(&t->work); the work_struct can be reused.
> So set it up to call "freeing callback" and do
> schedule_work(&t->work);
>
> There is a big assumption here that new work won't be
> executed before cancelled work completes.
> Need to check with wq experts.
>
> Another approach is to do something smart with
> cancel_work() return code.
> If it returns true set a flag inside bpf_hrtimer and
> make bpf_timer_work_cb() free(t) after bpf prog finishes.

Looking through wq code... I think I have to correct myself.
cancel_work and immediate free is probably fine from wq pov.
It has this comment:

        worker->current_func(work);
        /*
         * While we must be careful to not use "work" after this, the trace
         * point will only record its address.
         */
        trace_workqueue_execute_end(work, worker->current_func);

The bpf_timer_work_cb() might still be running bpf prog.
So it shouldn't touch 'struct bpf_hrtimer *t' after bpf prog returns,
since kfree_rcu(t, rcu); could have freed it by then.

There is also this code in net/rxrpc/rxperf.c

        cancel_work(&call->work);
        kfree(call);

So it looks like it's fine to drop sleepable_lock,
add rcu_read_lock_trace() and things should be ok.
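
To make the callback shape concrete, here is a rough userspace analogy, not the actual patch. It assumes C11 atomics as a stand-in for READ_ONCE()/WRITE_ONCE(), and every name in it (fake_timer, fake_timer_work_cb, bump, fake_timer_demo) is hypothetical. It only shows the two rules above: load callback_fn once and tolerate NULL, and do not dereference the timer after the callback returns. The rcu_read_lock_trace() part has no userspace equivalent here and is left out.

```c
/* Userspace analogy only: C11 atomics stand in for the kernel's
 * READ_ONCE()/WRITE_ONCE(). All names here are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef int (*timer_cb_fn)(void *arg);

struct fake_timer {
	_Atomic(timer_cb_fn) callback_fn;  /* may be cleared concurrently */
	void *arg;
};

/* Stand-in for the work callback: returns 1 if the callback ran, else 0. */
int fake_timer_work_cb(struct fake_timer *t)
{
	timer_cb_fn fn;
	void *arg;

	assert(t != NULL);

	/* Single load, so the NULL check and the call use the same value. */
	fn = atomic_load_explicit(&t->callback_fn, memory_order_relaxed);
	if (!fn)
		return 0;	/* callback was cleared; nothing to run */

	arg = t->arg;		/* copy what is needed before the call */
	fn(arg);
	/* From here on 't' is not dereferenced: in the kernel case
	 * kfree_rcu() could have freed it while the program ran. */
	return 1;
}

/* Trivial callback that counts how often it was invoked. */
int bump(void *arg)
{
	*(int *)arg += 1;
	return 0;
}

/* Runs the work callback once with the callback set and once with it
 * cleared; returns the number of times bump() actually ran. */
int fake_timer_demo(void)
{
	int hits = 0;
	struct fake_timer t = { .arg = &hits };

	atomic_init(&t.callback_fn, bump);
	fake_timer_work_cb(&t);		/* runs bump() */
	atomic_store_explicit(&t.callback_fn, NULL, memory_order_relaxed);
	fake_timer_work_cb(&t);		/* sees NULL and skips the call */
	return hits;
}
```

In this sketch fake_timer_demo() counts one invocation, since the second pass sees the cleared pointer and returns early. Copying 'arg' before the call is the design point: nothing after fn(arg) needs the timer again.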