Hi,

On 10/20/2023 10:14 AM, Alexei Starovoitov wrote:
> On Thu, Oct 19, 2023 at 6:41 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>
>> When there are concurrent uref release and bpf timer init operations,
>> the following sequence diagram is possible and it will lead to a
>> memory leak:
>>
>> bpf program X
>>
>> bpf_timer_init()
>>   lock timer->lock
>>     read timer->timer as NULL
>>     read map->usercnt != 0
>>
>>                 process Y
>>
>>                 close(map_fd)
>>                 // put last uref
>>                 bpf_map_put_uref()
>>                   atomic_dec_and_test(map->usercnt)
>>                     array_map_free_timers()
>>                       bpf_timer_cancel_and_free()
>>                         // just return and lead to memory leak
>>                         read timer->timer is NULL
>>
>>     t = bpf_map_kmalloc_node()
>>     timer->timer = t
>>   unlock timer->lock
>>
>> Fix the problem by checking map->usercnt again after timer->timer is
>> assigned, so that when uref release and bpf timer init run concurrently,
>> either bpf_timer_cancel_and_free() from the uref release reads a
>> non-NULL timer or the newly-added check of map->usercnt reads a zero
>> usercnt.
>>
>> Because atomic_dec_and_test(map->usercnt) and READ_ONCE(timer->timer)
>> in bpf_timer_cancel_and_free() are not protected by a lock, add a
>> memory barrier to guarantee the ordering between map->usercnt and
>> timer->timer. Also use WRITE_ONCE(timer->timer, x) to match the
>> lockless read of timer->timer.
>>
>> Reported-by: Hsin-Wei Hung <hsinweih@xxxxxxx>
>> Closes: https://lore.kernel.org/bpf/CABcoxUaT2k9hWsS1tNgXyoU3E-=PuOgMn737qK984fbFmfYixQ@xxxxxxxxxxxxxx
>> Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
>> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
>> ---
>>  kernel/bpf/helpers.c | 18 +++++++++++++++---
>>  1 file changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index 757b99c1e613f..a7d92c3ddc3dd 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -1156,7 +1156,7 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
>>                 u64, flags)
>>  {
>>         clockid_t clockid = flags & (MAX_CLOCKS - 1);
>> -       struct bpf_hrtimer *t;
>> +       struct bpf_hrtimer *t, *to_free = NULL;
>>         int ret = 0;
>>
>>         BUILD_BUG_ON(MAX_CLOCKS != 16);
>> @@ -1197,9 +1197,21 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
>>         rcu_assign_pointer(t->callback_fn, NULL);
>>         hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
>>         t->timer.function = bpf_timer_cb;
>> -       timer->timer = t;
>> +       WRITE_ONCE(timer->timer, t);
>> +       /* Guarantee order between timer->timer and map->usercnt. So when
>> +        * there are concurrent uref release and bpf timer init, either
>> +        * bpf_timer_cancel_and_free() called by uref release reads a no-NULL
>> +        * timer or atomic64_read() below reads a zero usercnt.
>> +        */
>> +       smp_mb();
>> +       if (!atomic64_read(&map->usercnt)) {
>> +               WRITE_ONCE(timer->timer, NULL);
>> +               to_free = t;
> just kfree(t); here.

Will do. It is a slow path, so I think doing kfree() under the spinlock
is acceptable. A rough sketch of the resulting error path is at the end
of this mail.

>
>> +               ret = -EPERM;
>> +       }
> This will add a second atomic64_read(&map->usercnt) in the same function.
> Let's remove the first one ?

I would prefer to keep it: it detects the release of the map uref early,
and handling the release there is simpler than handling it after the
second atomic64_read(). Do you have a strong preference ?
>
>>  out:
>>         __bpf_spin_unlock_irqrestore(&timer->lock);
>> +       kfree(to_free);
>>         return ret;
>>  }
>>
>> @@ -1372,7 +1384,7 @@ void bpf_timer_cancel_and_free(void *val)
>>         /* The subsequent bpf_timer_start/cancel() helpers won't be able to use
>>          * this timer, since it won't be initialized.
>>          */
>> -       timer->timer = NULL;
>> +       WRITE_ONCE(timer->timer, NULL);
>>  out:
>>         __bpf_spin_unlock_irqrestore(&timer->lock);
>>         if (!t)
>> --
>> 2.29.2
>>
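For v2, this is the rough, untested sketch of the tail of bpf_timer_init()
that I have in mind: t is freed directly under timer->lock, to_free goes
away (so the declaration reverts to just "struct bpf_hrtimer *t;"), and the
early usercnt check before the allocation stays as in this patch:

        rcu_assign_pointer(t->callback_fn, NULL);
        hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
        t->timer.function = bpf_timer_cb;
        WRITE_ONCE(timer->timer, t);
        /* Pairs with the full barrier implied by atomic_dec_and_test() in
         * bpf_map_put_uref(): either bpf_timer_cancel_and_free() reads a
         * non-NULL timer->timer, or the read below sees a zero usercnt.
         */
        smp_mb();
        if (!atomic64_read(&map->usercnt)) {
                /* The last uref is gone. bpf_timer_cancel_and_free() has
                 * either already run and seen a NULL timer->timer, or will
                 * see the NULL written below once the lock is released, so
                 * free t here to avoid the leak.
                 */
                WRITE_ONCE(timer->timer, NULL);
                kfree(t);
                ret = -EPERM;
        }
out:
        __bpf_spin_unlock_irqrestore(&timer->lock);
        return ret;
}

If you do feel strongly about removing the first usercnt check, the second
check alone should be enough for correctness; keeping the first one only
avoids a needless kmalloc()/kfree() when the uref is already gone before
the allocation.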