On Wed, Jan 8, 2025 at 10:07 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>
> From: Hou Tao <houtao1@xxxxxxxxxx>
>
> During the update procedure, when overwriting an element in a
> pre-allocated htab, the freeing of old_element is protected by the
> bucket lock. The bucket lock is necessary because the old_element has
> already been stashed in htab->extra_elems after alloc_htab_elem()
> returns. If the old_element were freed after the bucket lock is
> unlocked, the stashed element could be reused by a concurrent update
> procedure, and the freeing of old_element would run concurrently with
> the reuse of the old_element. However, the invocation of
> check_and_free_fields() may acquire a spin-lock, which violates the
> lockdep rule because its caller already holds a raw-spin-lock (the
> bucket lock). The following warning is reported when such a race
> happens:
>
>   BUG: scheduling while atomic: test_progs/676/0x00000003
>   3 locks held by test_progs/676:
>   #0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830
>   #1: ffff88810e961188 (&htab->lockdep_key){....}-{2:2}, at: htab_map_update_elem+0x306/0x1500
>   #2: ffff8881f4eac1b8 (&base->softirq_expiry_lock){....}-{2:2}, at: hrtimer_cancel_wait_running+0xe9/0x1b0
>   Modules linked in: bpf_testmod(O)
>   Preemption disabled at:
>   [<ffffffff817837a3>] htab_map_update_elem+0x293/0x1500
>   CPU: 0 UID: 0 PID: 676 Comm: test_progs Tainted: G ... 6.12.0+ #11
>   Tainted: [W]=WARN, [O]=OOT_MODULE
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)...
>   Call Trace:
>   <TASK>
>   dump_stack_lvl+0x57/0x70
>   dump_stack+0x10/0x20
>   __schedule_bug+0x120/0x170
>   __schedule+0x300c/0x4800
>   schedule_rtlock+0x37/0x60
>   rtlock_slowlock_locked+0x6d9/0x54c0
>   rt_spin_lock+0x168/0x230
>   hrtimer_cancel_wait_running+0xe9/0x1b0

Since this issue is limited to RT:

#ifdef CONFIG_PREEMPT_RT
void hrtimer_cancel_wait_running(const struct hrtimer *timer);
#else
static inline void hrtimer_cancel_wait_running(struct hrtimer *timer)
{
	cpu_relax();
}
#endif

let's avoid overloading system_unbound_wq on !RT.

> -	if (this_cpu_read(hrtimer_running))
> -		queue_work(system_unbound_wq, &t->cb.delete_work);
> -	else
> -		bpf_timer_delete_work(&t->cb.delete_work);

Keep this sync call on !RT.

> +	if (!this_cpu_read(hrtimer_running) && hrtimer_try_to_cancel(&t->timer) >= 0) {
> +		kfree_rcu(t, cb.rcu);
> +		return;
> +	}
> +
> +	/* The timer is running on current or other CPU. Use a kworker to wait
> +	 * for the completion of the timer instead of spinning on current CPU
> +	 * or trying to acquire a sleepable lock to wait for its completion.
> +	 */
> +	queue_work(system_unbound_wq, &t->cb.delete_work);

Overloading system_unbound_wq even on !RT when hrtimer_running is set is
an issue to solve as well, but let's not double down on the problem and
make things worse.

The rest looks good.

pw-bot: cr