From: Hou Tao <houtao1@xxxxxxxxxx>

Hi,

The patch set continues the previous work [1] to move all the freeing of htab elements out of the bucket lock. One motivation for the patch set is the locking problem reported by Sebastian [2]: under PREEMPT_RT, freeing a bpf_timer may acquire a spin-lock (namely softirq_expiry_lock). However, the freeing procedure for an htab element already holds a raw spin-lock (namely the bucket lock), so it triggers the warning "BUG: scheduling while atomic", as demonstrated by the selftest patch. Another motivation is to reduce the scope covered by the bucket lock.

The patch set is structured as follows:

* Patch #1 moves the element freeing out of the lock for htab_lru_map_delete_node()
* Patches #2~#3 move the element freeing out of the lock for __htab_map_lookup_and_delete_elem()
* Patches #4~#6 move the element freeing out of the lock for htab_map_update_elem()
* Patch #7 adds a selftest for the locking problem

The changes for htab_map_update_elem() require some explanation. The previous work [1] could not move the element freeing out of the bucket lock for the preallocated hash table because of the ->extra_elems optimization: when alloc_htab_elem() returns, the existing old element has already been stashed in the per-cpu ->extra_elems. To handle that, patches #4~#6 split the reuse of ->extra_elems and the refill of ->extra_elems into two independent steps: the reuse is done with the bucket lock held, and the refill is done after the bucket lock is released. The downside is that concurrent updates on the same CPU may need to pop a free element from the per-cpu list instead of reusing ->extra_elems directly, but I think such cases will be rare.

Please see individual patches for more details. Comments are always welcome.
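To make the reuse/refill split for htab_map_update_elem() concrete, here is a minimal userspace C sketch, assuming a single bucket, a single CPU and a single thread. All names in it (struct elem, struct bucket, update_elem(), free_elem_unlocked(), the atomic_flag "bucket lock", the global free_list and extra_elem) are hypothetical stand-ins, not the kernel's hashtab.c code. It only tries to show the ordering: the spare element is taken while the lock is held, and the replaced element is handed back (refilling the spare slot, or else going to the free list) only after the lock has been dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's htab structures. */
struct elem { struct elem *next; int key, value; };
struct bucket { atomic_flag lock; struct elem *head; };	/* lock ~ raw bucket lock */

#define POOL_SIZE 4
static struct elem pool[POOL_SIZE];	/* "preallocated" elements */
static struct elem *free_list;		/* ~ per-cpu free list */
static struct elem *extra_elem;		/* ~ ->extra_elems, single CPU only */
static int nr_freed;			/* elements released after unlocking */

static void bucket_lock(struct bucket *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;	/* spin */
}

static void bucket_unlock(struct bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

static void init_table(struct bucket *b)
{
	atomic_flag_clear(&b->lock);
	b->head = free_list = NULL;
	for (int i = 0; i < POOL_SIZE - 1; i++) {
		pool[i].next = free_list;
		free_list = &pool[i];
	}
	extra_elem = &pool[POOL_SIZE - 1];	/* one reserved spare */
	nr_freed = 0;
}

static struct elem **find_link(struct bucket *b, int key)
{
	struct elem **pp = &b->head;

	while (*pp && (*pp)->key != key)
		pp = &(*pp)->next;
	return pp;	/* *pp == NULL when the key is absent */
}

/* Called only after bucket_unlock(): in the kernel this is where freeing
 * special fields such as bpf_timer may acquire softirq_expiry_lock. */
static void free_elem_unlocked(struct elem *old)
{
	assert(old != NULL);
	nr_freed++;
	if (!extra_elem) {
		extra_elem = old;	/* refill step */
	} else {
		old->next = free_list;	/* spare is full: back to the free list */
		free_list = old;
	}
}

static int update_elem(struct bucket *b, int key, int value)
{
	struct elem **pp, *old, *new_elem;

	bucket_lock(b);
	pp = find_link(b, key);
	old = *pp;
	if (old && extra_elem) {
		new_elem = extra_elem;	/* reuse step, under the lock */
		extra_elem = NULL;
	} else {
		new_elem = free_list;	/* new key, or spare already taken */
		if (new_elem)
			free_list = new_elem->next;
	}
	if (!new_elem) {
		bucket_unlock(b);
		return -1;	/* no free element left */
	}
	new_elem->key = key;
	new_elem->value = value;
	if (old) {
		new_elem->next = old->next;	/* swap in; old is unlinked */
		*pp = new_elem;
	} else {
		new_elem->next = b->head;
		b->head = new_elem;
	}
	bucket_unlock(b);
	if (old)
		free_elem_unlocked(old);	/* released outside the lock */
	return 0;
}
```

Replacing an existing key while the spare slot is empty (for example, because another update on the same CPU already consumed it) falls back to the free list, which mirrors the downside mentioned in the cover letter. Because the sketch is single-threaded, free_list and extra_elem need no protection of their own; a concurrent version would need per-CPU data or separate synchronization for them.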
[1]: https://lore.kernel.org/bpf/20241106063542.357743-1-houtao@xxxxxxxxxxxxxxx
[2]: https://lore.kernel.org/bpf/20241106084527.4gPrMnHt@xxxxxxxxxxxxx

Hou Tao (7):
  bpf: Free special fields after unlock in htab_lru_map_delete_node()
  bpf: Bail out early in __htab_map_lookup_and_delete_elem()
  bpf: Free element after unlock in __htab_map_lookup_and_delete_elem()
  bpf: Support refilling extra_elems in free_htab_elem()
  bpf: Factor out the element allocation for pre-allocated htab
  bpf: Free element after unlock for pre-allocated htab
  selftests/bpf: Add test case for the freeing of bpf_timer

 kernel/bpf/hashtab.c                          | 170 ++++++++++--------
 .../selftests/bpf/prog_tests/free_timer.c     | 165 +++++++++++++++++
 .../testing/selftests/bpf/progs/free_timer.c  |  71 ++++++++
 3 files changed, 332 insertions(+), 74 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/free_timer.c
 create mode 100644 tools/testing/selftests/bpf/progs/free_timer.c

--
2.29.2