Re: [PATCH bpf-next 0/7] Free htab element out of bucket lock

Hou Tao <houtao@xxxxxxxxxxxxxxx> · Tue, 7 Jan 2025 20:09:17 +0800



On 1/7/2025 4:55 PM, Hou Tao wrote:
> From: Hou Tao <houtao1@xxxxxxxxxx>
>
> Hi,
>
> The patch set continues the previous work [1] to move all the freeings
> of htab elements out of bucket lock. One motivation for the patch set is
> the locking problem reported by Sebastian [2]: the freeing of bpf_timer
> under PREEMPT_RT may acquire a spin-lock (namely softirq_expiry_lock).
> However the freeing procedure for htab element has already held a
> raw-spin-lock (namely bucket lock), and it will trigger the warning:
> "BUG: scheduling while atomic" as demonstrated by the selftests patch.
> Another motivation is to reduce the locked scope of bucket lock.
>
> The patch set is structured as follows:
>
> * Patch #1 moves the element freeing out of lock for
>   htab_lru_map_delete_node()
> * Patch #2~#3 move the element freeing out of lock for
>   __htab_map_lookup_and_delete_elem()
> * Patch #4~#6 move the element freeing out of lock for
>   htab_map_update_elem()
> * Patch #7 adds a selftest for the locking problem
>
> The changes for htab_map_update_elem() require some explanation. The
> reason that the previous work [1] can't move the element freeing out of
> the bucket lock for preallocated hash table is due to ->extra_elems
> optimization. When alloc_htab_elem() returns, the existed-old element
> has already been stashed in per-cpu ->extra_elems. To handle that, patch
> #5~#7 break the reuse of ->extra_elems and the refill of ->extra_elems
> into two independent steps, do resue with bucket lock being held and do
> refill after unlocking the bucket lock. The downside is that concurrent
> updates on the same CPU may need to pop free element from per-cpu list
> instead of reusing ->extra_elems directly, but I think such case will be
> rare.

Er, the break of reuse and refill of ->extra_elems is buggy. It failed
the htab_update/concurrent_update in BPF CI occasionally. It may also
return the unexpected E2BIG error when the map is full and there are
concurrent overwrites procedure on the same CPU. Need to figure out
another way to handle the lock problem.
> Please see individual patches for more details. Comments are always
> welcome.
>
> [1]: https://lore.kernel.org/bpf/20241106063542.357743-1-houtao@xxxxxxxxxxxxxxx
> [2]: https://lore.kernel.org/bpf/20241106084527.4gPrMnHt@xxxxxxxxxxxxx
>
> Hou Tao (7):
>   bpf: Free special fields after unlock in htab_lru_map_delete_node()
>   bpf: Bail out early in __htab_map_lookup_and_delete_elem()
>   bpf: Free element after unlock in __htab_map_lookup_and_delete_elem()
>   bpf: Support refilling extra_elems in free_htab_elem()
>   bpf: Factor out the element allocation for pre-allocated htab
>   bpf: Free element after unlock for pre-allocated htab
>   selftests/bpf: Add test case for the freeing of bpf_timer
>
>  kernel/bpf/hashtab.c                          | 170 ++++++++++--------
>  .../selftests/bpf/prog_tests/free_timer.c     | 165 +++++++++++++++++
>  .../testing/selftests/bpf/progs/free_timer.c  |  71 ++++++++
>  3 files changed, 332 insertions(+), 74 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/free_timer.c
>  create mode 100644 tools/testing/selftests/bpf/progs/free_timer.c
>