Re: [PATCH bpf-next] bpf: Mitigate latency spikes associated with freeing non-preallocated htab

Yonghong Song <yonghong.song@xxxxxxxxx> · Tue, 26 Mar 2024 09:50:04 -0700

On 3/26/24 1:12 AM, Yafang Shao wrote:
Following the recent upgrade of one of our BPF programs, we encountered
significant latency spikes affecting other applications running on the same
host. After thorough investigation, we identified that these spikes were
primarily caused by the prolonged duration required to free a
non-preallocated htab with approximately 2 million keys.

Notably, our kernel is built without CONFIG_PREEMPT. When a kernel code
path runs for too long in such a configuration, other threads can be
starved of CPU time, causing latency issues across the system. To mitigate
this, add a cond_resched() call to the element-freeing loop in
delete_all_elements(). This gives the scheduler periodic opportunities to
run other threads while a large htab is being freed.

Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
---
  kernel/bpf/hashtab.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 3a088a5349bc..d3d5aad045cc 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1489,6 +1489,7 @@ static void delete_all_elements(struct bpf_htab *htab)
  		hlist_nulls_for_each_entry_safe(l, n, head, hash_node) {
  			hlist_nulls_del_rcu(&l->hash_node);
  			htab_elem_free(htab, l);
+			cond_resched();
  		}

Should we put cond_resched() here, inside the top-level 'for' loop but
outside the bucket loop? Do you really have a long linked list in any
particular bucket? Otherwise, the patch looks good to me. hashtab.c already
calls cond_resched() in a few other places to mitigate similar issues.

  	}
  	migrate_enable();
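
For illustration only, here is a minimal userspace sketch of the placement
suggested above: yield once per bucket, after the inner per-element loop
has finished. It is not the kernel code. struct elem, struct table,
table_insert() and table_delete_all() are hypothetical names, and
sched_yield() is just a userspace stand-in for cond_resched().

```c
#include <assert.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these names exist in hashtab.c. */
struct elem {
	struct elem *next;
	int key;
};

struct table {
	struct elem **buckets;
	size_t n_buckets;
};

/* Insert at the head of the key's bucket; 0 on success, -1 if malloc fails. */
static int table_insert(struct table *t, int key)
{
	struct elem *e;
	size_t b;

	assert(t->n_buckets > 0);
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	b = (size_t)(unsigned int)key % t->n_buckets;
	e->key = key;
	e->next = t->buckets[b];
	t->buckets[b] = e;
	return 0;
}

/* Free every element and return how many were freed. */
static size_t table_delete_all(struct table *t)
{
	size_t freed = 0;
	size_t i;

	for (i = 0; i < t->n_buckets; i++) {
		struct elem *e = t->buckets[i];

		/* Inner bucket loop: no yield per element. */
		while (e) {
			struct elem *next = e->next;

			free(e);
			e = next;
			freed++;
		}
		t->buckets[i] = NULL;

		/* Outer loop: yield once per bucket (stand-in for cond_resched()). */
		sched_yield();
	}
	return freed;
}
```

With short chains, this yields about as often as the per-element variant.
The two only differ noticeably if a single bucket holds a very long list,
which is what the question above is asking about.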