Hi,

On 6/28/2023 9:56 AM, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@xxxxxxxxxx>
>
> To address OOM issue when one cpu is allocating and another cpu is freeing add
> a target bpf_mem_cache hint to allocated objects and when local cpu free_llist
> overflows free to that bpf_mem_cache. The hint addresses the OOM while
> maintaing the same performance for common case when alloc/free are done on the
> same cpu.
>
> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>

Acked-by: Hou Tao <houtao1@xxxxxxxxxx>

But I have a minor comment about do_call_rcu_ttrace() below.

> ---
>  kernel/bpf/memalloc.c | 46 ++++++++++++++++++++++++++-----------------
>  1 file changed, 28 insertions(+), 18 deletions(-)

SNIP

>  static void do_call_rcu_ttrace(struct bpf_mem_cache *c)
> @@ -295,7 +289,7 @@ static void do_call_rcu_ttrace(struct bpf_mem_cache *c)
>  		return;
>
>  	WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp_ttrace));
> -	llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu_ttrace))
> +	llist_for_each_safe(llnode, t, llist_del_all(&c->free_by_rcu_ttrace))
>  		/* There is no concurrent __llist_add(waiting_for_gp_ttrace) access.
>  		 * It doesn't race with llist_del_all either.
>  		 * But there could be two concurrent llist_del_all(waiting_for_gp_ttrace):
> @@ -312,16 +306,22 @@ static void do_call_rcu_ttrace(struct bpf_mem_cache *c)
>  	 * If RCU Tasks Trace grace period implies RCU grace period, free
>  	 * these elements directly, else use call_rcu() to wait for normal
>  	 * progs to finish and finally do free_one() on each element.
> +	 *
> +	 * call_rcu_tasks_trace() enqueues to a global queue, so it's ok
> +	 * that current cpu bpf_mem_cache != target bpf_mem_cache.
>  	 */
>  	call_rcu_tasks_trace(&c->rcu_ttrace, __free_rcu_tasks_trace);

"a global queue" in the comment is not accurate. call_rcu_tasks_trace() will
switch to per-CPU queues when the global queue is too busy, and
rcupdate.rcu_task_enqueue_lim on the boot command line can also be used to
control whether or not per-CPU queues are used.
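
For readers following along: below is a minimal userspace sketch of why this
hunk switches from __llist_del_all() to llist_del_all(). It is illustrative
only (C11 atomics, made-up names), not the kernel's llist implementation, but
it captures the property that matters once a remote cpu may llist_add() onto
free_by_rcu_ttrace concurrently.

        /*
         * Minimal sketch, NOT the kernel's llist code: shows why an
         * atomic xchg detach (llist_del_all) is needed once remote
         * cpus may push concurrently, while a plain load + store
         * detach (__llist_del_all) is only safe for a cpu-local list.
         */
        #include <stdatomic.h>
        #include <stddef.h>

        struct node { struct node *next; };
        struct head { _Atomic(struct node *) first; };

        /* llist_add-style push: lock-free, safe against concurrent pushers. */
        static void push(struct head *h, struct node *n)
        {
                struct node *first = atomic_load(&h->first);

                do {
                        n->next = first;
                } while (!atomic_compare_exchange_weak(&h->first, &first, n));
        }

        /* llist_del_all-style detach: a single atomic xchg, so a racing
         * push ends up either in the detached batch or on the emptied
         * list, never lost. */
        static struct node *del_all(struct head *h)
        {
                return atomic_exchange(&h->first, NULL);
        }

        /* __llist_del_all-style detach: a push that lands between the
         * load and the store below is silently dropped, so this is only
         * safe when no one else can push concurrently. */
        static struct node *del_all_nonatomic(struct head *h)
        {
                struct node *first = atomic_load(&h->first);

                atomic_store(&h->first, NULL);
                return first;
        }

With the non-atomic detach, a push racing between the load and the store is
lost; the atomic xchg guarantees every racing push survives. If I read the
patch correctly, that is exactly the property needed once free_by_rcu_ttrace
is no longer touched only by the local cpu.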