Hi, On 6/28/2023 9:56 AM, Alexei Starovoitov wrote: > From: Alexei Starovoitov <ast@xxxxxxxxxx> > > Introduce bpf_mem_[cache_]free_rcu() similar to kfree_rcu(). > Unlike bpf_mem_[cache_]free() that links objects for immediate reuse into > per-cpu free list the _rcu() flavor waits for RCU grace period and then moves > objects into free_by_rcu_ttrace list where they are waiting for RCU > task trace grace period to be freed into slab. > > The life cycle of objects: > alloc: dequeue free_llist > free: enqeueu free_llist > free_rcu: enqueue free_by_rcu -> waiting_for_gp > free_llist above high watermark -> free_by_rcu_ttrace > after RCU GP waiting_for_gp -> free_by_rcu_ttrace > free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab > > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx> SNIP > > +static void __free_by_rcu(struct rcu_head *head) > +{ > + struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu); > + struct bpf_mem_cache *tgt = c->tgt; > + struct llist_node *llnode; > + > + llnode = llist_del_all(&c->waiting_for_gp); > + if (!llnode) > + goto out; > + > + llist_add_batch(llnode, c->waiting_for_gp_tail, &tgt->free_by_rcu_ttrace); > + > + /* Objects went through regular RCU GP. Send them to RCU tasks trace */ > + do_call_rcu_ttrace(tgt); I still got report about leaked free_by_rcu_ttrace without adding any extra hack except using bpf_mem_cache_free_rcu() in htab. When bpf ma is freed through free_mem_alloc(), the following sequence may lead to leak of free_by_rcu_ttrace: P1: bpf_mem_alloc_destroy() P2: __free_by_rcu() // got false P2: read c->draining P1: c->draining = true P1: llist_del_all(&c->free_by_rcu_ttrace) // add to free_by_rcu_ttrace again P2: llist_add_batch(..., &tgt->free_by_rcu_ttrace) P2: do_call_rcu_ttrace() // call_rcu_ttrace_in_progress is 1, so xchg return 1 // and it doesn't being moved to waiting_for_gp_ttrace P2: atomic_xchg(&c->call_rcu_ttrace_in_progress, 1) // got 1 P1: atomic_read(&c->call_rcu_ttrace_in_progress) // objects in free_by_rcu_ttrace is leaked I think the race could be fixed by checking c->draining in do_call_rcu_ttrace() when atomic_xchg() returns 1 as shown below: diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c index 2bdb894392c5..9f41025560bd 100644 --- a/kernel/bpf/memalloc.c +++ b/kernel/bpf/memalloc.c @@ -303,8 +303,13 @@ static void do_call_rcu_ttrace(struct bpf_mem_cache *c) { struct llist_node *llnode, *t; - if (atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)) + if (atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)) { + if (READ_ONCE(c->draining)) { + llnode = llist_del_all(&c->free_by_rcu_ttrace); + free_all(llnode, !!c->percpu_size); + } return; + }