Hi,

On 6/24/2023 11:13 AM, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@xxxxxxxxxx>
>
> Introduce bpf_mem_[cache_]free_rcu() similar to kfree_rcu().
> Unlike bpf_mem_[cache_]free() that links objects for immediate reuse into
> per-cpu free list the _rcu() flavor waits for RCU grace period and then moves
> objects into free_by_rcu_ttrace list where they are waiting for RCU
> task trace grace period to be freed into slab.

SNIP

> +static void __free_by_rcu(struct rcu_head *head)
> +{
> +        struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
> +        struct bpf_mem_cache *tgt = c->tgt;
> +        struct llist_node *llnode;
> +
> +        if (unlikely(READ_ONCE(c->draining)))
> +                goto out;

Because the read of c->draining and the llist_add_batch(..., &tgt->free_by_rcu_ttrace)
below are lockless, checking draining here cannot prevent the leak of the objects in
c->free_by_rcu_ttrace, as shown below (hope the formatting is OK now). A simple fix is
to drain free_by_rcu_ttrace twice as suggested before. Or check c->draining again in
__free_by_rcu() when atomic_xchg() returns 1 and call free_all(free_by_rcu_ttrace) if
draining is true (a rough sketch of this option follows the two cases below).

P1: bpf_mem_alloc_destroy()
                P2: __free_by_rcu()

                // got false
                P2: read c->draining

P1: c->draining = true
P1: llist_del_all(&c->free_by_rcu_ttrace)

                // add to free_by_rcu_ttrace again
                P2: llist_add_batch(..., &tgt->free_by_rcu_ttrace)
                P2: do_call_rcu_ttrace()
                    // call_rcu_ttrace_in_progress is 1, so xchg returns 1
                    // and the objects are not moved to waiting_for_gp_ttrace
                    P2: atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)

// got 1
P1: atomic_read(&c->call_rcu_ttrace_in_progress)
// objects in free_by_rcu_ttrace are leaked

c->draining also can't guarantee that bpf_mem_alloc_destroy() will wait for an inflight
call_rcu_tasks_trace() callback, as shown in the following two cases (these are the same
two cases reported in v1; I have only reformatted the diagrams). I suggest doing
bpf_mem_alloc_destroy() as follows:

        if (ma->cache) {
                rcu_in_progress = 0;
                for_each_possible_cpu(cpu) {
                        c = per_cpu_ptr(ma->cache, cpu);
                        irq_work_sync(&c->refill_work);
                        drain_mem_cache(c);
                        rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
                }
                for_each_possible_cpu(cpu) {
                        c = per_cpu_ptr(ma->cache, cpu);
                        rcu_in_progress += atomic_read(&c->call_rcu_ttrace_in_progress);
                }

Case 1:

P1: bpf_mem_alloc_destroy()
                P2: __free_by_rcu()

                // got false
                P2: read c->draining

P1: c->draining = true
// got 0
P1: atomic_read(&c->call_rcu_ttrace_in_progress)

                P2: do_call_rcu_ttrace()
                    // returns 0
                    P2: atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)
                    P2: call_rcu_tasks_trace()
                P2: atomic_set(&c->call_rcu_in_progress, 0)

// also got 0
P1: atomic_read(&c->call_rcu_in_progress)
// won't wait for the inflight __free_rcu_tasks_trace()

Case 2:

P1: bpf_mem_alloc_destroy()
                P2: __free_by_rcu() for c1

                P2: read c1->draining

P1: c0->draining = true
P1: c1->draining = true

// both in_progress counters are 0
P1: read c0->call_rcu_in_progress
P1: read c0->call_rcu_ttrace_in_progress

                // c1->tgt is c0
                // c1->call_rcu_in_progress is 1
                // c0->call_rcu_ttrace_in_progress is 0
                P2: llist_add_batch(..., c0->free_by_rcu_ttrace)
                P2: xchg(c0->call_rcu_ttrace_in_progress, 1)
                P2: call_rcu_tasks_trace(c0)
                P2: c1->call_rcu_in_progress = 0

// both in_progress counters are 0
P1: read c1->call_rcu_in_progress
P1: read c1->call_rcu_ttrace_in_progress

// BAD! There is still an inflight tasks trace RCU callback
P1: free_mem_alloc_no_barrier()
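To make the second option above more concrete, here is a minimal sketch of what I have
in mind. It assumes do_call_rcu_ttrace() keeps the shape implied by the first diagram
(early return when atomic_xchg() finds the flag already set) and reuses free_all() and
llist_del_all(); the !!c->percpu_size argument is copied from the existing free_all()
callers and may need adjusting:

static void do_call_rcu_ttrace(struct bpf_mem_cache *c)
{
        struct llist_node *llnode;

        if (atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)) {
                /* The cache is being destroyed: bpf_mem_alloc_destroy() has
                 * already drained free_by_rcu_ttrace, and the pending tasks
                 * trace callback was queued before the objects below were
                 * added, so they would never reach waiting_for_gp_ttrace.
                 * Free them directly instead of leaking them.
                 */
                if (unlikely(READ_ONCE(c->draining))) {
                        llnode = llist_del_all(&c->free_by_rcu_ttrace);
                        free_all(llnode, !!c->percpu_size);
                }
                return;
        }

        /* ... the rest of do_call_rcu_ttrace() stays as in the patch ... */
}

The extra check only runs on the already-contended xchg path, so the common case is
unchanged.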
> +
> +        llnode = llist_del_all(&c->waiting_for_gp);
> +        if (!llnode)
> +                goto out;
> +
> +        if (llist_add_batch(llnode, c->waiting_for_gp_tail, &tgt->free_by_rcu_ttrace))
> +                tgt->free_by_rcu_ttrace_tail = c->waiting_for_gp_tail;
> +
> +        /* Objects went through regular RCU GP. Send them to RCU tasks trace */
> +        do_call_rcu_ttrace(tgt);
> +out:
> +        atomic_set(&c->call_rcu_in_progress, 0);
> +}
> +
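On the destroy side, after the two for_each_possible_cpu() loops in my suggested
bpf_mem_alloc_destroy() above have accumulated rcu_in_progress, the wait could look
roughly like the sketch below. This is only an illustration of the idea:
free_mem_alloc_no_barrier() is the helper from case 2, it assumes destroy is allowed to
block here, and the ordering (rcu_barrier() before rcu_barrier_tasks_trace(), because
__free_by_rcu() is a regular RCU callback that may still queue a tasks trace callback)
is my reasoning, not something the patch states:

        /* Sketch only: rcu_in_progress was accumulated by the two loops above. */
        if (!rcu_in_progress) {
                /* No callback was observed in flight on any CPU. */
                free_mem_alloc_no_barrier(ma);
        } else {
                /* Wait for inflight __free_by_rcu() callbacks first, since they
                 * may still queue tasks trace callbacks, then wait for the tasks
                 * trace callbacks themselves before freeing the caches.
                 */
                rcu_barrier();
                rcu_barrier_tasks_trace();
                free_mem_alloc_no_barrier(ma);
        }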