Hi, On 6/24/2023 2:49 PM, Hou Tao wrote: > Hi, > > On 6/24/2023 11:13 AM, Alexei Starovoitov wrote: >> From: Alexei Starovoitov <ast@xxxxxxxxxx> >> > SNIP >> >> +static void __free_by_rcu(struct rcu_head *head) >> +{ >> + struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu); >> + struct bpf_mem_cache *tgt = c->tgt; >> + struct llist_node *llnode; >> + >> + if (unlikely(READ_ONCE(c->draining))) >> + goto out; >> + >> + llnode = llist_del_all(&c->waiting_for_gp); >> + if (!llnode) >> + goto out; >> + >> + if (llist_add_batch(llnode, c->waiting_for_gp_tail, &tgt->free_by_rcu_ttrace)) >> + tgt->free_by_rcu_ttrace_tail = c->waiting_for_gp_tail; > Got a null-ptr dereference oops when running multiple test_maps and > htab-mem benchmark after hacking htab to use bpf_mem_cache_free_rcu(). > And I think it happened as follow: The same null-ptr dereference happened even before switching htab to use bpf_mem_cache_free_rcu(). So it seems after introducing c->tgt, there could be two concurrent free_bulk() but with the same c->tgt. > > // c->tgt > P1: __free_by_rcu() > // c->tgt is the same as P1 > P2: __free_by_rcu() > > // return true > P1: llist_add_batch(&tgt->free_by_rcu_ttrace) > // return false > P2: llist_add_batch(&tgt->free_by_rcu_ttrace) > P2: do_call_rcu_ttrace > // return false > P2: xchg(tgt->call_rcu_ttrace_in_progress, 1) > // llnode is not NULL > P2: llnode = llist_del_all(&c->free_by_rcu_ttrace) > // BAD: c->free_by_rcu_ttrace_tail is NULL, so oops > P2: __llist_add_batch(llnode, c->free_by_rcu_ttrace_tail) > > P1: tgt->free_by_rcu_ttrace_tail = X > > I don't have a good fix for the problem except adding a spin-lock for > free_by_rcu_ttrace and free_by_rcu_ttrace_tail. > > > .