Re: [PATCH v3 bpf-next 12/13] bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu().

On Wed, Jun 28, 2023 at 7:24 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 6/28/2023 9:56 AM, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> >
> > Introduce bpf_mem_[cache_]free_rcu() similar to kfree_rcu().
> > Unlike bpf_mem_[cache_]free(), which links objects into the per-cpu
> > free list for immediate reuse, the _rcu() flavor waits for an RCU grace
> > period and then moves objects onto the free_by_rcu_ttrace list, where
> > they wait for an RCU tasks trace grace period before being freed into slab.
> >
> > The life cycle of objects:
> > alloc: dequeue free_llist
> > free: enqueue free_llist
> > free_rcu: enqueue free_by_rcu -> waiting_for_gp
> > free_llist above high watermark -> free_by_rcu_ttrace
> > after RCU GP waiting_for_gp -> free_by_rcu_ttrace
> > free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab
> >
> > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> SNIP
> >
> > +static void __free_by_rcu(struct rcu_head *head)
> > +{
> > +     struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
> > +     struct bpf_mem_cache *tgt = c->tgt;
> > +     struct llist_node *llnode;
> > +
> > +     llnode = llist_del_all(&c->waiting_for_gp);
> > +     if (!llnode)
> > +             goto out;
> > +
> > +     llist_add_batch(llnode, c->waiting_for_gp_tail, &tgt->free_by_rcu_ttrace);
> > +
> > +     /* Objects went through regular RCU GP. Send them to RCU tasks trace */
> > +     do_call_rcu_ttrace(tgt);
>
> I still get reports about leaked free_by_rcu_ttrace objects without
> adding any extra hack beyond using bpf_mem_cache_free_rcu() in htab.

Please share the steps to repro.

> When the bpf ma is freed through free_mem_alloc(), the following
> sequence may leak the objects left on free_by_rcu_ttrace:
>
> P1: bpf_mem_alloc_destroy()
>     P2: __free_by_rcu()
>
>     // got false
>     P2: read c->draining
>
> P1: c->draining = true
> P1: llist_del_all(&c->free_by_rcu_ttrace)
>
>     // add to free_by_rcu_ttrace again
>     P2: llist_add_batch(..., &tgt->free_by_rcu_ttrace)
>         P2: do_call_rcu_ttrace()
>             // call_rcu_ttrace_in_progress is 1, so xchg returns 1
>             // and the objects are not moved to waiting_for_gp_ttrace
>             P2: atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)
>
> // got 1
> P1: atomic_read(&c->call_rcu_ttrace_in_progress)
> // objects in free_by_rcu_ttrace are leaked
>
> I think the race could be fixed by checking c->draining in
> do_call_rcu_ttrace() when atomic_xchg() returns 1 as shown below:

If the theory of the bug holds true then the fix makes sense,
but did you repro without the fix and confirm you cannot repro with it?
We should not add extra code based on a hunch.
