Re: [PATCH bpf-next v4] bpf: Fix a race condition between btf_put() and map_free()

Martin KaFai Lau <martin.lau@xxxxxxxxx> · Fri, 8 Dec 2023 10:26:34 -0800

On 12/8/23 8:45 AM, Yonghong Song wrote:

On 12/8/23 12:16 AM, Martin KaFai Lau wrote:
On 12/7/23 7:59 PM, Yonghong Song wrote:

I am trying to avoid making a special case for "bool has_btf_ref;" and "bool
from_map_check". It seems to be a bit too much just to deal with the error path
of btf_parse().

Would doing the refcount_set(&btf->refcnt, 1) earlier in btf_parse help?

No, it does not. The core reason is what Hao mentioned in
https://lore.kernel.org/bpf/47ee3265-23f7-2130-ff28-27bfaf3f7877@xxxxxxxxxxxxxxx/
We simply cannot take btf reference if called from btf_parse().
Let us say we move refcount_set(&btf->refcnt, 1) earlier in btf_parse()
so that we take a ref on the btf during btf_parse_fields(). Then we have
      btf_put <=== expects refcount == 0 to start the destruction process
        ...
          btf_record_free <=== the btf reference held for a graph_root would only be dropped here
so btf_put will never be able to actually free the btf data.

ah. There is a loop like btf->struct_meta_tab->...btf.

Yes, the kasan problem will be resolved but we leak memory.
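To make the loop concrete, here is a minimal userspace model (all model_* names are hypothetical, not kernel APIs): a btf owns a record standing in for a struct_meta_tab entry, and the record's graph_root pointer refers back to the same btf. If parsing takes that self reference, the last external put leaves refcnt at 1 and the btf is never freed, which is the leak described above.

```c
#include <assert.h>
#include <stdbool.h>

struct model_btf;

/* Stands in for a btf_record kept in btf->struct_meta_tab. */
struct model_record {
	struct model_btf *owner; /* graph_root.btf: points back at its own btf */
	bool holds_ref;          /* true if parsing took a refcnt on owner */
};

struct model_btf {
	int refcnt;
	bool freed;
	struct model_record meta; /* stands in for struct_meta_tab */
};

static void model_btf_put(struct model_btf *btf);

/* Like btf_record_free(): drop the reference the record holds, if any. */
static void model_record_free(struct model_record *rec)
{
	if (rec->holds_ref) {
		rec->holds_ref = false;
		model_btf_put(rec->owner);
	}
}

/* The record is only torn down from here, i.e. once refcnt hit zero. */
static void model_btf_free(struct model_btf *btf)
{
	model_record_free(&btf->meta);
	btf->freed = true;
}

static void model_btf_put(struct model_btf *btf)
{
	assert(btf->refcnt > 0);
	if (--btf->refcnt == 0)
		model_btf_free(btf);
}

/* refcnt starts at 1 (the caller's ref). With self_ref, the record also
 * pins its own btf, as if refcount_set() had been moved earlier. */
static void model_btf_init(struct model_btf *btf, bool self_ref)
{
	btf->refcnt = 1;
	btf->freed = false;
	btf->meta.owner = btf;
	btf->meta.holds_ref = false;
	if (self_ref) {
		btf->refcnt++;
		btf->meta.holds_ref = true;
	}
}
```

With self_ref == false a single put frees the btf; with self_ref == true the same put only drops refcnt to 1, and the only code that would drop the last ref is reachable solely from the free path itself.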

It is also unnecessary to take a reference since the value_rec is
referring to a record in struct_meta_tab.

If we optimize for not taking a refcnt, how about not taking a refcnt in any
case and postponing the btf_put(), instead of taking a refcnt in one case but
not in another? That is like your fix in v1. The failed selftest can be changed,
or even removed if it no longer makes sense.
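A sketch of the "no refcnt, postpone btf_put()" alternative, assuming a simplified userspace model (hypothetical model_* names, not kernel code): the record only borrows its btf pointer, so correctness depends on the owner freeing the record before dropping its btf reference. The second free function shows the reverse ordering, roughly the shape of the race in the subject line, and counts the resulting use-after-free instead of crashing.

```c
#include <assert.h>
#include <stdbool.h>

struct model_btf {
	int refcnt;
	bool alive; /* false once the "btf memory" is released */
};

/* The record borrows the btf: no refcnt is held. */
struct model_record {
	struct model_btf *btf;
};

/* The map holds the only btf reference and owns the record. */
struct model_map {
	struct model_btf *btf;
	struct model_record *record;
};

static int use_after_free_count;

static void model_btf_put(struct model_btf *btf)
{
	assert(btf->refcnt > 0);
	if (--btf->refcnt == 0)
		btf->alive = false;
}

/* Freeing the record still dereferences its btf (e.g. graph_root data). */
static void model_record_free(struct model_record *rec)
{
	if (!rec->btf->alive)
		use_after_free_count++;
}

/* Safe ordering: tear down the record first, then drop the btf. */
static void model_map_free(struct model_map *map)
{
	model_record_free(map->record);
	model_btf_put(map->btf);
}

/* Unsafe ordering: the btf may already be gone when the record is freed. */
static void model_map_free_wrong_order(struct model_map *map)
{
	model_btf_put(map->btf);
	model_record_free(map->record);
}
```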

After a couple of iterations, I think the approach of taking the necessary
references sounds better, and it is consistent with how kptr is handled. For
kptr, btf_parse will ignore it.

Got it. That is why kptr.btf got away with the loop.

On the other hand, am I reading it correctly that kptr.btf only needs to take 
the refcnt for btf that is btf_is_kernel()?

No. Besides vmlinux and module btf, it also takes a reference for prog btf; see

static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
                           struct btf_field_info *info)
{
...
         if (id == -ENOENT) {
                 /* btf_parse_kptr should only be called w/ btf = program BTF */
                 WARN_ON_ONCE(btf_is_kernel(btf));
                 /* Type exists only in program BTF. Assume that it's a MEM_ALLOC
                  * kptr allocated via bpf_obj_new
                  */
                 field->kptr.dtor = NULL;
                 id = info->kptr.type_id;
                 kptr_btf = (struct btf *)btf;
                 btf_get(kptr_btf);

I meant only kernel/module btf needs to take the refcnt, so there is no need to 
take the refcnt here for the (it)self btf. Sorry that I was not clear in my 
earlier comment.

The record captures something either in the self btf or in a kernel btf.
field->kptr.btf is the one that may point to either a kernel or the self btf,
so it should be the only case that needs the following check in
btf_record_free():

	if (btf_is_kernel(rec->fields[i].kptr.btf))
		btf_put(rec->fields[i].kptr.btf);

All other cases the record has a self btf (including field->graph_root.btf). The 
owner (map here) needs to ensure the self btf is freed after the record is freed.
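The suggestion above can be modelled like this (a hypothetical userspace sketch; the model_* names and the is_kernel flag stand in for struct btf and btf_is_kernel()): a kptr field pins its btf only when it is a kernel/module btf, and the self btf's lifetime is left to the record's owner.

```c
#include <assert.h>
#include <stdbool.h>

struct model_btf {
	int refcnt;
	bool is_kernel; /* stands in for btf_is_kernel() */
};

struct model_kptr_field {
	struct model_btf *btf; /* stands in for field->kptr.btf */
};

static void model_btf_get(struct model_btf *btf)
{
	btf->refcnt++;
}

static void model_btf_put(struct model_btf *btf)
{
	assert(btf->refcnt > 0);
	btf->refcnt--;
}

/* Parsing a kptr field: pin the btf only if it is kernel/module btf.
 * A self btf is kept alive by the record's owner (the map) instead. */
static void model_kptr_field_init(struct model_kptr_field *f,
				  struct model_btf *btf)
{
	f->btf = btf;
	if (btf->is_kernel)
		model_btf_get(btf);
}

/* Mirrors the proposed btf_is_kernel() check in btf_record_free(). */
static void model_kptr_field_free(struct model_kptr_field *f)
{
	if (f->btf->is_kernel)
		model_btf_put(f->btf);
}
```

Because the self btf never gains a ref from its own record, no btf -> record -> btf loop can form.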

I was wondering whether we could avoid doing different things depending on
where btf_parse_fields() is called, by separating the types of btf that always
need a refcnt from those that never do. I agree the approach in this patch also
fixes the issue, and I have acked v5. Thanks for the fix.

                 goto found_dtor;
         }
...
}

Unfortunately, graph_root (list_head, rb_root) is processed by both btf_parse
and map_check, and that adds a bit of complexity.
Alexei also suggested the same reference-taking approach:
https://lore.kernel.org/bpf/CAADnVQL+uc6VV65_Ezgzw3WH=ME9z1Fdy8Pd6xd0oOq8rgwh7g@xxxxxxxxxxxxxx/
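For comparison, a rough model of the conditional approach for graph_root discussed in this thread. This is not the patch's code: the helper is hypothetical, and the has_btf_ref/from_map_check names are borrowed from earlier in the thread only for illustration. The record built during btf_parse() skips the refcnt to avoid the btf -> struct_meta_tab -> btf loop, while the record built for map_check takes one and releases it on free.

```c
#include <assert.h>
#include <stdbool.h>

struct model_btf {
	int refcnt;
};

/* Stands in for a btf_record with one graph_root (list_head/rb_root) field. */
struct model_record {
	struct model_btf *graph_root_btf;
	bool has_btf_ref; /* hypothetical: did parsing take a refcnt? */
};

/* Hypothetical stand-in for parsing a graph_root field. From btf_parse()
 * the record lands in struct_meta_tab, so no ref is taken; from map_check
 * the record belongs to the map, so a ref is taken. */
static void model_parse_graph_root(struct model_record *rec,
				   struct model_btf *btf, bool from_map_check)
{
	rec->graph_root_btf = btf;
	rec->has_btf_ref = from_map_check;
	if (from_map_check)
		btf->refcnt++;
}

/* Only drop a ref that parsing actually took. */
static void model_record_free(struct model_record *rec)
{
	if (rec->has_btf_ref) {
		assert(rec->graph_root_btf->refcnt > 0);
		rec->graph_root_btf->refcnt--;
	}
}
```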