Re: [PATCH bpf-next v4] bpf: Fix a race condition between btf_put() and map_free()


 



On 12/8/23 8:45 AM, Yonghong Song wrote:

On 12/8/23 12:16 AM, Martin KaFai Lau wrote:
On 12/7/23 7:59 PM, Yonghong Song wrote:

I am trying to avoid making a special case for "bool has_btf_ref;" and "bool from_map_check". It seems a bit too much to deal with the error path for btf_parse().

Would doing the refcount_set(&btf->refcnt, 1) earlier in btf_parse help?

No, it does not. The core reason is what Hao mentioned in
https://lore.kernel.org/bpf/47ee3265-23f7-2130-ff28-27bfaf3f7877@xxxxxxxxxxxxxxx/
We simply cannot take btf reference if called from btf_parse().
Let us say we move refcount_set(&btf->refcnt, 1) earlier in btf_parse()
so we take ref for btf during btf_parse_fields(), then we have
      btf_put <=== expect refcount == 0 to start the destruction process
        ...
          btf_record_free <=== in which, if graph_root, a btf reference is held
so btf_put will never be able to actually free btf data.

ah. There is a loop like btf->struct_meta_tab->...btf.

Yes, the kasan problem will be resolved but we leak memory.
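
To make the loop above concrete, here is a minimal userspace sketch of the problem. It is not kernel code: toy_btf, toy_record and the toy_* helpers are hypothetical stand-ins for struct btf, the record reachable through struct_meta_tab, btf_put() and btf_record_free(). It only shows that when a record owned by the btf also holds a reference on that btf, the owner's final put never reaches zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins for illustration only; not the kernel's data structures. */
struct toy_btf;

struct toy_record {
	struct toy_btf *graph_root_btf;	/* reference held by the record, if any */
};

struct toy_btf {
	int refcnt;
	bool freed;
	struct toy_record *rec;		/* models btf->struct_meta_tab->...record */
};

static void toy_btf_put(struct toy_btf *btf);

/* Models btf_record_free(): drops the reference the record holds. */
static void toy_record_free(struct toy_record *rec)
{
	if (rec->graph_root_btf)
		toy_btf_put(rec->graph_root_btf);
	free(rec);
}

/* Models btf_put(): destruction only starts when the count hits zero. */
static void toy_btf_put(struct toy_btf *btf)
{
	if (--btf->refcnt > 0)
		return;
	if (btf->rec)
		toy_record_free(btf->rec);
	btf->rec = NULL;
	btf->freed = true;
}

/* Returns true if the owner's final put actually destroyed the btf. */
static bool toy_parse_and_release(bool record_takes_ref)
{
	struct toy_btf btf = { .refcnt = 1 };	/* the owner's reference */
	struct toy_record *rec = calloc(1, sizeof(*rec));

	assert(rec);
	btf.rec = rec;
	if (record_takes_ref) {
		/* the btf's own record points back at the btf */
		rec->graph_root_btf = &btf;
		btf.refcnt++;
	}

	toy_btf_put(&btf);			/* the owner drops its reference */

	bool freed = btf.freed;
	if (!freed)
		free(rec);			/* tidy up the demo's own leak */
	return freed;
}
```

Calling toy_parse_and_release(false) from a test main() reports that the btf was destroyed, while toy_parse_and_release(true) reports that it was not, which is the leak described above.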


It is also unnecessary to take a reference since the value_rec is
referring to a record in struct_meta_tab.

If we optimize for not taking a refcnt, how about not taking a refcnt in any case and postponing the btf_put(), instead of taking a refcnt in one case but not another, like your fix in v1? The failed selftest can be changed or even removed if it no longer makes sense.

After a couple of iterations, I think the approach of taking the necessary reference sounds better, and it is consistent with how kptr is handled. For kptr, btf_parse will ignore it.

Got it. That is why kptr.btf got away with the loop.

On the other hand, am I reading it correctly that kptr.btf only needs to take the refcnt for btf that is btf_is_kernel()?

No. Besides vmlinux and module btf, it also takes a reference for prog btf, see

static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
                           struct btf_field_info *info)
{
...
         if (id == -ENOENT) {
                 /* btf_parse_kptr should only be called w/ btf = program BTF */
                 WARN_ON_ONCE(btf_is_kernel(btf));
                 /* Type exists only in program BTF. Assume that it's a MEM_ALLOC
                  * kptr allocated via bpf_obj_new
                  */
                 field->kptr.dtor = NULL;
                 id = info->kptr.type_id;
                 kptr_btf = (struct btf *)btf;
                 btf_get(kptr_btf);

I meant only kernel/module btf needs to take the refcnt, so there is no need to take the refcnt here for the (it)self btf. Sorry that I was not clear in my earlier comment.

The record captures something either in the self btf or in the kernel btf. field->kptr.btf is the only one that may point to either a kernel btf or the self btf, so it should be the only case that needs to check the following in btf_record_free():

	if (btf_is_kernel(rec->fields[i].kptr.btf))
		btf_put(rec->fields[i].kptr.btf);

In all other cases the record refers to the self btf (including field->graph_root.btf). The owner (the map here) needs to ensure the self btf is freed after the record is freed.

I was wondering whether we could avoid doing different things based on where btf_parse_fields() is called from, by separating which types of btf always need a refcnt and which do not. I agree the approach in this patch also fixes the issue, and I have acked v5. Thanks for the fix.

                 goto found_dtor;
         }
...
}
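
For illustration, the alternative Martin describes in his inline reply (take and drop a refcnt only when kptr.btf is a kernel or module btf, and rely on the owner to keep the self btf alive) could be sketched as below. This is a userspace model with hypothetical mock_* names, not the kernel implementation, and it is not the approach taken by the patch under review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these are kernel definitions. */
struct mock_btf {
	int refcnt;
	bool kernel_btf;	/* true for vmlinux/module btf, false for self btf */
};

struct mock_field {
	struct mock_btf *kptr_btf;
};

#define MOCK_MAX_FIELDS 4

struct mock_record {
	size_t cnt;
	struct mock_field fields[MOCK_MAX_FIELDS];
};

static bool mock_btf_is_kernel(const struct mock_btf *btf)
{
	return btf->kernel_btf;
}

/* Parse side: pin only kernel/module btf; the self btf is kept alive by its owner. */
static void mock_record_add_kptr(struct mock_record *rec, struct mock_btf *target)
{
	assert(rec->cnt < MOCK_MAX_FIELDS);
	if (mock_btf_is_kernel(target))
		target->refcnt++;
	rec->fields[rec->cnt++].kptr_btf = target;
}

/* Free side: mirror the parse side, so gets and puts stay balanced. */
static void mock_record_free(struct mock_record *rec)
{
	for (size_t i = 0; i < rec->cnt; i++) {
		if (mock_btf_is_kernel(rec->fields[i].kptr_btf))
			rec->fields[i].kptr_btf->refcnt--;
	}
	rec->cnt = 0;
}
```

In this model the self btf's count is never touched by the record, so no loop can form; the cost is that the owner must guarantee the self btf outlives the record.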


Unfortunately, for graph_root (list_head, rb_root), btf_parse and map_check will both
process it, and that adds a bit of complexity.
Alexei also suggested the same reference-taking approach:
https://lore.kernel.org/bpf/CAADnVQL+uc6VV65_Ezgzw3WH=ME9z1Fdy8Pd6xd0oOq8rgwh7g@xxxxxxxxxxxxxx/





