Re: [PATCH bpf-next v4] bpf: Fix a race condition between btf_put() and map_free()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Dec 8, 2023 at 9:07 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
> On 12/8/23 12:30 AM, Hou Tao wrote:
> > Hi,
> >
> > On 12/8/2023 12:02 PM, Yonghong Song wrote:
> >> On 12/7/23 7:59 PM, Yonghong Song wrote:
> >>> On 12/7/23 5:23 PM, Martin KaFai Lau wrote:
> >>>> On 12/6/23 1:09 PM, Yonghong Song wrote:
> >>>>> When running `./test_progs -j` in my local vm with latest kernel,
> >>>>> I once hit a kasan error like below:
> >>>>>
> >>>>>
> > SNIP
> >>>>> Here, 'value_rec' is assigned in btf_check_and_fixup_fields() with
> >>>>> following code:
> >>>>>
> >>>>>     meta = btf_find_struct_meta(btf, btf_id);
> >>>>>     if (!meta)
> >>>>>       return -EFAULT;
> >>>>>     rec->fields[i].graph_root.value_rec = meta->record;
> >>>>>
> >>>>> So basically, 'value_rec' is a pointer to the record in
> >>>>> struct_metas_tab.
> >>>>> And it is possible that that particular record has been freed by
> >>>>> btf_struct_metas_free() and hence we have a kasan error here.
> >>>>>
> >>>>> Actually it is very hard to reproduce the failure with current
> >>>>> bpf/bpf-next
> >>>>> code, I only got the above error once. To increase reproducibility,
> >>>>> I added
> >>>>> a delay in bpf_map_free_deferred() to delay map->ops->map_free(),
> >>>>> which
> >>>>> significantly increased reproducibility.
> >>>>>
> >>>>>     diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> >>>>>     index 5e43ddd1b83f..aae5b5213e93 100644
> >>>>>     --- a/kernel/bpf/syscall.c
> >>>>>     +++ b/kernel/bpf/syscall.c
> >>>>>     @@ -695,6 +695,7 @@ static void bpf_map_free_deferred(struct
> >>>>> work_struct *work)
> >>>>>           struct bpf_map *map = container_of(work, struct bpf_map,
> >>>>> work);
> >>>>>           struct btf_record *rec = map->record;
> >>>>>
> >>>>>     +     mdelay(100);
> >>>>>           security_bpf_map_free(map);
> >>>>>           bpf_map_release_memcg(map);
> >>>>>           /* implementation dependent freeing */
> >>>>>
> >>>>> To fix the problem, we need to have a reference on btf in order to
> >>>>> safeguard accessing field->graph_root.value_rec in
> >>>>> map->ops->map_free().
> >>>>> The function btf_parse_graph_root() is the place to get a btf
> >>>>> reference.
> >>>>> The following are rough call stacks reaching bpf_parse_graph_root():
> >>>>>
> >>>>>      btf_parse
> >>>>>        ...
> >>>>>          btf_parse_fields
> >>>>>            ...
> >>>>>              btf_parse_graph_root
> >>>>>
> >>>>>      map_check_btf
> >>>>>        btf_parse_fields
> >>>>>          ...
> >>>>>            btf_parse_graph_root
> >>>>>
> >>>>> Looking at the above call stack, the btf_parse_graph_root() is
> >>>>> indirectly
> >>>>> called from btf_parse() or map_check_btf().
> >>>>>
> >>>>> We cannot take a reference in btf_parse() case since at that moment,
> >>>>> btf is still in the middle to self-validation and initial reference
> >>>>> (refcount_set(&btf->refcnt, 1)) has not been triggered yet.
> >>>> Thanks for the details analysis and clear explanation. It helps a lot.
> >>>>
> >>>> Sorry for jumping in late.
> >>>>
> >>>> I am trying to avoid making a special case for "bool has_btf_ref;"
> >>>> and "bool from_map_check". It seems to a bit too much to deal with
> >>>> the error path for btf_parse().
> > Maybe we could move the common btf used by kptr and graph_root into
> > bpf_record and let the callers of btf_parse_fields()  and
> > btf_record_free() to decide the life cycle of btf in btf_record, so
> > there will be less intrusive and less special case. The following is the
>
> I didn't fully check the code but looks like we took a
> btf reference at map_check_btf() and free it at the end
> of bpf_map_free_deferred(). This is similar to my v1 patch,
> not exactly the same but similar since they all do
> btf_put() at the end of bpf_map_free_deferred().
>
> Through discussion, doing on-demand btf_get()/btf_put()
> approach, similar to kptr approach, seems more favored.
> This also has advantage to free btf at its earlist possible
> point.

Sorry. Looks like I recommended the wrong path.

The approach of btf_parse_fields(... false | true)
depending on where it's called and whether returned struct btf_record *
will be kept within a type or within a map
is pushing complexity too far.
A year from now we'll forget these subtle details.
There is an advantage to do btf_put() earli in bpf_map_put(),
but in the common case it would be delayed just after queue_work.
Which is a minor time delay.
And for free_after_mult_rcu_gp much longer,
but saving from freeing btf are minor compared to the map itself.

I think it's cleaner to go back to v1 and simply move btf_put
to bpf_map_free_deferred().
A lot less things to worry about.
Especially considering that BPF_RB_ROOT may not be the last such special
record keeping type and every new type would need to think
hard whether it's BPF_RB_ROOT-like or BPF_LIST_NODE-like.
v1 avoids this future complexity.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux