Re: [PATCH bpf] bpf: Fix a race condition between btf_put() and map_free()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 12/5/2023 8:42 AM, Andrii Nakryiko wrote:
> On Mon, Dec 4, 2023 at 9:40 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>> When running `./test_progs -j` in my local vm with latest kernel,
>> I once hit a kasan error like below:

SNIP
>>
>>
>> So the problem is at rec->refcount_off in the above.
>>
>> I did some source code analysis and find the reason.
>>                                   CPU A                        CPU B
>>   bpf_map_put:
>>     ...
>>     btf_put with rcu callback
>>     ...
>>     bpf_map_free_deferred
>>       with system_unbound_wq
>>     ...                          ...                           ...
>>     ...                          btf_free_rcu:                 ...
>>     ...                          ...                           bpf_map_free_deferred:
>>     ...                          ...
>>     ...         --------->       btf_struct_metas_free()
>>     ...         | race condition ...
>>     ...         --------->                                     map->ops->map_free()
>>     ...
>>     ...                          btf->struct_meta_tab = NULL
>>
>> In the above, map_free() corresponds to array_map_free() and eventually
>> calling bpf_rb_root_free() which calls:
>>   ...
>>   __bpf_obj_drop_impl(obj, field->graph_root.value_rec, false);
>>   ...
>>
>> Here, 'value_rec' is assigned in btf_check_and_fixup_fields() with following code:
>>
>>   meta = btf_find_struct_meta(btf, btf_id);
>>   if (!meta)
>>     return -EFAULT;
>>   rec->fields[i].graph_root.value_rec = meta->record;
>>
>> So basically, 'value_rec' is a pointer to the record in struct_metas_tab.
>> And it is possible that that particular record has been freed by
>> btf_struct_metas_free() and hence we have a kasan error here.
>>
>> Actually it is very hard to reproduce the failure with current bpf/bpf-next
>> code, I only got the above error once. To increase reproducibility, I added
>> a delay in bpf_map_free_deferred() to delay map->ops->map_free(), which
>> significantly increased reproducibility.

Also found the problem when developing the "fix the release of inner
map" patch-set. I have written a selftest which could reliably reproduce
the problem by using map-in-map + bpf_list_head. The reason of using
map-in-map is to delay the release of inner map by using call_rcu() as
well, so the free of bpf_map happens after the release of btf. Will post
it later.
>>
>>   diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>>   index 5e43ddd1b83f..aae5b5213e93 100644
>>   --- a/kernel/bpf/syscall.c
>>   +++ b/kernel/bpf/syscall.c
>>   @@ -695,6 +695,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
>>         struct bpf_map *map = container_of(work, struct bpf_map, work);
>>         struct btf_record *rec = map->record;
>>
>>   +     mdelay(100);
>>         security_bpf_map_free(map);
>>         bpf_map_release_memcg(map);
>>         /* implementation dependent freeing */
>>
>> To fix the problem, I moved btf_put() after map->ops->map_free() to ensure
>> struct_metas available during map_free(). Rerun './test_progs -j' with the
>> above mdelay() hack for a couple of times and didn't observe the error.
>>
>> Fixes: 958cf2e273f0 ("bpf: Introduce bpf_obj_new")
>> Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
>> ---
>>  kernel/bpf/syscall.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 0ed286b8a0f0..9c6c3738adfe 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -694,11 +694,16 @@ static void bpf_map_free_deferred(struct work_struct *work)
>>  {
>>         struct bpf_map *map = container_of(work, struct bpf_map, work);
>>         struct btf_record *rec = map->record;
>> +       struct btf *btf = map->btf;
>>
>>         security_bpf_map_free(map);
>>         bpf_map_release_memcg(map);
>>         /* implementation dependent freeing */
>>         map->ops->map_free(map);
>> +       /* Delay freeing of btf for maps, as map_free callback may need
>> +        * struct_meta info which will be freed with btf_put().
>> +        */
>> +       btf_put(btf);
> The change makes sense to me, but logically I'd put it after
> btf_record_free(rec), just in case if some of btf records ever refer
> back to map's BTF or something (and just in general to keep it the
> very last thing that's happening).
>
>
> But it also seems like CI is not happy ([0]), please take a look, thanks!
>
>   [0] https://github.com/kernel-patches/bpf/actions/runs/7090474333/job/19297672532

The patch delays the release of BTF id of bpf map, so test_btf_id
failed. Can we fix the problem by optionally pinning the btf in
btf_field_graph_root just like btf_field_kptr, so the map BTF will still
be alive before the invocation of btf_record_free() ? We need to do the
pinning optionally, because btf_record may be contained in btf directly
(namely btf->struct_meta_tab) and is freed through btf_free().
>
>
>>         /* Delay freeing of btf_record for maps, as map_free
>>          * callback usually needs access to them. It is better to do it here
>>          * than require each callback to do the free itself manually.
>> @@ -727,7 +732,6 @@ void bpf_map_put(struct bpf_map *map)
>>         if (atomic64_dec_and_test(&map->refcnt)) {
>>                 /* bpf_map_free_id() must be called first */
>>                 bpf_map_free_id(map);
>> -               btf_put(map->btf);
>>                 INIT_WORK(&map->work, bpf_map_free_deferred);
>>                 /* Avoid spawning kworkers, since they all might contend
>>                  * for the same mutex like slab_mutex.
>> --
>> 2.34.1
>>
> .





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux