Hi,

On 12/5/2023 8:42 AM, Andrii Nakryiko wrote:
> On Mon, Dec 4, 2023 at 9:40 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>> When running `./test_progs -j` in my local vm with latest kernel,
>> I once hit a kasan error like below:

SNIP

>> So the problem is at rec->refcount_off in the above.
>>
>> I did some source code analysis and found the reason.
>>
>>   CPU A                                CPU B
>>   bpf_map_put:
>>     ...
>>     btf_put with rcu callback
>>     ...
>>     bpf_map_free_deferred
>>       with system_unbound_wq
>>     ...          ...                   ...
>>     ...          btf_free_rcu:         ...
>>     ...          ...                   bpf_map_free_deferred:
>>     ...          ...
>>     ...          ...     --------->    btf_struct_metas_free()
>>     ...          ...   | race condition
>>     ...          ...     --------->    map->ops->map_free()
>>     ...          ...
>>     ...          btf->struct_meta_tab = NULL
>>
>> In the above, map_free() corresponds to array_map_free() and eventually
>> calls bpf_rb_root_free(), which calls:
>>
>>   ...
>>   __bpf_obj_drop_impl(obj, field->graph_root.value_rec, false);
>>   ...
>>
>> Here, 'value_rec' is assigned in btf_check_and_fixup_fields() with the
>> following code:
>>
>>   meta = btf_find_struct_meta(btf, btf_id);
>>   if (!meta)
>>       return -EFAULT;
>>   rec->fields[i].graph_root.value_rec = meta->record;
>>
>> So basically, 'value_rec' is a pointer to the record in struct_metas_tab.
>> It is possible that that particular record has already been freed by
>> btf_struct_metas_free(), hence the kasan error here.
>>
>> It is actually very hard to reproduce the failure with the current
>> bpf/bpf-next code; I only got the above error once. To increase
>> reproducibility, I added a delay in bpf_map_free_deferred() to delay
>> map->ops->map_free(), which significantly increased reproducibility.

I also found this problem while developing the "fix the release of inner
map" patch set. I have written a selftest which can reliably reproduce the
problem by using map-in-map + bpf_list_head. The reason for using
map-in-map is to delay the release of the inner map through call_rcu() as
well, so the free of the bpf_map happens after the release of the btf.
Will post it later.

>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 5e43ddd1b83f..aae5b5213e93 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -695,6 +695,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
>>  	struct bpf_map *map = container_of(work, struct bpf_map, work);
>>  	struct btf_record *rec = map->record;
>>
>> +	mdelay(100);
>>  	security_bpf_map_free(map);
>>  	bpf_map_release_memcg(map);
>>  	/* implementation dependent freeing */
>>
>> To fix the problem, I moved btf_put() after map->ops->map_free() to
>> ensure struct_metas are still available during map_free(). I reran
>> './test_progs -j' with the above mdelay() hack a couple of times and
>> didn't observe the error.
>>
>> Fixes: 958cf2e273f0 ("bpf: Introduce bpf_obj_new")
>> Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
>> ---
>>  kernel/bpf/syscall.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 0ed286b8a0f0..9c6c3738adfe 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -694,11 +694,16 @@ static void bpf_map_free_deferred(struct work_struct *work)
>>  {
>>  	struct bpf_map *map = container_of(work, struct bpf_map, work);
>>  	struct btf_record *rec = map->record;
>> +	struct btf *btf = map->btf;
>>
>>  	security_bpf_map_free(map);
>>  	bpf_map_release_memcg(map);
>>  	/* implementation dependent freeing */
>>  	map->ops->map_free(map);
>> +	/* Delay freeing of btf for maps, as map_free callback may need
>> +	 * struct_meta info which will be freed with btf_put().
>> +	 */
>> +	btf_put(btf);

> The change makes sense to me, but logically I'd put it after
> btf_record_free(rec), just in case some of the btf records ever refer
> back to the map's BTF or something (and in general to keep it the very
> last thing that happens).
>
> But it also seems like CI is not happy ([0]), please take a look, thanks!
>
> [0] https://github.com/kernel-patches/bpf/actions/runs/7090474333/job/19297672532

The patch delays the release of the BTF id of the bpf map, so test_btf_id
failed. Can we fix the problem by optionally pinning the btf in
btf_field_graph_root, just like btf_field_kptr, so the map BTF will still
be alive before the invocation of btf_record_free()? We need to do the
pinning optionally, because a btf_record may be contained in the btf
directly (namely btf->struct_meta_tab) and is freed through btf_free().

>> 	/* Delay freeing of btf_record for maps, as map_free
>> 	 * callback usually needs access to them. It is better to do it here
>> 	 * than require each callback to do the free itself manually.
>> 	 */
>> 	btf_record_free(rec);
>> @@ -727,7 +732,6 @@ void bpf_map_put(struct bpf_map *map)
>>  	if (atomic64_dec_and_test(&map->refcnt)) {
>>  		/* bpf_map_free_id() must be called first */
>>  		bpf_map_free_id(map);
>> -		btf_put(map->btf);
>>  		INIT_WORK(&map->work, bpf_map_free_deferred);
>>  		/* Avoid spawning kworkers, since they all might contend
>>  		 * for the same mutex like slab_mutex.
>> --
>> 2.34.1
>>
> .