On Mon, Dec 4, 2023 at 9:40 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote: > > When running `./test_progs -j` in my local vm with latest kernel, > I once hit a kasan error like below: > > [ 1887.184724] BUG: KASAN: slab-use-after-free in bpf_rb_root_free+0x1f8/0x2b0 > [ 1887.185599] Read of size 4 at addr ffff888106806910 by task kworker/u12:2/2830 > [ 1887.186498] > [ 1887.186712] CPU: 3 PID: 2830 Comm: kworker/u12:2 Tainted: G OEL 6.7.0-rc3-00699-g90679706d486-dirty #494 > [ 1887.188034] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 > [ 1887.189618] Workqueue: events_unbound bpf_map_free_deferred > [ 1887.190341] Call Trace: > [ 1887.190666] <TASK> > [ 1887.190949] dump_stack_lvl+0xac/0xe0 > [ 1887.191423] ? nf_tcp_handle_invalid+0x1b0/0x1b0 > [ 1887.192019] ? panic+0x3c0/0x3c0 > [ 1887.192449] print_report+0x14f/0x720 > [ 1887.192930] ? preempt_count_sub+0x1c/0xd0 > [ 1887.193459] ? __virt_addr_valid+0xac/0x120 > [ 1887.194004] ? bpf_rb_root_free+0x1f8/0x2b0 > [ 1887.194572] kasan_report+0xc3/0x100 > [ 1887.195085] ? bpf_rb_root_free+0x1f8/0x2b0 > [ 1887.195668] bpf_rb_root_free+0x1f8/0x2b0 > [ 1887.196183] ? __bpf_obj_drop_impl+0xb0/0xb0 > [ 1887.196736] ? preempt_count_sub+0x1c/0xd0 > [ 1887.197270] ? preempt_count_sub+0x1c/0xd0 > [ 1887.197802] ? _raw_spin_unlock+0x1f/0x40 > [ 1887.198319] bpf_obj_free_fields+0x1d4/0x260 > [ 1887.198883] array_map_free+0x1a3/0x260 > [ 1887.199380] bpf_map_free_deferred+0x7b/0xe0 > [ 1887.199943] process_scheduled_works+0x3a2/0x6c0 > [ 1887.200549] worker_thread+0x633/0x890 > [ 1887.201047] ? __kthread_parkme+0xd7/0xf0 > [ 1887.201574] ? kthread+0x102/0x1d0 > [ 1887.202020] kthread+0x1ab/0x1d0 > [ 1887.202447] ? pr_cont_work+0x270/0x270 > [ 1887.202954] ? kthread_blkcg+0x50/0x50 > [ 1887.203444] ret_from_fork+0x34/0x50 > [ 1887.203914] ? kthread_blkcg+0x50/0x50 > [ 1887.204397] ret_from_fork_asm+0x11/0x20 > [ 1887.204913] </TASK> > [ 1887.204913] </TASK> > [ 1887.205209] > [ 1887.205416] Allocated by task 2197: > [ 1887.205881] kasan_set_track+0x3f/0x60 > [ 1887.206366] __kasan_kmalloc+0x6e/0x80 > [ 1887.206856] __kmalloc+0xac/0x1a0 > [ 1887.207293] btf_parse_fields+0xa15/0x1480 > [ 1887.207836] btf_parse_struct_metas+0x566/0x670 > [ 1887.208387] btf_new_fd+0x294/0x4d0 > [ 1887.208851] __sys_bpf+0x4ba/0x600 > [ 1887.209292] __x64_sys_bpf+0x41/0x50 > [ 1887.209762] do_syscall_64+0x4c/0xf0 > [ 1887.210222] entry_SYSCALL_64_after_hwframe+0x63/0x6b > [ 1887.210868] > [ 1887.211074] Freed by task 36: > [ 1887.211460] kasan_set_track+0x3f/0x60 > [ 1887.211951] kasan_save_free_info+0x28/0x40 > [ 1887.212485] ____kasan_slab_free+0x101/0x180 > [ 1887.213027] __kmem_cache_free+0xe4/0x210 > [ 1887.213514] btf_free+0x5b/0x130 > [ 1887.213918] rcu_core+0x638/0xcc0 > [ 1887.214347] __do_softirq+0x114/0x37e > > The error happens at bpf_rb_root_free+0x1f8/0x2b0: > > 00000000000034c0 <bpf_rb_root_free>: > ; { > 34c0: f3 0f 1e fa endbr64 > 34c4: e8 00 00 00 00 callq 0x34c9 <bpf_rb_root_free+0x9> > 34c9: 55 pushq %rbp > 34ca: 48 89 e5 movq %rsp, %rbp > ... > ; if (rec && rec->refcount_off >= 0 && > 36aa: 4d 85 ed testq %r13, %r13 > 36ad: 74 a9 je 0x3658 <bpf_rb_root_free+0x198> > 36af: 49 8d 7d 10 leaq 0x10(%r13), %rdi > 36b3: e8 00 00 00 00 callq 0x36b8 <bpf_rb_root_free+0x1f8> > <==== kasan function > 36b8: 45 8b 7d 10 movl 0x10(%r13), %r15d > <==== use-after-free load > 36bc: 45 85 ff testl %r15d, %r15d > 36bf: 78 8c js 0x364d <bpf_rb_root_free+0x18d> > > So the problem is at rec->refcount_off in the above. > > I did some source code analysis and find the reason. > CPU A CPU B > bpf_map_put: > ... > btf_put with rcu callback > ... > bpf_map_free_deferred > with system_unbound_wq > ... ... ... > ... btf_free_rcu: ... > ... ... bpf_map_free_deferred: > ... ... > ... ---------> btf_struct_metas_free() > ... | race condition ... > ... ---------> map->ops->map_free() > ... > ... btf->struct_meta_tab = NULL > > In the above, map_free() corresponds to array_map_free() and eventually > calling bpf_rb_root_free() which calls: > ... > __bpf_obj_drop_impl(obj, field->graph_root.value_rec, false); > ... > > Here, 'value_rec' is assigned in btf_check_and_fixup_fields() with following code: > > meta = btf_find_struct_meta(btf, btf_id); > if (!meta) > return -EFAULT; > rec->fields[i].graph_root.value_rec = meta->record; > > So basically, 'value_rec' is a pointer to the record in struct_metas_tab. > And it is possible that that particular record has been freed by > btf_struct_metas_free() and hence we have a kasan error here. > > Actually it is very hard to reproduce the failure with current bpf/bpf-next > code, I only got the above error once. To increase reproducibility, I added > a delay in bpf_map_free_deferred() to delay map->ops->map_free(), which > significantly increased reproducibility. > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index 5e43ddd1b83f..aae5b5213e93 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -695,6 +695,7 @@ static void bpf_map_free_deferred(struct work_struct *work) > struct bpf_map *map = container_of(work, struct bpf_map, work); > struct btf_record *rec = map->record; > > + mdelay(100); > security_bpf_map_free(map); > bpf_map_release_memcg(map); > /* implementation dependent freeing */ > > To fix the problem, I moved btf_put() after map->ops->map_free() to ensure > struct_metas available during map_free(). Rerun './test_progs -j' with the > above mdelay() hack for a couple of times and didn't observe the error. > > Fixes: 958cf2e273f0 ("bpf: Introduce bpf_obj_new") > Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx> > --- > kernel/bpf/syscall.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index 0ed286b8a0f0..9c6c3738adfe 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -694,11 +694,16 @@ static void bpf_map_free_deferred(struct work_struct *work) > { > struct bpf_map *map = container_of(work, struct bpf_map, work); > struct btf_record *rec = map->record; > + struct btf *btf = map->btf; > > security_bpf_map_free(map); > bpf_map_release_memcg(map); > /* implementation dependent freeing */ > map->ops->map_free(map); > + /* Delay freeing of btf for maps, as map_free callback may need > + * struct_meta info which will be freed with btf_put(). > + */ > + btf_put(btf); The change makes sense to me, but logically I'd put it after btf_record_free(rec), just in case if some of btf records ever refer back to map's BTF or something (and just in general to keep it the very last thing that's happening). But it also seems like CI is not happy ([0]), please take a look, thanks! [0] https://github.com/kernel-patches/bpf/actions/runs/7090474333/job/19297672532 > /* Delay freeing of btf_record for maps, as map_free > * callback usually needs access to them. It is better to do it here > * than require each callback to do the free itself manually. > @@ -727,7 +732,6 @@ void bpf_map_put(struct bpf_map *map) > if (atomic64_dec_and_test(&map->refcnt)) { > /* bpf_map_free_id() must be called first */ > bpf_map_free_id(map); > - btf_put(map->btf); > INIT_WORK(&map->work, bpf_map_free_deferred); > /* Avoid spawning kworkers, since they all might contend > * for the same mutex like slab_mutex. > -- > 2.34.1 >