On Thu, Feb 10, 2022 at 12:36:08AM IST, Yonghong Song wrote: > > > On 2/8/22 11:03 PM, Kumar Kartikeya Dwivedi wrote: > > When both bpf_spin_lock and bpf_timer are present in a BPF map value, > > copy_map_value needs to skirt both objects when copying a value into and > > out of the map. However, the current code does not set both s_off and > > t_off in copy_map_value, which leads to a crash when e.g. bpf_spin_lock > > is placed in map value with bpf_timer, as bpf_map_update_elem call will > > be able to overwrite the other timer object. > > > > When the issue is not fixed, an overwriting can produce the following > > splat: > > > > [root@(none) bpf]# ./test_progs -t timer_crash > > [ 15.930339] bpf_testmod: loading out-of-tree module taints kernel. > > [ 16.037849] ================================================================== > > [ 16.038458] BUG: KASAN: user-memory-access in __pv_queued_spin_lock_slowpath+0x32b/0x520 > > [ 16.038944] Write of size 8 at addr 0000000000043ec0 by task test_progs/325 > > [ 16.039399] > > [ 16.039514] CPU: 0 PID: 325 Comm: test_progs Tainted: G OE 5.16.0+ #278 > > [ 16.039983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014 > > [ 16.040485] Call Trace: > > [ 16.040645] <TASK> > > [ 16.040805] dump_stack_lvl+0x59/0x73 > > [ 16.041069] ? __pv_queued_spin_lock_slowpath+0x32b/0x520 > > [ 16.041427] kasan_report.cold+0x116/0x11b > > [ 16.041673] ? __pv_queued_spin_lock_slowpath+0x32b/0x520 > > [ 16.042040] __pv_queued_spin_lock_slowpath+0x32b/0x520 > > [ 16.042328] ? memcpy+0x39/0x60 > > [ 16.042552] ? pv_hash+0xd0/0xd0 > > [ 16.042785] ? lockdep_hardirqs_off+0x95/0xd0 > > [ 16.043079] __bpf_spin_lock_irqsave+0xdf/0xf0 > > [ 16.043366] ? bpf_get_current_comm+0x50/0x50 > > [ 16.043608] ? jhash+0x11a/0x270 > > [ 16.043848] bpf_timer_cancel+0x34/0xe0 > > [ 16.044119] bpf_prog_c4ea1c0f7449940d_sys_enter+0x7c/0x81 > > [ 16.044500] bpf_trampoline_6442477838_0+0x36/0x1000 > > [ 16.044836] __x64_sys_nanosleep+0x5/0x140 > > [ 16.045119] do_syscall_64+0x59/0x80 > > [ 16.045377] ? lock_is_held_type+0xe4/0x140 > > [ 16.045670] ? irqentry_exit_to_user_mode+0xa/0x40 > > [ 16.046001] ? mark_held_locks+0x24/0x90 > > [ 16.046287] ? asm_exc_page_fault+0x1e/0x30 > > [ 16.046569] ? asm_exc_page_fault+0x8/0x30 > > [ 16.046851] ? lockdep_hardirqs_on+0x7e/0x100 > > [ 16.047137] entry_SYSCALL_64_after_hwframe+0x44/0xae > > [ 16.047405] RIP: 0033:0x7f9e4831718d > > [ 16.047602] Code: b4 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b3 6c 0c 00 f7 d8 64 89 01 48 > > [ 16.048764] RSP: 002b:00007fff488086b8 EFLAGS: 00000206 ORIG_RAX: 0000000000000023 > > [ 16.049275] RAX: ffffffffffffffda RBX: 00007f9e48683740 RCX: 00007f9e4831718d > > [ 16.049747] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007fff488086d0 > > [ 16.050225] RBP: 00007fff488086f0 R08: 00007fff488085d7 R09: 00007f9e4cb594a0 > > [ 16.050648] R10: 0000000000000000 R11: 0000000000000206 R12: 00007f9e484cde30 > > [ 16.051124] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > > [ 16.051608] </TASK> > > [ 16.051762] ================================================================== > > > > Fixes: 68134668c17f ("bpf: Add map side support for bpf timers.") > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> > > --- > > include/linux/bpf.h | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > index fa517ae604ad..31a83449808b 100644 > > --- a/include/linux/bpf.h > > +++ b/include/linux/bpf.h > > @@ -224,7 +224,8 @@ static inline void copy_map_value(struct bpf_map *map, void *dst, void *src) > > if (unlikely(map_value_has_spin_lock(map))) { > > s_off = map->spin_lock_off; > > s_sz = sizeof(struct bpf_spin_lock); > > - } else if (unlikely(map_value_has_timer(map))) { > > + } > > + if (unlikely(map_value_has_timer(map))) { > > t_off = map->timer_off; > > t_sz = sizeof(struct bpf_timer); > > } > > Thanks for the patch. I think we have a bigger problem here with the patch. > It actually exposed a few kernel bugs. If you run current selftests, esp. > ./test_progs -j which is what I tried, you will observe > various testing failures. The reason is due to we preserved the timer or > spin lock information incorrectly for a map value. > > For example, the selftest #179 (timer) will fail with this patch and > the following change can fix it. > I actually only saw the same failures (on bpf/master) as in CI, and it seems they are there even when I do a run without my patch (related to uprobes). The bpftool patch PR in GitHub also has the same error, so I'm guessing it is unrelated to this. I also didn't see any difference when running on bpf-next. As far as others are concerned, I didn't see the failure for timer test, or any other ones, for me all timer tests pass properly after applying it. It could be that my test VM is not triggering it, because it may depend on the runtime system/memory values, etc. Can you share what error you see? Does it crash or does it just fail? > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c > index d29af9988f37..3336d76cc5a6 100644 > --- a/kernel/bpf/hashtab.c > +++ b/kernel/bpf/hashtab.c > @@ -961,10 +961,11 @@ static struct htab_elem *alloc_htab_elem(struct > bpf_htab *htab, void *key, > l_new = ERR_PTR(-ENOMEM); > goto dec_count; > } > - check_and_init_map_value(&htab->map, > - l_new->key + round_up(key_size, > 8)); > } > > + check_and_init_map_value(&htab->map, > + l_new->key + round_up(key_size, 8)); > + Makes sense, but trying to understand why it would fail: So this is needed because the reused element from per-CPU region might have garbage in the bpf_spin_lock/bpf_timer fields? But I think atleast for timer case, we reset timer->timer to NULL in bpf_timer_cancel_and_free. Earlier copy_map_value further below in this code would also overwrite the timer part (which usually may be zero), but that would also not happen anymore. > memcpy(l_new->key, key, key_size); > if (percpu) { > size = round_up(size, 8); > > Basically, the above clears new map value timer/spin_lock fields for *all* > new items. > > There are still some selftest failures due to this patch. > Could you run selftest, with necessary additional fixes, to ensure > all selftests passed? Assuming that uprobe ones are unrelated, should I add this diff and respin? Or are you seeing other errors as well apart from the timer (#179) test? -- Kartikeya