Re: bpf-next hang+kasan uaf refcount acquire splat when running test_progs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2023-04-20 at 14:52 +0200, Florian Westphal wrote:
> Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> > > BUG: KASAN: slab-out-of-bounds in bpf_refcount_acquire_impl+0x63/0xd0
> > > Write of size 4 at addr ffff8881072b34e8 by task new_name/12847
> > > 
> > > CPU: 1 PID: 12847 Comm: new_name Not tainted 6.3.0-rc6+ #129
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
> > > Call Trace:
> > >  <TASK>
> > >  dump_stack_lvl+0x32/0x40
> > >  print_address_description.constprop.0+0x2b/0x3d0
> > >  ? bpf_refcount_acquire_impl+0x63/0xd0
> > >  print_report+0xb0/0x260
> > >  ? bpf_refcount_acquire_impl+0x63/0xd0
> > >  ? kasan_addr_to_slab+0x9/0x70
> > >  ? bpf_refcount_acquire_impl+0x63/0xd0
> > >  kasan_report+0xad/0xd0
> > >  ? bpf_refcount_acquire_impl+0x63/0xd0
> > >  kasan_check_range+0x13b/0x190
> > >  bpf_refcount_acquire_impl+0x63/0xd0
> > >  bpf_prog_affcc6c9d58016ca___insert_in_tree_and_list+0x54/0x131
> > >  bpf_prog_795203cdef4805f4_insert_and_remove_tree_true_list_true+0x15/0x11b
> > >  bpf_test_run+0x2a0/0x5f0
> > >  ? bpf_test_timer_continue+0x430/0x430
> > >  ? kmem_cache_alloc+0xe5/0x260
> > >  ? kasan_set_track+0x21/0x30
> > >  ? krealloc+0x9e/0xe0
> > >  bpf_prog_test_run_skb+0x890/0x1270
> > >  ? __kmem_cache_free+0x9f/0x170
> > >  ? bpf_prog_test_run_raw_tp+0x570/0x570
> > >  ? __fget_light+0x52/0x4d0
> > >  ? map_update_elem+0x680/0x680
> > >  __sys_bpf+0x75e/0xd10
> > >  ? bpf_link_by_id+0xa0/0xa0
> > >  ? rseq_get_rseq_cs+0x67/0x650
> > >  ? __blkcg_punt_bio_submit+0x1b0/0x1b0
> > >  __x64_sys_bpf+0x6f/0xb0
> > >  do_syscall_64+0x3a/0x80
> > >  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > RIP: 0033:0x7f2f6a8b392d
> > > Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d3 e4 0c 00 f7 d8 64 89 01 48
> > > RSP: 002b:00007ffe46938328 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
> > > RAX: ffffffffffffffda RBX: 0000000007150690 RCX: 00007f2f6a8b392d
> > > RDX: 0000000000000050 RSI: 00007ffe46938360 RDI: 000000000000000a
> > > RBP: 00007ffe46938340 R08: 0000000000000064 R09: 00007ffe46938360
> > > R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> > > R13: 00007ffe46938b78 R14: 0000000000e09db0 R15: 00007f2f6a9ff000
> > >  </TASK>
> > > 
> > > I can also reproduce this on bpf-next/780c69830ec6b27e0224586ce26bc89552fcf163.
> > > Is this a known bug?
> > 
> > Hi Florian,
> > 
> > Thank you for the report, that's my bug :(
> > 
> > After the suggested change I can run the ./test_progs till the end
> >  (takes a few minutes, though). One test is failing: verifier_array_access,
> > this is because map it uses is not populated with values (as it was when this was
> > a part ./test_verifier).
> 
> Right, I see that failure too.
> > However, in the middle of execution I do see a trace similar to yours:
> 
> I see this as well, to get to it quicker:
> ./test_progs --allow=refcounted_kptr
> 
> 
> > [   82.757127] ------------[ cut here ]------------
> > [   82.757667] refcount_t: saturated; leaking memory.
> > [   82.758098] WARNING: CPU: 0 PID: 176 at lib/refcount.c:22 refcount_warn_saturate+0x61/0xe0
> 
> I get this one right after the kasan splat.
> 
> > Could you please share your config?
> > I'd like to reproduce the hang.
> 
> It hangs for me if I just rerun 
> ./test_progs --allow=refcounted_kptr
> 
> a couple of times (maybe once per cpu...?).
> 
> I'll send the config if that doesn't hang for you.

Ok, I got the hang after executing the following couple of times:

for i in $(seq 1 4); do (./test_progs --allow=refcounted_kptr &); done

And here is a dead lock warning in the dmesg:

[ 1200.463933] ============================================
[ 1200.464382] WARNING: possible recursive locking detected
[ 1200.464546] 6.3.0-rc6-01631-g780c69830ec6 #474 Tainted: G        W  OE     
[ 1200.464546] --------------------------------------------
[ 1200.464546] test_progs/4311 is trying to acquire lock:
[ 1200.464546] ffffffff8276f518 (btf_idr_lock){+.-.}-{2:2}, at: btf_put+0x36/0x80
[ 1200.464546] 
[ 1200.464546] but task is already holding lock:
[ 1200.464546] ffffffff8276f518 (btf_idr_lock){+.-.}-{2:2}, at: bpf_find_btf_id+0xeb/0x1b0
[ 1200.464546] 
[ 1200.464546] other info that might help us debug this:
[ 1200.464546]  Possible unsafe locking scenario:
[ 1200.464546] 
[ 1200.464546]        CPU0
[ 1200.464546]        ----
[ 1200.464546]   lock(btf_idr_lock);
[ 1200.464546]   lock(btf_idr_lock);
[ 1200.464546] 
[ 1200.464546]  *** DEADLOCK ***
[ 1200.464546] 
[ 1200.464546]  May be due to missing lock nesting notation
[ 1200.464546] 
[ 1200.464546] 1 lock held by test_progs/4311:
[ 1200.464546]  #0: ffffffff8276f518 (btf_idr_lock){+.-.}-{2:2}, at: bpf_find_btf_id+0xeb/0x1b0
[ 1200.464546] 
[ 1200.464546] stack backtrace:
[ 1200.464546] CPU: 2 PID: 4311 Comm: test_progs Tainted: G        W  OE      6.3.0-rc6-01631-g780c69830ec6 #474
[ 1200.464546] Call Trace:
[ 1200.464546]  <TASK>
[ 1200.464546]  dump_stack_lvl+0x47/0x70
[ 1200.464546]  __lock_acquire+0x88b/0x2710
[ 1200.464546]  ? __lock_acquire+0x350/0x2710
[ 1200.464546]  lock_acquire+0xca/0x2c0
[ 1200.464546]  ? btf_put+0x36/0x80
[ 1200.464546]  ? lock_acquire+0xda/0x2c0
[ 1200.464546]  _raw_spin_lock_irqsave+0x3e/0x60
[ 1200.464546]  ? btf_put+0x36/0x80
[ 1200.464546]  btf_put+0x36/0x80
[ 1200.464546]  bpf_find_btf_id+0xf3/0x1b0
[ 1200.464546]  btf_parse_fields+0x570/0xbf0
[ 1200.464546]  ? lock_release+0x139/0x280
[ 1200.464546]  ? __bpf_map_area_alloc+0xaa/0xd0
[ 1200.464546]  ? __kmem_cache_alloc_node+0x14a/0x220
[ 1200.464546]  ? rcu_is_watching+0xd/0x40
[ 1200.464546]  ? __kmalloc_node+0xcb/0x140
[ 1200.464546]  map_check_btf+0x9c/0x260
[ 1200.464546]  __sys_bpf+0x274b/0x2ca0
[ 1200.464546]  ? lock_release+0x139/0x280
[ 1200.464546]  __x64_sys_bpf+0x1a/0x20
[ 1200.464546]  do_syscall_64+0x35/0x80
[ 1200.464546]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 1200.464546] RIP: 0033:0x7f659b7a15a9
[ 1200.464546] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 08 0d 08
[ 1200.464546] RSP: 002b:00007ffe984a7e88 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
[ 1200.464546] RAX: ffffffffffffffda RBX: 000055ebdcfae150 RCX: 00007f659b7a15a9
[ 1200.464546] RDX: 0000000000000048 RSI: 00007ffe984a7f10 RDI: 0000000000000000
[ 1200.464546] RBP: 00007ffe984a7ea0 R08: 00007ffe984a8000 R09: 00007ffe984a7f10
[ 1200.464546] R10: 0000000000000001 R11: 0000000000000206 R12: 0000000000000000
[ 1200.464546] R13: 00007ffe984a8740 R14: 000055ebd91894f0 R15: 00007f659b8f0020
[ 1200.464546]  </TASK>
[ 1224.297533] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [test_progs:4315]
[ 1224.297533] Modules linked in: [last unloaded: bpf_testmod(OE)]

The refcounted_kptr does use __retval annotations.
So, we have a testing framework issue that masks BTF parsing issue...





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux