[BUG] Possible deadlock in bpf_local_storage_update

Hi,

I've seen the following deadlock warning when running the test_progs selftest:

[  127.404118] ============================================
[  127.409306] WARNING: possible recursive locking detected
[  127.414534] 5.19.0-rc8-02055-g43fe6c051c85 #257 Tainted: G           OE
[  127.421172] --------------------------------------------
[  127.426256] test_progs/492 is trying to acquire lock:
[  127.431356] ffff8ffe0d6c4bb8 (&storage->lock){+.-.}-{2:2}, at: __bpf_selem_unlink_storage+0x3a/0x150
[  127.440305]
[  127.440305] but task is already holding lock:
[  127.445872] ffff8ffe0d6c4ab8 (&storage->lock){+.-.}-{2:2}, at: bpf_local_storage_update+0x31e/0x490
[  127.454681]
[  127.454681] other info that might help us debug this:
[  127.461171]  Possible unsafe locking scenario:
[  127.461171]
[  127.467377]        CPU0
[  127.469971]        ----
[  127.472497]   lock(&storage->lock);
[  127.475963]   lock(&storage->lock);
[  127.479391]
[  127.479391]  *** DEADLOCK ***
[  127.479391]
[  127.485434]  May be due to missing lock nesting notation
[  127.485434]
[  127.492118] 3 locks held by test_progs/492:
[  127.496484]  #0: ffffffffbaf94b60 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xe0
[  127.505888]  #1: ffff8ffe0d6c4ab8 (&storage->lock){+.-.}-{2:2}, at: bpf_local_storage_update+0x31e/0x490
[  127.514981]  #2: ffffffffbaf957e0 (rcu_read_lock){....}-{1:2}, at: __bpf_prog_enter+0x0/0x100
[  127.523310]
[  127.523310] stack backtrace:
[  127.527574] CPU: 7 PID: 492 Comm: test_progs Tainted: G           OE     5.19.0-rc8-02055-g43fe6c051c85 #257
[  127.536658] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
[  127.547462] Call Trace:
[  127.549977]  <TASK>
[  127.552175]  dump_stack_lvl+0x44/0x5b
[  127.555758]  __lock_acquire.cold.74+0x151/0x2aa
[  127.560217]  lock_acquire+0xc9/0x2f0
[  127.563686]  ? __bpf_selem_unlink_storage+0x3a/0x150
[  127.568524]  ? find_held_lock+0x2d/0xa0
[  127.572378]  _raw_spin_lock_irqsave+0x38/0x60
[  127.576532]  ? __bpf_selem_unlink_storage+0x3a/0x150
[  127.581380]  __bpf_selem_unlink_storage+0x3a/0x150
[  127.586044]  bpf_task_storage_delete+0x53/0xb0
[  127.590385]  bpf_prog_730e33528dbd2937_on_lookup+0x26/0x3d
[  127.595673]  bpf_trampoline_6442505865_0+0x47/0x1000
[  127.600533]  ? bpf_local_storage_update+0x250/0x490
[  127.605253]  bpf_local_storage_lookup+0x5/0x130
[  127.609650]  bpf_local_storage_update+0xf1/0x490
[  127.614175]  bpf_sk_storage_get+0xd3/0x130
[  127.618126]  bpf_prog_b4aaeb10c7178354_socket_bind+0x18e/0x297
[  127.623815]  bpf_trampoline_6442474456_1+0x5c/0x1000
[  127.628591]  bpf_lsm_socket_bind+0x5/0x10
[  127.632476]  security_socket_bind+0x30/0x50
[  127.636755]  __sys_bind+0xba/0xf0
[  127.640113]  ? ktime_get_coarse_real_ts64+0xb9/0xc0
[  127.644910]  ? lockdep_hardirqs_on+0x79/0x100
[  127.649438]  ? ktime_get_coarse_real_ts64+0xb9/0xc0
[  127.654215]  ? syscall_trace_enter.isra.16+0x157/0x200
[  127.659255]  __x64_sys_bind+0x16/0x20
[  127.662894]  do_syscall_64+0x3a/0x90
[  127.666456]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  127.671500] RIP: 0033:0x7fbba4b36ceb
[  127.674982] Code: c3 48 8b 15 77 31 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb c2 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 8
[  127.692457] RSP: 002b:00007fff4e9c9db8 EFLAGS: 00000206 ORIG_RAX: 0000000000000031
[  127.699666] RAX: ffffffffffffffda RBX: 00007fbba5057000 RCX: 00007fbba4b36ceb
[  127.706448] RDX: 000000000000001c RSI: 00007fff4e9c9e40 RDI: 0000000000000035
[  127.713247] RBP: 00007fff4e9c9e00 R08: 0000000000000010 R09: 0000000000000000
[  127.719938] R10: 0000000000000000 R11: 0000000000000206 R12: 000000000040d3a0
[  127.726790] R13: 00007fff4e9ce330 R14: 0000000000000000 R15: 0000000000000000
[  127.733820]  </TASK>

I am not entirely sure I am reading the call trace correctly (or whether it is
entirely accurate, for that matter), but one way I can see a recursive
acquisition happening is this: we first acquire the local_storage lock in
bpf_local_storage_update [0], then call into bpf_local_storage_lookup on line
439 with the lock still held, and then attempt to acquire the same lock again
on line 268.
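
To make the pattern concrete, here is a small userspace illustration of what I
mean (this is not kernel code; update() and lookup() are just hypothetical
stand-ins for the bpf_local_storage_* paths): a non-recursive lock is taken in
update(), and lookup() then tries to take the same lock while it is still held.

/* Abstract userspace illustration only, NOT kernel code.  update() and
 * lookup() are hypothetical stand-ins for bpf_local_storage_update()
 * and bpf_local_storage_lookup(); the point is just the lock pattern.
 */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t storage_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the lookup path that may take storage->lock itself. */
static void lookup(void)
{
        /* Use trylock so the demo reports the recursion instead of
         * hanging the way a real non-recursive spinlock would.
         */
        if (pthread_mutex_trylock(&storage_lock) == EBUSY) {
                printf("recursive acquisition: lock is already held\n");
                return;
        }
        pthread_mutex_unlock(&storage_lock);
}

/* Stand-in for the update path: the lock stays held across lookup(). */
static void update(void)
{
        pthread_mutex_lock(&storage_lock);
        lookup();               /* same lock acquired again in here */
        pthread_mutex_unlock(&storage_lock);
}

int main(void)
{
        update();
        return 0;
}

With a spinlock instead of a mutex, the second acquisition would simply spin
forever on the same CPU, which is the scenario lockdep is warning about above.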

The config I used is tools/testing/selftests/bpf/config +
tools/testing/selftests/bpf/config.x86_64. I am synced to 71930846b36 ("net:
marvell: prestera: uninitialized variable bug").

Thanks,
Daniel

[0] https://elixir.bootlin.com/linux/v5.19-rc8/source/kernel/bpf/bpf_local_storage.c#L426


