On Tue, Aug 02, 2022 at 10:25:06PM +0000, Daniel Müller wrote:
> Hi,
>
> I've seen the following deadlock warning when running the test_progs selftest:
>
> [ 127.404118] ============================================
> [ 127.409306] WARNING: possible recursive locking detected
> [ 127.414534] 5.19.0-rc8-02055-g43fe6c051c85 #257 Tainted: G OE
> [ 127.421172] --------------------------------------------
> [ 127.426256] test_progs/492 is trying to acquire lock:
> [ 127.431356] ffff8ffe0d6c4bb8 (&storage->lock){+.-.}-{2:2}, at: __bpf_selem_unlink_storage+0x3a/0x150
> [ 127.440305]
> [ 127.440305] but task is already holding lock:
> [ 127.445872] ffff8ffe0d6c4ab8 (&storage->lock){+.-.}-{2:2}, at: bpf_local_storage_update+0x31e/0x490
> [ 127.454681]
> [ 127.454681] other info that might help us debug this:
> [ 127.461171]  Possible unsafe locking scenario:
> [ 127.461171]
> [ 127.467377]        CPU0
> [ 127.469971]        ----
> [ 127.472497]   lock(&storage->lock);
> [ 127.475963]   lock(&storage->lock);
> [ 127.479391]
> [ 127.479391]  *** DEADLOCK ***
> [ 127.479391]
> [ 127.485434]  May be due to missing lock nesting notation
> [ 127.485434]
> [ 127.492118] 3 locks held by test_progs/492:
> [ 127.496484]  #0: ffffffffbaf94b60 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xe0
> [ 127.505888]  #1: ffff8ffe0d6c4ab8 (&storage->lock){+.-.}-{2:2}, at: bpf_local_storage_update+0x31e/0x490
> [ 127.514981]  #2: ffffffffbaf957e0 (rcu_read_lock){....}-{1:2}, at: __bpf_prog_enter+0x0/0x100
> [ 127.523310]
> [ 127.523310] stack backtrace:
> [ 127.527574] CPU: 7 PID: 492 Comm: test_progs Tainted: G OE 5.19.0-rc8-02055-g43fe6c051c85 #257
> [ 127.536658] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> [ 127.547462] Call Trace:
> [ 127.549977]  <TASK>
> [ 127.552175]  dump_stack_lvl+0x44/0x5b
> [ 127.555758]  __lock_acquire.cold.74+0x151/0x2aa
> [ 127.560217]  lock_acquire+0xc9/0x2f0
> [ 127.563686]  ? __bpf_selem_unlink_storage+0x3a/0x150
> [ 127.568524]  ? find_held_lock+0x2d/0xa0
> [ 127.572378]  _raw_spin_lock_irqsave+0x38/0x60
> [ 127.576532]  ? __bpf_selem_unlink_storage+0x3a/0x150
> [ 127.581380]  __bpf_selem_unlink_storage+0x3a/0x150
> [ 127.586044]  bpf_task_storage_delete+0x53/0xb0
> [ 127.590385]  bpf_prog_730e33528dbd2937_on_lookup+0x26/0x3d
> [ 127.595673]  bpf_trampoline_6442505865_0+0x47/0x1000
> [ 127.600533]  ? bpf_local_storage_update+0x250/0x490
> [ 127.605253]  bpf_local_storage_lookup+0x5/0x130
> [ 127.609650]  bpf_local_storage_update+0xf1/0x490
> [ 127.614175]  bpf_sk_storage_get+0xd3/0x130
> [ 127.618126]  bpf_prog_b4aaeb10c7178354_socket_bind+0x18e/0x297
> [ 127.623815]  bpf_trampoline_6442474456_1+0x5c/0x1000
> [ 127.628591]  bpf_lsm_socket_bind+0x5/0x10
> [ 127.632476]  security_socket_bind+0x30/0x50
> [ 127.636755]  __sys_bind+0xba/0xf0
> [ 127.640113]  ? ktime_get_coarse_real_ts64+0xb9/0xc0
> [ 127.644910]  ? lockdep_hardirqs_on+0x79/0x100
> [ 127.649438]  ? ktime_get_coarse_real_ts64+0xb9/0xc0
> [ 127.654215]  ? syscall_trace_enter.isra.16+0x157/0x200
> [ 127.659255]  __x64_sys_bind+0x16/0x20
> [ 127.662894]  do_syscall_64+0x3a/0x90
> [ 127.666456]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [ 127.671500] RIP: 0033:0x7fbba4b36ceb
> [ 127.674982] Code: c3 48 8b 15 77 31 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb c2 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 8
> [ 127.692457] RSP: 002b:00007fff4e9c9db8 EFLAGS: 00000206 ORIG_RAX: 0000000000000031
> [ 127.699666] RAX: ffffffffffffffda RBX: 00007fbba5057000 RCX: 00007fbba4b36ceb
> [ 127.706448] RDX: 000000000000001c RSI: 00007fff4e9c9e40 RDI: 0000000000000035
> [ 127.713247] RBP: 00007fff4e9c9e00 R08: 0000000000000010 R09: 0000000000000000
> [ 127.719938] R10: 0000000000000000 R11: 0000000000000206 R12: 000000000040d3a0
> [ 127.726790] R13: 00007fff4e9ce330 R14: 0000000000000000 R15: 0000000000000000
> [ 127.733820]  </TASK>
>
> I am not entirely sure I am reading the call trace correctly (or whether it
> really is all that accurate for that matter), but one way I could see a
> recursive acquisition is if we first acquire the local_storage lock in
> bpf_local_storage_update [0], then we call into bpf_local_storage_lookup in
> line 439 (with the lock still held), and then attempt to acquire it again in
> line 268.

bpf_local_storage_lookup(..., cacheit_lockit = false) is called, so it won't
acquire the lock again at line 268 (a condensed sketch of that lookup path is
at the end of this mail).

Also, from the stack, it is two different locks.  One is the sk storage lock
and another is the task storage lock, so there is no deadlock.  It is probably
triggered when the "on_lookup" test (using the task storage) in
task_ls_recursion.c is running in parallel with other sk/inode storage tests.

Cc Song if he has insight on how to annotate lockdep in this case.

>
> The config I used is tools/testing/selftests/bpf/config +
> tools/testing/selftests/bpf/config.x86_64. I am synced to 71930846b36 ("net:
> marvell: prestera: uninitialized variable bug").
>
> Thanks,
> Daniel
>
> [0] https://elixir.bootlin.com/linux/v5.19-rc8/source/kernel/bpf/bpf_local_storage.c#L426
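
For reference, this is roughly the shape of the bpf_local_storage_lookup()
path discussed above, as I read it in kernel/bpf/bpf_local_storage.c at
v5.19-rc8.  It is condensed and paraphrased (RCU lockdep conditions and a few
details dropped), not a verbatim excerpt:

struct bpf_local_storage_data *
bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
                         struct bpf_local_storage_map *smap,
                         bool cacheit_lockit)
{
        struct bpf_local_storage_data *sdata;
        struct bpf_local_storage_elem *selem;

        /* Fast path: per-map cache hit, no lock taken. */
        sdata = rcu_dereference(local_storage->cache[smap->cache_idx]);
        if (sdata && rcu_access_pointer(sdata->smap) == smap)
                return sdata;

        /* Slow path: walk the owner's elem list under RCU. */
        hlist_for_each_entry_rcu(selem, &local_storage->list, snode)
                if (rcu_access_pointer(SDATA(selem)->smap) == smap)
                        break;

        if (!selem)
                return NULL;

        sdata = SDATA(selem);

        /* local_storage->lock is only taken here, to publish the found
         * elem into the cache.  bpf_local_storage_update() calls this
         * function with cacheit_lockit == false because it already holds
         * local_storage->lock, so the lock is not re-acquired on that
         * path (the line 268 mentioned above).
         */
        if (cacheit_lockit) {
                unsigned long flags;

                raw_spin_lock_irqsave(&local_storage->lock, flags);
                if (selem_linked_to_storage(selem))
                        rcu_assign_pointer(local_storage->cache[smap->cache_idx],
                                           sdata);
                raw_spin_unlock_irqrestore(&local_storage->lock, flags);
        }

        return sdata;
}

The two locks in the report are also different instances: the one held at
bpf_local_storage_update+0x31e is the sk storage's &storage->lock
(bpf_sk_storage_get) and the one acquired in __bpf_selem_unlink_storage is
the task storage's (bpf_task_storage_delete).  They only share a lock class,
which is presumably why lockdep suggests the missing nesting notation.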