> On Apr 20, 2023, at 1:55 AM, Paolo Abeni <pabeni@xxxxxxxxxx> wrote: > > Hi, > > On Tue, 2023-04-18 at 15:31 +0000, Aditi Ghag wrote: >> Previously, BPF TCP iterator was acquiring fast version of sock lock that >> disables the BH. This introduced a circular dependency with code paths that >> later acquire sockets hash table bucket lock. >> Replace the fast version of sock lock with slow that faciliates BPF >> programs executed from the iterator to destroy TCP listening sockets >> using the bpf_sock_destroy kfunc (implemened in follow-up commits). >> >> Here is a stack trace that motivated this change: >> >> ``` >> 1) sock_lock with BH disabled + bucket lock >> >> lock_acquire+0xcd/0x330 >> _raw_spin_lock_bh+0x38/0x50 >> inet_unhash+0x96/0xd0 >> tcp_set_state+0x6a/0x210 >> tcp_abort+0x12b/0x230 >> bpf_prog_f4110fb1100e26b5_iter_tcp6_server+0xa3/0xaa >> bpf_iter_run_prog+0x1ff/0x340 >> bpf_iter_tcp_seq_show+0xca/0x190 >> bpf_seq_read+0x177/0x450 >> vfs_read+0xc6/0x300 >> ksys_read+0x69/0xf0 >> do_syscall_64+0x3c/0x90 >> entry_SYSCALL_64_after_hwframe+0x72/0xdc >> >> 2) sock lock with BH enable >> >> [ 1.499968] lock_acquire+0xcd/0x330 >> [ 1.500316] _raw_spin_lock+0x33/0x40 > > The above is quite confusing to me, here BH are disabled as well > (otherwise the whole softirq processing would be really broken) Hi! Looks like the commit warrants more detailed explanation. I've annotated the relevant stack traces with the locks in question (also updated the patch description locally): lock_acquire+0xcd/0x330 _raw_spin_lock+0x33/0x40 ------> sock lock acquired with BH enabled sk_clone_lock+0x146/0x520 inet_csk_clone_lock+0x1b/0x110 tcp_create_openreq_child+0x22/0x3f0 tcp_v6_syn_recv_sock+0x96/0x940 lock_acquire+0xcd/0x330 _raw_spin_lock+0x33/0x40 ------> Acquire (bucket) lhash2.lock (may cause deadlock if interrupted) __inet_hash+0x4b/0x210 inet_csk_listen_start+0xe6/0x100 inet_listen+0x95/0x1d0 __sys_listen+0x69/0xb0 __x64_sys_listen+0x14/0x20 do_syscall_64+0x3c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc lock_acquire+0xcd/0x330 _raw_spin_lock+0x33/0x40 ------> Acquire (bucket) lhash2.lock inet_unhash+0x9a/0x110 tcp_set_state+0x6a/0x210 tcp_abort+0x10d/0x200 bpf_prog_6793c5ca50c43c0d_iter_tcp6_server+0xa4/0xa9 bpf_iter_run_prog+0x1ff/0x340 ------> Release (bucket) lhash2.lock bpf_iter_tcp_seq_show+0xca/0x190 ------> Acquire (bucket) lhash2.lock ------> sock lock acquired with BH disabled bpf_seq_read+0x177/0x450 --------------------------------------------------------------------------------- Here is the full stack trace: [ 1.543344] and this task is already holding: [ 1.543786] ffffa37903b400b0 (slock-AF_INET6){+.-.}-{2:2}, at: __lock_sock_fast+0x33/0x70 [ 1.544410] which would create a new lock dependency: [ 1.544797] (slock-AF_INET6){+.-.}-{2:2} -> (&h->lhash2[i].lock){+.+.}-{2:2} [ 1.545361] [ 1.545361] but this new dependency connects a SOFTIRQ-irq-safe lock: [ 1.545961] (slock-AF_INET6){+.-.}-{2:2} [ 1.545963] [ 1.545963] ... which became SOFTIRQ-irq-safe at: [ 1.546745] lock_acquire+0xcd/0x330 [ 1.547033] _raw_spin_lock+0x33/0x40 [ 1.547325] sk_clone_lock+0x146/0x520 [ 1.547623] inet_csk_clone_lock+0x1b/0x110 [ 1.547960] tcp_create_openreq_child+0x22/0x3f0 [ 1.548327] tcp_v6_syn_recv_sock+0x96/0x940 [ 1.548672] tcp_check_req+0x13f/0x640 [ 1.548977] tcp_v6_rcv+0xa62/0xe80 [ 1.549258] ip6_protocol_deliver_rcu+0x78/0x590 [ 1.549621] ip6_input_finish+0x72/0x140 [ 1.549931] __netif_receive_skb_one_core+0x63/0xa0 [ 1.550313] process_backlog+0x79/0x260 [ 1.550619] __napi_poll.constprop.0+0x27/0x170 [ 1.550976] net_rx_action+0x14a/0x2a0 [ 1.551272] __do_softirq+0x165/0x510 [ 1.551563] do_softirq+0xcd/0x100 [ 1.551836] __local_bh_enable_ip+0xcc/0xf0 [ 1.552168] ip6_finish_output2+0x27c/0xb10 [ 1.552500] ip6_finish_output+0x274/0x510 [ 1.552823] ip6_xmit+0x319/0x9b0 [ 1.553095] inet6_csk_xmit+0x12b/0x2b0 [ 1.553398] __tcp_transmit_skb+0x543/0xc30 [ 1.553731] tcp_rcv_state_process+0x362/0x1180 [ 1.554088] tcp_v6_do_rcv+0x10f/0x540 [ 1.554387] __release_sock+0x6a/0xe0 [ 1.554679] release_sock+0x2f/0xb0 [ 1.554957] __inet_stream_connect+0x1ac/0x3a0 [ 1.555308] inet_stream_connect+0x3b/0x60 [ 1.555632] __sys_connect+0xa3/0xd0 [ 1.555915] __x64_sys_connect+0x18/0x20 [ 1.556222] do_syscall_64+0x3c/0x90 [ 1.556510] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 1.556909] [ 1.556909] to a SOFTIRQ-irq-unsafe lock: [ 1.557326] (&h->lhash2[i].lock){+.+.}-{2:2} [ 1.557329] [ 1.557329] ... which became SOFTIRQ-irq-unsafe at: [ 1.558148] ... [ 1.558149] lock_acquire+0xcd/0x330 [ 1.558579] _raw_spin_lock+0x33/0x40 [ 1.558874] __inet_hash+0x4b/0x210 [ 1.559154] inet_csk_listen_start+0xe6/0x100 [ 1.559503] inet_listen+0x95/0x1d0 [ 1.559782] __sys_listen+0x69/0xb0 [ 1.560063] __x64_sys_listen+0x14/0x20 [ 1.560365] do_syscall_64+0x3c/0x90 [ 1.560652] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 1.561052] [ 1.561052] other info that might help us debug this: [ 1.561052] [ 1.561658] Possible interrupt unsafe locking scenario: [ 1.561658] [ 1.562171] CPU0 CPU1 [ 1.562521] ---- ---- [ 1.562870] lock(&h->lhash2[i].lock); [ 1.563167] local_irq_disable(); [ 1.563618] lock(slock-AF_INET6); [ 1.564076] lock(&h->lhash2[i].lock); [ 1.564558] <Interrupt> [ 1.564763] lock(slock-AF_INET6); [ 1.565053] [ 1.565053] *** DEADLOCK *** [ 1.565053] [ 1.565505] 4 locks held by test_progs/117: [ 1.565829] #0: ffffa37903af60a0 (&p->lock){+.+.}-{3:3}, at: bpf_seq_read+0x3f/0x450 [ 1.566428] #1: ffffa37903b40130 (sk_lock-AF_INET6){+.+.}-{0:0}, at: bpf_seq_read+0x177/0x450 [ 1.567091] #2: ffffa37903b400b0 (slock-AF_INET6){+.-.}-{2:2}, at: __lock_sock_fast+0x33/0x70 [ 1.567780] #3: ffffffffb4f966c0 (rcu_read_lock){....}-{1:2}, at: bpf_iter_run_prog+0x18b/0x340 [ 1.568453] [ 1.568453] the dependencies between SOFTIRQ-irq-safe lock and the holding lock: [ 1.569132] -> (slock-AF_INET6){+.-.}-{2:2} { [ 1.569469] HARDIRQ-ON-W at: [ 1.569717] lock_acquire+0xcd/0x330 [ 1.570117] _raw_spin_lock_bh+0x38/0x50 [ 1.570543] lock_sock_nested+0x50/0x80 [ 1.570958] sk_setsockopt+0x776/0x16b0 [ 1.571376] __sys_setsockopt+0x193/0x1b0 [ 1.571809] __x64_sys_setsockopt+0x1f/0x30 [ 1.572259] do_syscall_64+0x3c/0x90 [ 1.572671] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 1.573185] IN-SOFTIRQ-W at: [ 1.573433] lock_acquire+0xcd/0x330 [ 1.573835] _raw_spin_lock+0x33/0x40 [ 1.574242] sk_clone_lock+0x146/0x520 [ 1.574657] inet_csk_clone_lock+0x1b/0x110 [ 1.575106] tcp_create_openreq_child+0x22/0x3f0 [ 1.575585] tcp_v6_syn_recv_sock+0x96/0x940 [ 1.576042] tcp_check_req+0x13f/0x640 [ 1.576456] tcp_v6_rcv+0xa62/0xe80 [ 1.576853] ip6_protocol_deliver_rcu+0x78/0x590 [ 1.577337] ip6_input_finish+0x72/0x140 [ 1.577772] __netif_receive_skb_one_core+0x63/0xa0 [ 1.578272] process_backlog+0x79/0x260 [ 1.578696] __napi_poll.constprop.0+0x27/0x170 [ 1.579169] net_rx_action+0x14a/0x2a0 [ 1.579586] __do_softirq+0x165/0x510 [ 1.579995] do_softirq+0xcd/0x100 [ 1.580381] __local_bh_enable_ip+0xcc/0xf0 [ 1.580831] ip6_finish_output2+0x27c/0xb10 [ 1.581289] ip6_finish_output+0x274/0x510 [ 1.581733] ip6_xmit+0x319/0x9b0 [ 1.582118] inet6_csk_xmit+0x12b/0x2b0 [ 1.582540] __tcp_transmit_skb+0x543/0xc30 [ 1.582988] tcp_rcv_state_process+0x362/0x1180 [ 1.583464] tcp_v6_do_rcv+0x10f/0x540 [ 1.583882] __release_sock+0x6a/0xe0 [ 1.584291] release_sock+0x2f/0xb0 [ 1.584686] __inet_stream_connect+0x1ac/0x3a0 [ 1.585157] inet_stream_connect+0x3b/0x60 [ 1.585598] __sys_connect+0xa3/0xd0 [ 1.586001] __x64_sys_connect+0x18/0x20 [ 1.586429] do_syscall_64+0x3c/0x90 [ 1.586832] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 1.587346] INITIAL USE at: [ 1.587590] lock_acquire+0xcd/0x330 [ 1.587985] _raw_spin_lock_bh+0x38/0x50 [ 1.588407] lock_sock_nested+0x50/0x80 [ 1.588821] sk_setsockopt+0x776/0x16b0 [ 1.589241] __sys_setsockopt+0x193/0x1b0 [ 1.589666] __x64_sys_setsockopt+0x1f/0x30 [ 1.590108] do_syscall_64+0x3c/0x90 [ 1.590505] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 1.591007] } [ 1.591144] ... key at: [<ffffffffb6441bc0>] af_family_slock_keys+0xa0/0x2e0 [ 1.591721] [ 1.591721] the dependencies between the lock to be acquired [ 1.591722] and SOFTIRQ-irq-unsafe lock: [ 1.592596] -> (&h->lhash2[i].lock){+.+.}-{2:2} { [ 1.592968] HARDIRQ-ON-W at: [ 1.593209] lock_acquire+0xcd/0x330 [ 1.593604] _raw_spin_lock+0x33/0x40 [ 1.594002] __inet_hash+0x4b/0x210 [ 1.594387] inet_csk_listen_start+0xe6/0x100 [ 1.594839] inet_listen+0x95/0x1d0 [ 1.595217] __sys_listen+0x69/0xb0 [ 1.595594] __x64_sys_listen+0x14/0x20 [ 1.595997] do_syscall_64+0x3c/0x90 [ 1.596386] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 1.596891] SOFTIRQ-ON-W at: [ 1.597133] lock_acquire+0xcd/0x330 [ 1.597537] _raw_spin_lock+0x33/0x40 [ 1.597930] __inet_hash+0x4b/0x210 [ 1.598308] inet_csk_listen_start+0xe6/0x100 [ 1.598750] inet_listen+0x95/0x1d0 [ 1.599129] __sys_listen+0x69/0xb0 [ 1.599509] __x64_sys_listen+0x14/0x20 [ 1.599912] do_syscall_64+0x3c/0x90 [ 1.600303] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 1.600799] INITIAL USE at: [ 1.601037] lock_acquire+0xcd/0x330 [ 1.601416] _raw_spin_lock+0x33/0x40 [ 1.601801] __inet_hash+0x4b/0x210 [ 1.602178] inet_csk_listen_start+0xe6/0x100 [ 1.602618] inet_listen+0x95/0x1d0 [ 1.602995] __sys_listen+0x69/0xb0 [ 1.603368] __x64_sys_listen+0x14/0x20 [ 1.603766] do_syscall_64+0x3c/0x90 [ 1.604150] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 1.604647] } [ 1.604780] ... key at: [<ffffffffb64567a0>] __key.1+0x0/0x10 [ 1.605235] ... acquired at: [ 1.605470] lock_acquire+0xcd/0x330 [ 1.605748] _raw_spin_lock+0x33/0x40 [ 1.606017] inet_unhash+0x9a/0x110 [ 1.606275] tcp_set_state+0x6a/0x210 [ 1.606552] tcp_abort+0x10d/0x200 [ 1.606807] bpf_prog_6793c5ca50c43c0d_iter_tcp6_server+0xa4/0xa9 [ 1.607243] bpf_iter_run_prog+0x1ff/0x340 [ 1.607553] bpf_iter_tcp_seq_show+0xca/0x190 [ 1.607874] bpf_seq_read+0x177/0x450 [ 1.608145] vfs_read+0xc6/0x300 [ 1.608389] ksys_read+0x69/0xf0 [ 1.608631] do_syscall_64+0x3c/0x90 [ 1.608903] entry_SYSCALL_64_after_hwframe+0x72/0xdc > > Thanks, > > Paolo