> On Apr 24, 2023, at 10:45 PM, Yonghong Song <yhs@xxxxxxxx> wrote:
>
> On 4/18/23 8:31 AM, Aditi Ghag wrote:
>> Previously, the BPF TCP iterator was acquiring the fast version of the
>> sock lock, which disables BH. This introduced a circular dependency
>> with code paths that later acquire the sockets hash table bucket lock.
>> Replace the fast version of the sock lock with the slow one, which
>> facilitates BPF programs executed from the iterator destroying TCP
>> listening sockets using the bpf_sock_destroy kfunc (implemented in
>> follow-up commits).
>>
>> Here is a stack trace that motivated this change:
>>
>> ```
>> 1) sock_lock with BH disabled + bucket lock
>>
>> lock_acquire+0xcd/0x330
>> _raw_spin_lock_bh+0x38/0x50
>> inet_unhash+0x96/0xd0
>> tcp_set_state+0x6a/0x210
>> tcp_abort+0x12b/0x230
>> bpf_prog_f4110fb1100e26b5_iter_tcp6_server+0xa3/0xaa
>> bpf_iter_run_prog+0x1ff/0x340
>> bpf_iter_tcp_seq_show+0xca/0x190
>> bpf_seq_read+0x177/0x450
>> vfs_read+0xc6/0x300
>> ksys_read+0x69/0xf0
>> do_syscall_64+0x3c/0x90
>> entry_SYSCALL_64_after_hwframe+0x72/0xdc
>
> IIUC, the above deadlock is due to
>
> > lock_acquire+0xcd/0x330
> > _raw_spin_lock_bh+0x38/0x50
> > inet_unhash+0x96/0xd0
> > tcp_set_state+0x6a/0x210
> > tcp_abort+0x12b/0x230
> > bpf_prog_f4110fb1100e26b5_iter_tcp6_server+0xa3/0xaa
> > bpf_iter_run_prog+0x1ff/0x340
> > ... lock_acquire for sock lock ...
> > bpf_iter_tcp_seq_show+0xca/0x190
> > bpf_seq_read+0x177/0x450
> > vfs_read+0xc6/0x300
> > ksys_read+0x69/0xf0
> > do_syscall_64+0x3c/0x90
> > entry_SYSCALL_64_after_hwframe+0x72/0xdc
>
> It would be great to make it explicit in the stack trace so
> it is clear where the circular dependency is.
>
>> 2) sock lock with BH enabled
>>
>> [ 1.499968] lock_acquire+0xcd/0x330
>> [ 1.500316] _raw_spin_lock+0x33/0x40
>> [ 1.500670] sk_clone_lock+0x146/0x520
>> [ 1.501030] inet_csk_clone_lock+0x1b/0x110
>> [ 1.501433] tcp_create_openreq_child+0x22/0x3f0
>> [ 1.501873] tcp_v6_syn_recv_sock+0x96/0x940
>> [ 1.502284] tcp_check_req+0x137/0x660
>> [ 1.502646] tcp_v6_rcv+0xa63/0xe80
>> [ 1.502994] ip6_protocol_deliver_rcu+0x78/0x590
>> [ 1.503434] ip6_input_finish+0x72/0x140
>> [ 1.503818] __netif_receive_skb_one_core+0x63/0xa0
>> [ 1.504281] process_backlog+0x79/0x260
>> [ 1.504668] __napi_poll.constprop.0+0x27/0x170
>> [ 1.505104] net_rx_action+0x14a/0x2a0
>> [ 1.505469] __do_softirq+0x165/0x510
>> [ 1.505842] do_softirq+0xcd/0x100
>> [ 1.506172] __local_bh_enable_ip+0xcc/0xf0
>> [ 1.506588] ip6_finish_output2+0x2a8/0xb00
>> [ 1.506988] ip6_finish_output+0x274/0x510
>> [ 1.507377] ip6_xmit+0x319/0x9b0
>> [ 1.507726] inet6_csk_xmit+0x12b/0x2b0
>> [ 1.508096] __tcp_transmit_skb+0x549/0xc40
>> [ 1.508498] tcp_rcv_state_process+0x362/0x1180
>
> Similarly, it would be good to illustrate where the
> deadlock is in this case.

Ack: responded to Paolo's message directly with more details. Thanks!
>
>> ```
>>
>> Acked-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
>> Signed-off-by: Aditi Ghag <aditi.ghag@xxxxxxxxxxxxx>
>> ---
>>  net/ipv4/tcp_ipv4.c | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
>> index ea370afa70ed..f2d370a9450f 100644
>> --- a/net/ipv4/tcp_ipv4.c
>> +++ b/net/ipv4/tcp_ipv4.c
>> @@ -2962,7 +2962,6 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v)
>>  	struct bpf_iter_meta meta;
>>  	struct bpf_prog *prog;
>>  	struct sock *sk = v;
>> -	bool slow;
>>  	uid_t uid;
>>  	int ret;
>>
>> @@ -2970,7 +2969,7 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v)
>>  		return 0;
>>
>>  	if (sk_fullsock(sk))
>> -		slow = lock_sock_fast(sk);
>> +		lock_sock(sk);
>>
>>  	if (unlikely(sk_unhashed(sk))) {
>>  		ret = SEQ_SKIP;
>>
>> @@ -2994,7 +2993,7 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v)
>>  unlock:
>>  	if (sk_fullsock(sk))
>> -		unlock_sock_fast(sk, slow);
>> +		release_sock(sk);
>>
>>  	return ret;
>>  }