Re: [PATCH v8 bpf-next 01/10] bpf: tcp: Avoid taking fast sock lock in iterator

On 5/17/23 10:54 AM, Aditi Ghag wrote:
> This is a preparatory commit to replace `lock_sock_fast` with
> `lock_sock`, and faciliate BPF programs executed from the iterator to be

facilitate

> able to destroy TCP listening sockets using the bpf_sock_destroy kfunc
> (implemened in follow-up commits).  Previously, BPF TCP iterator was

implemented

> acquiring the sock lock with BH disabled. This led to scenarios where
> the sockets hash table bucket lock can be acquired with BH enabled in
> some context versus disabled in other, and  caused a
> <softirq-safe> -> <softirq-unsafe> dependency with the sock lock.

For 'and caused a <softirq-safe> -> <softirq-unsafe> dependency with
the sock lock', maybe this can be rephrased like below:

In such a situation, the kernel issues a warning since it thinks that
in the BH-enabled path the same bucket lock *might* be acquired again
in softirq context (BH disabled), which would lead to a potential
deadlock.

Note that in this particular triggering, the local_bh_disable()
happens in process context, so the warning is a false alarm.
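
To make the shape of the dependency concrete, here is a minimal sketch
of the two acquisition patterns lockdep is comparing. This is my own
illustration, not code from the patch or an exact reproducer:
bucket_lock stands in for h->lhash2[i].lock, sock_lock stands in for
sk->sk_lock.slock, and both function names are made up.

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(bucket_lock);	/* stand-in for h->lhash2[i].lock */
static DEFINE_SPINLOCK(sock_lock);	/* stand-in for sk->sk_lock.slock */

/* __inet_hash() path: the bucket lock is taken in process context
 * with BH enabled, so lockdep marks it softirq-unsafe.
 */
static void listen_path(void)
{
	spin_lock(&bucket_lock);
	/* ... hash the listening socket ... */
	spin_unlock(&bucket_lock);
}

/* TCP softirq processing takes the sock spinlock elsewhere, so
 * lockdep marks it softirq-safe.  lock_sock_fast() takes it with BH
 * disabled, and inet_unhash() then nests the bucket lock under it,
 * recording the softirq-safe -> softirq-unsafe dependency that
 * produced the report below.
 */
static void iterator_path(void)
{
	spin_lock_bh(&sock_lock);	/* lock_sock_fast() fast path */
	spin_lock(&bucket_lock);	/* inet_unhash() */
	/* ... unhash the socket being destroyed ... */
	spin_unlock(&bucket_lock);
	spin_unlock_bh(&sock_lock);
}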


> Here is a snippet of annotated stack trace that motivated this change:
>
> ```
>
> Possible interrupt unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&h->lhash2[i].lock);
>                                local_irq_disable();
>                                lock(slock-AF_INET6);
>                                lock(&h->lhash2[i].lock);

                                 local_bh_disable();
                                 lock(&h->lhash2[i].lock);

>   <Interrupt>
>     lock(slock-AF_INET6);
> *** DEADLOCK ***
Maybe replace the above with the below:

kernel imagined possible scenario:
    local_bh_disable();  /* Possible softirq */
    lock(&h->lhash2[i].lock);
*** Potential Deadlock ***


> process context:
>
> lock_acquire+0xcd/0x330
> _raw_spin_lock+0x33/0x40
> ------> Acquire (bucket) lhash2.lock with BH enabled
> __inet_hash+0x4b/0x210
> inet_csk_listen_start+0xe6/0x100
> inet_listen+0x95/0x1d0
> __sys_listen+0x69/0xb0
> __x64_sys_listen+0x14/0x20
> do_syscall_64+0x3c/0x90
> entry_SYSCALL_64_after_hwframe+0x72/0xdc
>
> bpf_sock_destroy run from iterator in interrupt context:
>
> lock_acquire+0xcd/0x330
> _raw_spin_lock+0x33/0x40
> ------> Acquire (bucket) lhash2.lock with BH disabled
> inet_unhash+0x9a/0x110
> tcp_set_state+0x6a/0x210
> tcp_abort+0x10d/0x200
> bpf_prog_6793c5ca50c43c0d_iter_tcp6_server+0xa4/0xa9
> bpf_iter_run_prog+0x1ff/0x340
> ------> lock_sock_fast that acquires sock lock with BH disabled
> bpf_iter_tcp_seq_show+0xca/0x190
> bpf_seq_read+0x177/0x450
>
> ```
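
A side note for readers following the traces: the switch helps because
of how the two primitives leave BH state for the caller's critical
section. Roughly, and this is my simplified paraphrase rather than the
exact kernel source:

bool lock_sock_fast(struct sock *sk)
{
	spin_lock_bh(&sk->sk_lock.slock);
	if (!sk->sk_lock.owned) {
		/* Fast path: return with the spinlock still held and
		 * BH still disabled; the caller's whole critical
		 * section, e.g. inet_unhash() taking the bucket lock
		 * in the trace above, runs in that state.
		 */
		return false;
	}
	/* Contended slow path: fall back to lock_sock() semantics
	 * and tell the caller via the return value.
	 */
	sk->sk_lock.owned = 1;
	spin_unlock_bh(&sk->sk_lock.slock);
	return true;
}

void lock_sock(struct sock *sk)
{
	spin_lock_bh(&sk->sk_lock.slock);
	sk->sk_lock.owned = 1;
	spin_unlock_bh(&sk->sk_lock.slock);
	/* The caller's critical section now runs in process context
	 * with BH re-enabled, so the bucket lock is acquired with BH
	 * enabled in both paths and the lockdep report goes away.
	 */
}

The price is that lock_sock() can sleep while waiting for a socket
owned by user space, which is fine here since the iterator runs in
process context.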

> Acked-by: Yonghong Song <yhs@xxxxxxxx>
> Acked-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
> Signed-off-by: Aditi Ghag <aditi.ghag@xxxxxxxxxxxxx>
> ---
>   net/ipv4/tcp_ipv4.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index ea370afa70ed..f2d370a9450f 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -2962,7 +2962,6 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v)
>  	struct bpf_iter_meta meta;
>  	struct bpf_prog *prog;
>  	struct sock *sk = v;
> -	bool slow;
>  	uid_t uid;
>  	int ret;
>
> @@ -2970,7 +2969,7 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v)
>  		return 0;
>
>  	if (sk_fullsock(sk))
> -		slow = lock_sock_fast(sk);
> +		lock_sock(sk);
>
>  	if (unlikely(sk_unhashed(sk))) {
>  		ret = SEQ_SKIP;
>
> @@ -2994,7 +2993,7 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v)
>
>  unlock:
>  	if (sk_fullsock(sk))
> -		unlock_sock_fast(sk, slow);
> +		release_sock(sk);
>  	return ret;
>  }
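
For the bigger picture: once the bpf_sock_destroy kfunc lands later in
the series, an iterator program of roughly the following shape can
destroy the sockets it visits. This is a hypothetical sketch based on
the commit message above (iter_tcp_destroy is a made-up name), not
code copied from this series or its selftests:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

/* bpf_sock_destroy() is the kfunc added later in this series. */
extern int bpf_sock_destroy(struct sock_common *sk) __ksym;

SEC("iter/tcp")
int iter_tcp_destroy(struct bpf_iter__tcp *ctx)
{
	struct sock_common *sk_common = ctx->sk_common;

	if (!sk_common)
		return 0;

	/* Destroy every socket the iterator visits; a real program
	 * would filter, e.g. on port or cgroup, before destroying.
	 */
	bpf_sock_destroy(sk_common);
	return 0;
}

char _license[] SEC("license") = "GPL";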



