Cong Wang wrote:
> On Fri, Jun 07, 2024 at 02:09:59PM +0200, Vincent Whitchurch wrote:
> > On Thu, Jun 6, 2024 at 2:47 PM Jason Xing <kerneljasonxing@xxxxxxxxx> wrote:
> > > On Thu, Jun 6, 2024 at 6:00 PM Vincent Whitchurch
> > > <vincent.whitchurch@xxxxxxxxxxxxx> wrote:
> > > > With a socket in the sockmap, if there's a parser callback installed
> > > > and the verdict callback returns SK_PASS, the kernel deadlocks
> > > > immediately after the verdict callback is run. This started at commit
> > > > 6648e613226e18897231ab5e42ffc29e63fa3365 ("bpf, skmsg: Fix NULL
> > > > pointer dereference in sk_psock_skb_ingress_enqueue").
> > > >
> > > > It can be reproduced by running ./test_sockmap -t ping
> > > > --txmsg_pass_skb. The --txmsg_pass_skb command to test_sockmap is
> > > > available in this series:
> > > > https://lore.kernel.org/netdev/20240606-sockmap-splice-v1-0-4820a2ab14b5@xxxxxxxxxxxxx/.
> > >
> > > I don't have time right now to look into this issue carefully until
> > > this weekend. BTW, did you mean the patch [2/5] in the link that can
> > > solve the problem?
> >
> > No. That patch set addresses a different problem which occurs even if
> > only a verdict callback is used. But patch 4/5 in that patch set adds
> > the --txmsg_pass_skb option to the test_sockmap test program, and that
> > option can be used to reproduce this deadlock too.
>
> I think we can remove that write_lock_bh(&sk->sk_callback_lock). Can you
> test the following patch?
>
> ------------>
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index fd20aae30be2..da64ded97f3a 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1116,9 +1116,7 @@ static void sk_psock_strp_data_ready(struct sock *sk)
>  		if (tls_sw_has_ctx_rx(sk)) {
>  			psock->saved_data_ready(sk);
>  		} else {
> -			write_lock_bh(&sk->sk_callback_lock);
>  			strp_data_ready(&psock->strp);
> -			write_unlock_bh(&sk->sk_callback_lock);
>  		}
>  	}
>  	rcu_read_unlock();

It's not obvious to me that we can run the strp parser without the
sk_callback lock here. I believe the patch below is the correct fix; it
fixes the splat above with the test.

bpf: sockmap, fix introduced strparser recursive lock

Originally there was a race between removing a psock from the sock map
and the receive path calling sk_psock_data_ready() on the same socket.
It was possible the removal code would NULL/set the data_ready callback
while the receive path was concurrently calling the hook. The fix was to
wrap the access in sk_callback_lock to ensure the saved_data_ready
pointer didn't change under us. There was some discussion around doing a
larger change to ensure we could use READ_ONCE/WRITE_ONCE over the
callback, but that was for -next kernels, not stable fixes.

But we unfortunately introduced a regression with that fix, because
there is another path into this code (one that didn't have a test case)
through the stream parser. The stream parser runs with sk_callback_lock
already held, which means we get the following splat and lock up.
============================================
WARNING: possible recursive locking detected
6.10.0-rc2 #59 Not tainted
--------------------------------------------
test_sockmap/342 is trying to acquire lock:
ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at: sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467 net/core/skmsg.c:555)

but task is already holding lock:
ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at: sk_psock_strp_data_ready (net/core/skmsg.c:1120)

To fix, ensure we do not grab the lock when we reach this code through
the strparser.

Fixes: 6648e613226e1 ("bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue")
Signed-off-by: John Fastabend <john.fastabend@xxxxxxxxx>
---
 include/linux/skmsg.h | 9 +++++++--
 net/core/skmsg.c      | 5 ++++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index c9efda9df285..3659e9b514d0 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -461,13 +461,18 @@ static inline void sk_psock_put(struct sock *sk, struct sk_psock *psock)
 		sk_psock_drop(sk, psock);
 }
 
-static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
+static inline void __sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
 {
-	read_lock_bh(&sk->sk_callback_lock);
 	if (psock->saved_data_ready)
 		psock->saved_data_ready(sk);
 	else
 		sk->sk_data_ready(sk);
+}
+
+static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
+{
+	read_lock_bh(&sk->sk_callback_lock);
+	__sk_psock_data_ready(sk, psock);
 	read_unlock_bh(&sk->sk_callback_lock);
 }
 
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index fd20aae30be2..8429daecbbb6 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -552,7 +552,10 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
 	msg->skb = skb;
 
 	sk_psock_queue_msg(psock, msg);
-	sk_psock_data_ready(sk, psock);
+	if (skb_bpf_strparser(skb))
+		__sk_psock_data_ready(sk, psock);
+	else
+		sk_psock_data_ready(sk, psock);
 	return copied;
 }