On Fri, Jun 07, 2024 at 02:09:59PM +0200, Vincent Whitchurch wrote:
> On Thu, Jun 6, 2024 at 2:47 PM Jason Xing <kerneljasonxing@xxxxxxxxx> wrote:
> > On Thu, Jun 6, 2024 at 6:00 PM Vincent Whitchurch
> > <vincent.whitchurch@xxxxxxxxxxxxx> wrote:
> > > With a socket in the sockmap, if there's a parser callback installed
> > > and the verdict callback returns SK_PASS, the kernel deadlocks
> > > immediately after the verdict callback is run. This started at commit
> > > 6648e613226e18897231ab5e42ffc29e63fa3365 ("bpf, skmsg: Fix NULL
> > > pointer dereference in sk_psock_skb_ingress_enqueue").
> > >
> > > It can be reproduced by running ./test_sockmap -t ping
> > > --txmsg_pass_skb. The --txmsg_pass_skb command to test_sockmap is
> > > available in this series:
> > > https://lore.kernel.org/netdev/20240606-sockmap-splice-v1-0-4820a2ab14b5@xxxxxxxxxxxxx/.
> >
> > I don't have time right now to look into this issue carefully until
> > this weekend. BTW, did you mean the patch [2/5] in the link that can
> > solve the problem?
>
> No. That patch set addresses a different problem which occurs even if
> only a verdict callback is used. But patch 4/5 in that patch set adds
> the --txmsg_pass_skb option to the test_sockmap test program, and that
> option can be used to reproduce this deadlock too.

I think we can remove that write_lock_bh(&sk->sk_callback_lock). Can you
test the following patch?

------------>

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index fd20aae30be2..da64ded97f3a 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1116,9 +1116,7 @@ static void sk_psock_strp_data_ready(struct sock *sk)
 		if (tls_sw_has_ctx_rx(sk)) {
 			psock->saved_data_ready(sk);
 		} else {
-			write_lock_bh(&sk->sk_callback_lock);
 			strp_data_ready(&psock->strp);
-			write_unlock_bh(&sk->sk_callback_lock);
 		}
 	}
 	rcu_read_unlock();