On 11/8/24 1:03 PM, Martin KaFai Lau wrote:
On 11/6/24 4:44 AM, mrpre wrote:
When the stream_verdict program returns SK_PASS, the received skb is placed
into the socket's own receive queue. On that path sk_callback_lock is taken
recursively, which deadlocks the system. This issue has been present since v6.9.
'''
sk_psock_strp_data_ready
    write_lock_bh(&sk->sk_callback_lock)
    strp_data_ready
      strp_read_sock
        read_sock -> tcp_read_sock
          strp_recv
            cb.rcv_msg -> sk_psock_strp_read
              # stream_verdict now returns SK_PASS with no peer sock assigned
              __SK_PASS = sk_psock_map_verd(SK_PASS, NULL)
              sk_psock_verdict_apply
                sk_psock_skb_ingress_self
                  sk_psock_skb_ingress_enqueue
                    sk_psock_data_ready
                      read_lock_bh(&sk->sk_callback_lock) <= deadlock
'''
This topic has been discussed before, but it has not been fixed.
Previous discussion:
https://lore.kernel.org/all/6684a5864ec86_403d20898@john.notmuch
Is the selftest included in this link still useful to reproduce this bug?
If yes, please include that also.
Fixes: 6648e613226e ("bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue")
Reported-by: Vincent Whitchurch <vincent.whitchurch@xxxxxxxxxxxxx>
Signed-off-by: Jiayuan Chen <mrpre@xxxxxxx>
Please also use your real name as the author (i.e. the email sender). The patch
needs a real author name as well. I manually fixed this in one of your earlier
lock_sock fixes before applying.
Also, the bpf mailing list address has a typo in the original patch email. I
fixed that in this reply.
pw-bot: cr
Signed-off-by: John Fastabend <john.fastabend@xxxxxxxxx>
The patch and the earlier discussion make sense to me.
John and JakubS, please take another look at the next respin.