On Fri, May 29, 2020 at 11:05 AM CEST, dihu wrote:
> On 2020/5/27 5:10, John Fastabend wrote:
>> dihu wrote:
>>> From 865a45747de6b68fd02a0ff128a69a5c8feb73c3 Mon Sep 17 00:00:00 2001
>>> From: dihu <anny.hu@xxxxxxxxxxxxxxxxx>
>>> Date: Mon, 25 May 2020 17:23:16 +0800
>>> Subject: [PATCH] bpf/sockmap: fix kernel panic at __tcp_bpf_recvmsg
>>>
>>> When user application calls read() with MSG_PEEK flag to read data
>>> of bpf sockmap socket, kernel panic happens at
>>> __tcp_bpf_recvmsg+0x12c/0x350. sk_msg is not removed from ingress_msg
>>> queue after read out under MSG_PEEK flag is set. Because it's not
>>> judged whether sk_msg is the last msg of ingress_msg queue, the next
>>> sk_msg may be the head of ingress_msg queue, whose memory address of
>>> sg page is invalid. So it's necessary to add check codes to prevent
>>> this problem.
>>>
>>> [20759.125457] BUG: kernel NULL pointer dereference, address:
>>> 0000000000000008
>>> [20759.132118] CPU: 53 PID: 51378 Comm: envoy Tainted: G E
>>> 5.4.32 #1
>>> [20759.140890] Hardware name: Inspur SA5212M4/YZMB-00370-109, BIOS
>>> 4.1.12 06/18/2017
>>> [20759.149734] RIP: 0010:copy_page_to_iter+0xad/0x300
>>> [20759.270877] __tcp_bpf_recvmsg+0x12c/0x350
>>> [20759.276099] tcp_bpf_recvmsg+0x113/0x370
>>> [20759.281137] inet_recvmsg+0x55/0xc0
>>> [20759.285734] __sys_recvfrom+0xc8/0x130
>>> [20759.290566] ? __audit_syscall_entry+0x103/0x130
>>> [20759.296227] ? syscall_trace_enter+0x1d2/0x2d0
>>> [20759.301700] ? __audit_syscall_exit+0x1e4/0x290
>>> [20759.307235] __x64_sys_recvfrom+0x24/0x30
>>> [20759.312226] do_syscall_64+0x55/0x1b0
>>> [20759.316852] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>
>>> Signed-off-by: dihu <anny.hu@xxxxxxxxxxxxxxxxx>
>>> ---
>>> net/ipv4/tcp_bpf.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
>>> index 5a05327..c0d4624 100644
>>> --- a/net/ipv4/tcp_bpf.c
>>> +++ b/net/ipv4/tcp_bpf.c
>>> @@ -64,6 +64,9 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
>>> } while (i != msg_rx->sg.end);
>>>
>>> if (unlikely(peek)) {
>>> + if (msg_rx == list_last_entry(&psock->ingress_msg,
>>> + struct sk_msg, list))
>>> + break;
>>
>> Thanks. Change looks good but spacing is a bit off. Can we
>> turn those spaces into tabs? Otherwise adding fixes tag and
>> my ack would be great.
>>
>> Fixes: 02c558b2d5d67 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress")
>> Acked-by: John Fastabend <john.fastabend@xxxxxxxxx>
>
>
> From 127a334fa5e5d029353ceb1a0414886c527f4be5 Mon Sep 17 00:00:00 2001
> From: dihu <anny.hu@xxxxxxxxxxxxxxxxx>
> Date: Fri, 29 May 2020 16:38:50 +0800
> Subject: [PATCH] bpf/sockmap: fix kernel panic at __tcp_bpf_recvmsg
>
> When user application calls read() with MSG_PEEK flag to read data
> of bpf sockmap socket, kernel panic happens at
> __tcp_bpf_recvmsg+0x12c/0x350. sk_msg is not removed from ingress_msg
> queue after read out under MSG_PEEK flag is set. Because it's not
> judged whether sk_msg is the last msg of ingress_msg queue, the next
> sk_msg may be the head of ingress_msg queue, whose memory address of
> sg page is invalid. So it's necessary to add check codes to prevent
> this problem.
>
> [20759.125457] BUG: kernel NULL pointer dereference, address:
> 0000000000000008
> [20759.132118] CPU: 53 PID: 51378 Comm: envoy Tainted: G E
> 5.4.32 #1
> [20759.140890] Hardware name: Inspur SA5212M4/YZMB-00370-109, BIOS
> 4.1.12 06/18/2017
> [20759.149734] RIP: 0010:copy_page_to_iter+0xad/0x300
> [20759.270877] __tcp_bpf_recvmsg+0x12c/0x350
> [20759.276099] tcp_bpf_recvmsg+0x113/0x370
> [20759.281137] inet_recvmsg+0x55/0xc0
> [20759.285734] __sys_recvfrom+0xc8/0x130
> [20759.290566] ? __audit_syscall_entry+0x103/0x130
> [20759.296227] ? syscall_trace_enter+0x1d2/0x2d0
> [20759.301700] ? __audit_syscall_exit+0x1e4/0x290
> [20759.307235] __x64_sys_recvfrom+0x24/0x30
> [20759.312226] do_syscall_64+0x55/0x1b0
> [20759.316852] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Signed-off-by: dihu <anny.hu@xxxxxxxxxxxxxxxxx>
> ---
> net/ipv4/tcp_bpf.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index 5a05327..b82e4c3 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -64,6 +64,9 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
> } while (i != msg_rx->sg.end);
>
> if (unlikely(peek)) {
> + if (msg_rx == list_last_entry(&psock->ingress_msg,
> + struct sk_msg, list))
> + break;
> msg_rx = list_next_entry(msg_rx, list);
> continue;
> }

Looks like the patch is garbled. I suspect this is due to copy-pasting
it into an e-mail client. The context line got wrapped and there are
non-breaking spaces instead of tabs in the body.

The crash fix is important, so could you resend it with `git send-email`?

  git send-email --to bpf@xxxxxxxxxxxxxxx --cc netdev@xxxxxxxxxxxxxxx file.patch

You might find the documentation below helpful:

  https://www.kernel.org/doc/html/latest/process/email-clients.html
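
For anyone following the thread, the failure mode in the commit message can be sketched in userspace C. This is only an illustration under assumptions, not the kernel code: `struct msg`, `struct queue`, `queue_init`, `queue_add_tail` and `peek_total` are hypothetical names, and the list handling is a simplified stand-in for the kernel's circular `struct list_head`. The point it shows is that on a circular list the entry "after" the tail is the list head itself, which is not embedded in a message, so a peek loop that never unlinks entries has to stop at the tail.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical message type; the kernel's struct sk_msg is far richer. */
struct msg {
	int len;
	struct list_head list;
};

/* Hypothetical queue; plays the role of psock->ingress_msg. */
struct queue { struct list_head ingress; };

static void queue_init(struct queue *q)
{
	q->ingress.next = q->ingress.prev = &q->ingress;
}

static void queue_add_tail(struct queue *q, struct msg *m)
{
	assert(q != NULL && m != NULL);
	m->list.prev = q->ingress.prev;
	m->list.next = &q->ingress;
	q->ingress.prev->next = &m->list;
	q->ingress.prev = &m->list;
}

/* Peek at every queued message without unlinking any of them. */
static int peek_total(struct queue *q)
{
	struct msg *m, *last;
	int copied = 0;

	if (q->ingress.next == &q->ingress)
		return 0;	/* empty queue */

	m = container_of(q->ingress.next, struct msg, list);
	last = container_of(q->ingress.prev, struct msg, list);
	for (;;) {
		copied += m->len;
		/*
		 * Same idea as the check in the patch: stop at the tail.
		 * Without it, the next step would apply container_of() to
		 * the list head, which is not inside a struct msg.
		 */
		if (m == last)
			break;
		m = container_of(m->list.next, struct msg, list);
	}
	return copied;
}
```

Because nothing is unlinked, calling `peek_total()` twice on the same queue returns the same total, which matches the MSG_PEEK behaviour described in the commit message.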