On 1/8/20 7:01 PM, John Fastabend wrote:
Daniel Borkmann wrote:
On Wed, Jan 08, 2020 at 12:57:08PM +0800, Lingpeng Chen wrote:
Right now in tcp_bpf_recvmsg, sock read data first from sk_receive_queue
if not empty than psock->ingress_msg otherwise. If a FIN packet arrives
and there's also some data in psock->ingress_msg, the data in
psock->ingress_msg will be purged. It is always happen when request to a
HTTP1.0 server like python SimpleHTTPServer since the server send FIN
packet after data is sent out.
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Reported-by: Arika Chen <eaglesora@xxxxxxxxx>
Suggested-by: Arika Chen <eaglesora@xxxxxxxxx>
Signed-off-by: Lingpeng Chen <forrest0579@xxxxxxxxx>
Signed-off-by: John Fastabend <john.fastabend@xxxxxxxxx>
---
net/ipv4/tcp_bpf.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index e38705165ac9..f7e902868fce 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -123,12 +123,13 @@ int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
if (unlikely(flags & MSG_ERRQUEUE))
return inet_recv_error(sk, msg, len, addr_len);
Shouldn't we also move the error queue handling below the psock test as
well and let tcp_recvmsg() natively do it in case of !psock?
You mean the MSG_ERRQUEUE flag handling? If the user sets MSG_ERRQUEUE
they expect to receive any queued errors it would be wrong to return
psock data in this case if psock is attached and has data on queue and
user passes MSG_ERRQUEUE flag.
MSG_ERRQUEUE (since Linux 2.2)
This flag specifies that queued errors should be received from the socket
error queue. The error is passed in an ancillary message with a type
dependent on the protocol (for IPv4 IP_RECVERR). The user should supply
a buffer of sufficient size. See cmsg(3) and ip(7) for more information.
The payload of the original packet that caused the error is passed as
normal data via msg_iovec. The original destination address of the
datagram that caused the error is supplied via msg_name.
I believe it needs to be where it is.
I meant that it should have looked as follows (aka moving both below the
psock test) ...
psock = sk_psock_get(sk);
if (unlikely(!psock))
return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
if (unlikely(flags & MSG_ERRQUEUE))
return inet_recv_error(sk, msg, len, addr_len);
if (!skb_queue_empty(&sk->sk_receive_queue) && [...]
... since when detached it's handled already via tcp_recvmsg() internals.
- if (!skb_queue_empty(&sk->sk_receive_queue))
- return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
psock = sk_psock_get(sk);
if (unlikely(!psock))
return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
+ if (!skb_queue_empty(&sk->sk_receive_queue) &&
+ sk_psock_queue_empty(psock))
+ return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
lock_sock(sk);
msg_bytes_ready:
copied = __tcp_bpf_recvmsg(sk, psock, msg, len, flags);
@@ -139,7 +140,7 @@ int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
timeo = sock_rcvtimeo(sk, nonblock);
data = tcp_bpf_wait_data(sk, psock, flags, timeo, &err);
if (data) {
- if (skb_queue_empty(&sk->sk_receive_queue))
+ if (!sk_psock_queue_empty(psock))
goto msg_bytes_ready;
release_sock(sk);
sk_psock_put(sk, psock);
--
2.17.1