Re: [PATCH stable, 6.1] net: sockmap, fix missing MSG_MORE causing TCP disruptions

Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> · Sat, 4 May 2024 09:30:52 +0200

On Fri, May 03, 2024 at 09:48:05AM -0700, John Fastabend wrote:
> [ Upstream commit ebf2e8860eea66e2c4764316b80c6a5ee5f336ee]
> [ Upstream commit f8dd95b29d7ef08c19ec9720564acf72243ddcf6]

Why are you mushing 2 patches together?  Why can't we just take the two
as-is instead?  That makes tracking everything much simpler and
possible.

> In the first patch,
> 
> ebf2e8860eea ("tcp_bpf: Inline do_tcp_sendpages as it's now a wrapper around tcp_sendmsg")
> 
> This block of code is added to tcp_bpf_push(). The
> tcp_bpf_push is the code used by BPF to submit messages into the TCP
> stack.
> 
>  if (flags & MSG_SENDPAGE_NOTLAST)
>      msghdr.msg_flags | MSG_MORE;
> 
> In the second patch,
> 
> f8dd95b29d7e ("tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage")
> 
> this logic was further changed to,
> 
>   if (flags & MSG_SENDPAGE_NOTLAST)
>      msghdr.msg_flags |= MSG_MORE
> 
> This was done as part of an improvement to use the sendmsg() callbacks
> and remove the sendpage usage inside the various sub systems.
> 
> However, these two patches together fixed a bug. The issue is without
> MSG_MORE set we will break a msg up into many smaller sends. In some
> case a lot because the operation loops over the scatter gather list.
> Without the MSG_MORE set (the current 6.1 case) we see stalls in data
> send/recv and sometimes applications failing to receive data. This
> generally is the result of an application that gives up after calling
> recv() or similar too many times. We introduce this because of how
> we incorrectly change the TCP send pattern.
> 
> Now that we have both 6.5 and 6.1 stable kernels deployed we've
> observed a series of issues related to this in real deployments. In 6.5
> kernels all the HTTP and other compliance tests pass and we are not
> observing any other issues. On 6.1 various compliance tests fail
> (nginx for example), but more importantly in these clusters without
> the flag set we observe stalled applications and increased retries in
> other applications. Openssl users where we have annotations to monitor
> retries and failures observed a significant increase in retries for
> example.
> 
> For the backport we isolated the fix to the two lines in the above
> patches that fixed the code. With this patch we deployed the workloads
> again and error rates and stalls went away and 6.1 stable kernels
> perform similar to 6.5 stable kernels. Similarly the compliance tests
> also passed.

Can we just take the two original patches instead?

thanks,

greg k-h