On Tue, Jun 8, 2021 at 4:26 AM Shuyi Cheng <chengshuyi@xxxxxxxxxxxxxxxxx> wrote:
>
> On 6/7/21 7:34 PM, Eric Dumazet wrote:
> > On Mon, Jun 7, 2021 at 11:35 AM Shuyi Cheng
> > <chengshuyi@xxxxxxxxxxxxxxxxx> wrote:
> >>
> >> When the loopback (lo) interface is used for communication and the tcp
> >> server reduces the size of its receive buffer, the tcp client can hit
> >> a 200ms delay. For example:
> >>
> >> Suppose the MTU of the lo interface is 65536 and the tcp server sets
> >> its receive buffer size to 42KB. According to the tcp_bound_to_half_wnd
> >> function, the MSS of both server and client is then 21KB. If the tcp
> >> server later shrinks the buffer size of the connection to 16KB, its
> >> MSS drops to 8KB while the client's MSS is still 21KB. The client then
> >> fails to send data: tcp_write_xmit fails because tcp_snd_wnd_test
> >> fails, the connection enters the zero-window probe phase, and a 200ms
> >> delay results.
> >>
> >> Therefore, modify two places. One is the tcp_current_mss function:
> >> when the send window is smaller than the current mss, the mss needs
> >> to be updated. The other is the tcp_bound_to_half_wnd function: when
> >> the send window is smaller than the packet size, the mss should be
> >> derived from the current send window, not from max_window.
> >>
> >> Signed-off-by: Shuyi Cheng <chengshuyi@xxxxxxxxxxxxxxxxx>
> >> ---
> >>  include/net/tcp.h     | 11 ++++++++---
> >>  net/ipv4/tcp_output.c |  3 ++-
> >>  2 files changed, 10 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/include/net/tcp.h b/include/net/tcp.h
> >> index e668f1b..fcdef16 100644
> >> --- a/include/net/tcp.h
> >> +++ b/include/net/tcp.h
> >> @@ -641,6 +641,11 @@ static inline void tcp_clear_xmit_timers(struct sock *sk)
> >>  static inline int tcp_bound_to_half_wnd(struct tcp_sock *tp, int pktsize)
> >>  {
> >>         int cutoff;
> >> +       int window;
> >> +
> >> +       window = tp->max_window;
> >> +       if (tp->snd_wnd && tp->snd_wnd < pktsize)
> >> +               window = tp->snd_wnd;
> >>
> >>         /* When peer uses tiny windows, there is no use in packetizing
> >>          * to sub-MSS pieces for the sake of SWS or making sure there
> >> @@ -649,10 +654,10 @@ static inline int tcp_bound_to_half_wnd(struct tcp_sock *tp, int pktsize)
> >>          * On the other hand, for extremely large MSS devices, handling
> >>          * smaller than MSS windows in this way does make sense.
> >>          */
> >> -       if (tp->max_window > TCP_MSS_DEFAULT)
> >> -               cutoff = (tp->max_window >> 1);
> >> +       if (window > TCP_MSS_DEFAULT)
> >> +               cutoff = (window >> 1);
> >>         else
> >> -               cutoff = tp->max_window;
> >> +               cutoff = window;
> >>
> >>         if (cutoff && pktsize > cutoff)
> >>                 return max_t(int, cutoff, 68U - tp->tcp_header_len);
> >> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> >> index bde781f..88dcdf2 100644
> >> --- a/net/ipv4/tcp_output.c
> >> +++ b/net/ipv4/tcp_output.c
> >> @@ -1833,7 +1833,8 @@ unsigned int tcp_current_mss(struct sock *sk)
> >>
> >>         if (dst) {
> >>                 u32 mtu = dst_mtu(dst);
> >> -               if (mtu != inet_csk(sk)->icsk_pmtu_cookie)
> >> +               if (mtu != inet_csk(sk)->icsk_pmtu_cookie ||
> >> +                   (tp->snd_wnd && tp->snd_wnd < mss_now))
> >>                         mss_now = tcp_sync_mss(sk, mtu);
> >
> > I do not think we want to add code in the fast path only to cope with
> > pathological user choices.
>
> Thank you very much for your reply!
>
> I very much agree with your point of view. However, the kernel currently
> accepts an unreasonable RCVBUF from the developer, that is, an RCVBUF
> smaller than the MSS. Perhaps we could reject an RCVBUF smaller than the
> MSS in the sock_setsockopt function. What do you think?

I think it is not trivial to make the limit MSS-dependent, because
SO_RCVBUF can be set before a connection is attempted (so the MSS is not
yet known).

I would rather simply add a sysctl to set a floor for TCP SO_RCVBUF, and
default it to a reasonable value (128KB?).

The sysctl might be needed for some crazy environments (one million TCP
flows on a host with only 8GB of memory, for example...).
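A minimal sketch of that idea (illustration only, not actual kernel code;
"sysctl_tcp_rcvbuf_min" and "tcp_set_rcvbuf" are hypothetical names, and a
real patch would likely make the knob per-netns):

	/* Hypothetical floor, defaulting to the 128KB suggested above. */
	int sysctl_tcp_rcvbuf_min __read_mostly = 128 * 1024;

	static void tcp_set_rcvbuf(struct sock *sk, int val)
	{
		val = min_t(u32, val, READ_ONCE(sysctl_rmem_max));
		sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
		/* SO_RCVBUF doubles the requested value to leave room
		 * for skb overhead; keep that, then apply the floor.
		 */
		WRITE_ONCE(sk->sk_rcvbuf,
			   max_t(int, val * 2,
				 READ_ONCE(sysctl_tcp_rcvbuf_min)));
	}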
>
> Thanks.
>
> > Maybe it is time to accept the fact that setting the receive buffer
> > size to 42KB is dumb nowadays.
> >
> > TCP needs to have at least two buffers in flight to avoid 'strange'
> > stalls.
> >
> > If the loopback MTU is 64KB (and TSO/GRO sizes can also reach 64KB),
> > maybe we need to ensure a minimal rcvbuf.
> >
> > If applications set a 42KB window size, they get what they asked for:
> > damn slow TCP communications.
> >
> > We do not want to make them happy while increasing cpu costs for the
> > 99% of other use cases which are not trying to make TCP miserable.
> >
> > I would rather add a sysctl or something to ensure rcvbuf has a
> > minimal sane value, instead of risking subtle TCP regressions.
> >
> > In 2021, we should not limit ourselves to memory constraints that were
> > common 40 years ago when TCP was invented.
> >
> > References:
> > commit 9eb5bf838d06aa6ddebe4aca6b5cedcf2eb53b86 ("net: sock: fix
> > TCP_SKB_MIN_TRUESIZE")
> > commit eea86af6b1e18d6fa8dc959e3ddc0100f27aff9f ("net: sock: adapt
> > SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF")
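For reference, the stall described in the patch changelog can be
demonstrated from user space along these lines (a sketch under the
assumptions of the example above: loopback, a 42KB rcvbuf shrunk to 16KB,
~21KB writes; error handling omitted, and exact timings depend on the
kernel):

	/* Loopback reproducer sketch: the server shrinks SO_RCVBUF
	 * mid-connection; on an affected kernel, some client writes then
	 * stall for ~200ms (zero-window probe) instead of microseconds.
	 */
	#include <arpa/inet.h>
	#include <netinet/in.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/socket.h>
	#include <sys/time.h>
	#include <unistd.h>

	#define MSG_SZ (21 * 1024)	/* ~ the 21KB MSS from the example */

	static double now_ms(void)
	{
		struct timeval tv;

		gettimeofday(&tv, NULL);
		return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
	}

	int main(void)
	{
		struct sockaddr_in addr = { .sin_family = AF_INET,
					    .sin_port = htons(5555) };
		addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

		if (fork() == 0) {	/* server */
			int ls = socket(AF_INET, SOCK_STREAM, 0);
			int on = 1, buf = 42 * 1024, cs;
			static char tmp[MSG_SZ];

			setsockopt(ls, SOL_SOCKET, SO_REUSEADDR, &on,
				   sizeof(on));
			/* 42KB receive buffer before listen(), as in the
			 * example; accepted sockets inherit it.
			 */
			setsockopt(ls, SOL_SOCKET, SO_RCVBUF, &buf,
				   sizeof(buf));
			bind(ls, (struct sockaddr *)&addr, sizeof(addr));
			listen(ls, 1);
			cs = accept(ls, NULL, NULL);
			read(cs, tmp, sizeof(tmp));
			/* now shrink the connection's rcvbuf to 16KB */
			buf = 16 * 1024;
			setsockopt(cs, SOL_SOCKET, SO_RCVBUF, &buf,
				   sizeof(buf));
			while (read(cs, tmp, sizeof(tmp)) > 0)
				;
			exit(0);
		}

		sleep(1);		/* client */
		int s = socket(AF_INET, SOCK_STREAM, 0);
		static char msg[MSG_SZ];

		connect(s, (struct sockaddr *)&addr, sizeof(addr));
		/* enough writes to fill the send buffer and expose stalls */
		for (int i = 0; i < 50; i++) {
			double t0 = now_ms();

			write(s, msg, sizeof(msg));
			printf("write %d took %.1f ms\n", i, now_ms() - t0);
		}
		return 0;
	}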