> -----Original Message-----
> From: Kuniyuki Iwashima <kuniyu@xxxxxxxxxx>
> Sent: Tuesday, October 10, 2023 4:12 PM
> To: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> Cc: corbet@xxxxxxx; davem@xxxxxxxxxxxxx; dsahern@xxxxxxxxxx;
> edumazet@xxxxxxxxxx; kuba@xxxxxxxxxx; kuniyu@xxxxxxxxxx;
> KY Srinivasan <kys@xxxxxxxxxxxxx>; linux-doc@xxxxxxxxxxxxxxx;
> linux-hyperv@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> mfreemon@xxxxxxxxxxxxxx; morleyd@xxxxxxxxxx; mubashirq@xxxxxxxxxx;
> ncardwell@xxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; pabeni@xxxxxxxxxx;
> weiwan@xxxxxxxxxx; ycheng@xxxxxxxxxx
> Subject: Re: [PATCH net-next,v2] tcp: Set pingpong threshold via sysctl
>
> From: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> Date: Tue, 10 Oct 2023 12:23:30 -0700
> > TCP pingpong threshold is 1 by default. But some applications, like SQL DB
> > may prefer a higher pingpong threshold to activate delayed acks in quick
> > ack mode for better performance.
> >
> > The pingpong threshold and related code were changed to 3 in the year
> > 2019 in:
> >   commit 4a41f453bedf ("tcp: change pingpong threshold to 3")
> > And reverted to 1 in the year 2022 in:
> >   commit 4d8f24eeedc5 ("Revert "tcp: change pingpong threshold to 3"")
> >
> > There is no single value that fits all applications.
> > Add net.ipv4.tcp_pingpong_thresh sysctl tunable, so it can be tuned for
> > optimal performance based on the application needs.
> >
> > Signed-off-by: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > ---
> > v2: Make it per-namespace setting, and other updates suggested by Neal Cardwell,
> > and Kuniyuki Iwashima.
> >
> > ---
> >  Documentation/networking/ip-sysctl.rst |  8 ++++++++
> >  include/net/inet_connection_sock.h     | 16 ++++++++++++----
> >  include/net/netns/ipv4.h               |  1 +
> >  net/ipv4/sysctl_net_ipv4.c             |  8 ++++++++
> >  net/ipv4/tcp_ipv4.c                    |  2 ++
> >  net/ipv4/tcp_output.c                  |  4 ++--
> >  6 files changed, 33 insertions(+), 6 deletions(-)
> >
> > diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> > index 5bfa1837968c..c0308b65dc2f 100644
> > --- a/Documentation/networking/ip-sysctl.rst
> > +++ b/Documentation/networking/ip-sysctl.rst
> > @@ -1183,6 +1183,14 @@ tcp_plb_cong_thresh - INTEGER
> >
> >  	Default: 128
> >
> > +tcp_pingpong_thresh - INTEGER
> > +	TCP pingpong threshold is 1 by default, but some application may need a
> > +	higher threshold for optimal performance.
> > +
> > +	Possible Values: 1 - 255
> > +
> > +	Default: 1
> > +
> >  UDP variables
> >  =============
> >
> > diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> > index 5d2fcc137b88..0182f27bce40 100644
> > --- a/include/net/inet_connection_sock.h
> > +++ b/include/net/inet_connection_sock.h
> > @@ -325,11 +325,10 @@ void inet_csk_update_fastreuse(struct inet_bind_bucket *tb,
> >
> >  struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu);
> >
> > -#define TCP_PINGPONG_THRESH	1
> > -
> >  static inline void inet_csk_enter_pingpong_mode(struct sock *sk)
> >  {
> > -	inet_csk(sk)->icsk_ack.pingpong = TCP_PINGPONG_THRESH;
> > +	inet_csk(sk)->icsk_ack.pingpong =
> > +		READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pingpong_thresh);
> >  }
> >
> >  static inline void inet_csk_exit_pingpong_mode(struct sock *sk)
> > @@ -339,7 +338,16 @@ static inline void inet_csk_exit_pingpong_mode(struct sock *sk)
> >
> >  static inline bool inet_csk_in_pingpong_mode(struct sock *sk)
> >  {
> > -	return inet_csk(sk)->icsk_ack.pingpong >= TCP_PINGPONG_THRESH;
> > +	return inet_csk(sk)->icsk_ack.pingpong >=
> > +	       READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pingpong_thresh);
> > +}
> > +
> > +static inline void inet_csk_inc_pingpong_cnt(struct sock *sk)
> > +{
> > +	struct inet_connection_sock *icsk = inet_csk(sk);
> > +
> > +	if (icsk->icsk_ack.pingpong < U8_MAX)
> > +		icsk->icsk_ack.pingpong++;
> >  }
> >
> >  static inline bool inet_csk_has_ulp(const struct sock *sk)
> > diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> > index d96d05b08819..9f1b3eb9473e 100644
> > --- a/include/net/netns/ipv4.h
> > +++ b/include/net/netns/ipv4.h
> > @@ -191,6 +191,7 @@ struct netns_ipv4 {
> >  	u8 sysctl_tcp_plb_rehash_rounds;
> >  	u8 sysctl_tcp_plb_suspend_rto_sec;
> >  	int sysctl_tcp_plb_cong_thresh;
> > +	u8 sysctl_tcp_pingpong_thresh;
> >
> >  	int sysctl_udp_wmem_min;
> >  	int sysctl_udp_rmem_min;
>
> Maybe a hole after sysctl_tcp_backlog_ack_defer is a good place
> to put a new TCP knob.
>
> After sysctl_tcp_plb_cong_thresh, we can fill 1-byte hole but the
> cacheline seems cold for TCP.
>
> $ pahole -C netns_ipv4 vmlinux
> struct netns_ipv4 {
> ...
> 	u8   sysctl_tcp_backlog_ack_defer;      /*   402     1 */
>
> 	/* XXX 1 byte hole, try to pack */
>
> 	int  sysctl_tcp_reordering;             /*   404     4 */
> ...
> 	int  sysctl_tcp_plb_cong_thresh;        /*   572     4 */
> 	/* --- cacheline 9 boundary (576 bytes) --- */
> 	int  sysctl_udp_wmem_min;               /*   576     4 */
> 	int  sysctl_udp_rmem_min;               /*   580     4 */
> 	u8   sysctl_fib_notify_on_flag_change;  /*   584     1 */
> 	u8   sysctl_tcp_syn_linear_timeouts;    /*   585     1 */
> 	u8   sysctl_igmp_llm_reports;           /*   586     1 */
>
> 	/* XXX 1 byte hole, try to pack */
> ...
>

Will do.

Thanks,
- Haiyang
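
For readers following the layout point in the review above, here is a minimal userspace sketch of why a 1-byte hole is a cheap place for a new u8 knob: filling the hole changes neither the offset of the following int nor the size of the structure. The two structs are a hypothetical miniature, not the real struct netns_ipv4; the knob_a/knob_b neighbours and the check_layout() helper are made up for illustration, and the sketch assumes a typical ABI where int is 4 bytes with 4-byte alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical miniature of the fields around the hole reported by pahole.
 * Names and neighbours are illustrative only; this is NOT struct netns_ipv4.
 * Assumes int is 4 bytes and 4-byte aligned.
 */
struct layout_with_hole {
	uint8_t knob_a;			/* offset 0 */
	uint8_t knob_b;			/* offset 1 */
	uint8_t backlog_ack_defer;	/* offset 2 */
	/* 1-byte hole: the int below has to start on a 4-byte boundary */
	int reordering;			/* offset 4 */
};

struct layout_hole_filled {
	uint8_t knob_a;			/* offset 0 */
	uint8_t knob_b;			/* offset 1 */
	uint8_t backlog_ack_defer;	/* offset 2 */
	uint8_t pingpong_thresh;	/* offset 3: new u8 placed in the hole */
	int reordering;			/* offset 4, unchanged */
};

/* Verify that filling the hole is free in terms of size and offsets. */
void check_layout(void)
{
	/* The new knob lands exactly where the padding byte used to be. */
	assert(offsetof(struct layout_hole_filled, pingpong_thresh) == 3);

	/* The following field does not move. */
	assert(offsetof(struct layout_with_hole, reordering) ==
	       offsetof(struct layout_hole_filled, reordering));

	/* The structure does not grow. */
	assert(sizeof(struct layout_with_hole) ==
	       sizeof(struct layout_hole_filled));
}
```

The sketch only covers size and offsets. Whether the chosen hole sits on a cacheline that the TCP fast path already touches is a separate question that has to be checked against the real pahole output, as done in the review.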