RE: [PATCH net-next,v2] tcp: Set pingpong threshold via sysctl

> -----Original Message-----
> From: Kuniyuki Iwashima <kuniyu@xxxxxxxxxx>
> Sent: Tuesday, October 10, 2023 4:12 PM
> To: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> Cc: corbet@xxxxxxx; davem@xxxxxxxxxxxxx; dsahern@xxxxxxxxxx;
> edumazet@xxxxxxxxxx; kuba@xxxxxxxxxx; kuniyu@xxxxxxxxxx; KY
> Srinivasan <kys@xxxxxxxxxxxxx>; linux-doc@xxxxxxxxxxxxxxx; linux-
> hyperv@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> mfreemon@xxxxxxxxxxxxxx; morleyd@xxxxxxxxxx; mubashirq@xxxxxxxxxx;
> ncardwell@xxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; pabeni@xxxxxxxxxx;
> weiwan@xxxxxxxxxx; ycheng@xxxxxxxxxx
> Subject: Re: [PATCH net-next,v2] tcp: Set pingpong threshold via sysctl
> 
> From: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> Date: Tue, 10 Oct 2023 12:23:30 -0700
> > TCP pingpong threshold is 1 by default. But some applications, like SQL DB,
> > may prefer a higher pingpong threshold to activate delayed acks in quick
> > ack mode for better performance.
> >
> > The pingpong threshold and related code were changed to 3 in the year
> > 2019 in:
> >   commit 4a41f453bedf ("tcp: change pingpong threshold to 3")
> > And reverted to 1 in the year 2022 in:
> >   commit 4d8f24eeedc5 ("Revert "tcp: change pingpong threshold to 3"")
> >
> > There is no single value that fits all applications.
> > Add net.ipv4.tcp_pingpong_thresh sysctl tunable, so it can be tuned for
> > optimal performance based on the application needs.
> >
> > Signed-off-by: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > ---
> > v2: Make it a per-namespace setting, and other updates suggested by
> > Neal Cardwell and Kuniyuki Iwashima.
> >
> > ---
> >  Documentation/networking/ip-sysctl.rst |  8 ++++++++
> >  include/net/inet_connection_sock.h     | 16 ++++++++++++----
> >  include/net/netns/ipv4.h               |  1 +
> >  net/ipv4/sysctl_net_ipv4.c             |  8 ++++++++
> >  net/ipv4/tcp_ipv4.c                    |  2 ++
> >  net/ipv4/tcp_output.c                  |  4 ++--
> >  6 files changed, 33 insertions(+), 6 deletions(-)
> >
> > diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> > index 5bfa1837968c..c0308b65dc2f 100644
> > --- a/Documentation/networking/ip-sysctl.rst
> > +++ b/Documentation/networking/ip-sysctl.rst
> > @@ -1183,6 +1183,14 @@ tcp_plb_cong_thresh - INTEGER
> >
> >       Default: 128
> >
> > +tcp_pingpong_thresh - INTEGER
> > +     TCP pingpong threshold is 1 by default, but some applications may need a
> > +     higher threshold for optimal performance.
> > +
> > +     Possible Values: 1 - 255
> > +
> > +     Default: 1
> > +
> >  UDP variables
> >  =============
> >
> > diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> > index 5d2fcc137b88..0182f27bce40 100644
> > --- a/include/net/inet_connection_sock.h
> > +++ b/include/net/inet_connection_sock.h
> > @@ -325,11 +325,10 @@ void inet_csk_update_fastreuse(struct inet_bind_bucket *tb,
> >
> >  struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu);
> >
> > -#define TCP_PINGPONG_THRESH  1
> > -
> >  static inline void inet_csk_enter_pingpong_mode(struct sock *sk)
> >  {
> > -     inet_csk(sk)->icsk_ack.pingpong = TCP_PINGPONG_THRESH;
> > +     inet_csk(sk)->icsk_ack.pingpong =
> > +             READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pingpong_thresh);
> >  }
> >
> >  static inline void inet_csk_exit_pingpong_mode(struct sock *sk)
> > @@ -339,7 +338,16 @@ static inline void inet_csk_exit_pingpong_mode(struct sock *sk)
> >
> >  static inline bool inet_csk_in_pingpong_mode(struct sock *sk)
> >  {
> > -     return inet_csk(sk)->icsk_ack.pingpong >= TCP_PINGPONG_THRESH;
> > +     return inet_csk(sk)->icsk_ack.pingpong >=
> > +            READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pingpong_thresh);
> > +}
> > +
> > +static inline void inet_csk_inc_pingpong_cnt(struct sock *sk)
> > +{
> > +     struct inet_connection_sock *icsk = inet_csk(sk);
> > +
> > +     if (icsk->icsk_ack.pingpong < U8_MAX)
> > +             icsk->icsk_ack.pingpong++;
> >  }
> >
> >  static inline bool inet_csk_has_ulp(const struct sock *sk)
> > diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> > index d96d05b08819..9f1b3eb9473e 100644
> > --- a/include/net/netns/ipv4.h
> > +++ b/include/net/netns/ipv4.h
> > @@ -191,6 +191,7 @@ struct netns_ipv4 {
> >       u8 sysctl_tcp_plb_rehash_rounds;
> >       u8 sysctl_tcp_plb_suspend_rto_sec;
> >       int sysctl_tcp_plb_cong_thresh;
> > +     u8 sysctl_tcp_pingpong_thresh;
> >
> >       int sysctl_udp_wmem_min;
> >       int sysctl_udp_rmem_min;
> 
> Maybe a hole after sysctl_tcp_backlog_ack_defer is a good place
> to put a new TCP knob.
> 
> After sysctl_tcp_plb_cong_thresh, we can fill a 1-byte hole, but the
> cacheline seems cold for TCP.
> 
> $ pahole -C netns_ipv4 vmlinux
> struct netns_ipv4 {
> ...
>         u8                         sysctl_tcp_backlog_ack_defer; /*   402     1 */
> 
>         /* XXX 1 byte hole, try to pack */
> 
>         int                        sysctl_tcp_reordering; /*   404     4 */
> ...
>         int                        sysctl_tcp_plb_cong_thresh; /*   572     4 */
>         /* --- cacheline 9 boundary (576 bytes) --- */
>         int                        sysctl_udp_wmem_min;  /*   576     4 */
>         int                        sysctl_udp_rmem_min;  /*   580     4 */
>         u8                         sysctl_fib_notify_on_flag_change; /*   584     1 */
>         u8                         sysctl_tcp_syn_linear_timeouts; /*   585     1 */
>         u8                         sysctl_igmp_llm_reports; /*   586     1 */
> 
>         /* XXX 1 byte hole, try to pack */
> ...
> 

Will do.
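
The 1-byte hole shown in the pahole output above can be reproduced in user
space. This is only a minimal sketch: the two structs are hypothetical
stand-ins (filler_a and filler_b are invented neighbours), not the kernel's
struct netns_ipv4, and the offsets assume a common ABI where int is 4-byte
aligned. It shows that a u8 placed directly after
sysctl_tcp_backlog_ack_defer lands in existing padding, so the struct does
not grow and the following int does not move.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; NOT the kernel's netns_ipv4. */
struct demo_before {
	uint8_t filler_a;                     /* invented neighbour */
	uint8_t filler_b;                     /* invented neighbour */
	uint8_t sysctl_tcp_backlog_ack_defer;
	/* the compiler typically pads 1 byte here to align the int */
	int     sysctl_tcp_reordering;
};

struct demo_after {
	uint8_t filler_a;
	uint8_t filler_b;
	uint8_t sysctl_tcp_backlog_ack_defer;
	uint8_t sysctl_tcp_pingpong_thresh;   /* new knob placed in the hole */
	int     sysctl_tcp_reordering;
};

/* Print where the int lands and how large each struct is. */
void demo_print_layout(void)
{
	printf("before: reordering at %zu, size %zu\n",
	       offsetof(struct demo_before, sysctl_tcp_reordering),
	       sizeof(struct demo_before));
	printf("after:  reordering at %zu, size %zu\n",
	       offsetof(struct demo_after, sysctl_tcp_reordering),
	       sizeof(struct demo_after));
}
```

Calling demo_print_layout() from a small main() prints both layouts for
comparison. On the real struct, pahole remains the tool to check the result.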

Thanks,
- Haiyang
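
For readers following the quoted diff, the counter logic can be modelled in
user space. This is a hedged sketch, not kernel code: struct pingpong_model
and the model_* functions are hypothetical names, and the thresh field
stands in for the per-namespace sysctl_tcp_pingpong_thresh value. It mirrors
the three helpers in the patch: the counter saturates at the u8 maximum, and
pingpong mode is reported once the counter reaches the threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of the helpers in the quoted patch. */
struct pingpong_model {
	uint8_t pingpong; /* models icsk_ack.pingpong */
	uint8_t thresh;   /* stands in for sysctl_tcp_pingpong_thresh */
};

/* Models inet_csk_inc_pingpong_cnt(): increment, but never wrap past 255. */
static void model_inc_pingpong_cnt(struct pingpong_model *m)
{
	if (m->pingpong < UINT8_MAX)
		m->pingpong++;
}

/* Models inet_csk_in_pingpong_mode(): counter has reached the threshold. */
static bool model_in_pingpong_mode(const struct pingpong_model *m)
{
	return m->pingpong >= m->thresh;
}

/* Models inet_csk_enter_pingpong_mode(): jump straight to the threshold. */
static void model_enter_pingpong_mode(struct pingpong_model *m)
{
	m->pingpong = m->thresh;
}

/* Walk the model through a threshold of 3, then saturate the counter. */
void model_demo(void)
{
	struct pingpong_model m = { .pingpong = 0, .thresh = 3 };
	int i;

	model_inc_pingpong_cnt(&m);
	model_inc_pingpong_cnt(&m);
	assert(!model_in_pingpong_mode(&m)); /* 2 < 3 */

	model_inc_pingpong_cnt(&m);
	assert(model_in_pingpong_mode(&m));  /* 3 >= 3 */

	for (i = 0; i < 300; i++)
		model_inc_pingpong_cnt(&m);
	assert(m.pingpong == UINT8_MAX);     /* saturated, no wraparound */

	m.pingpong = 0;
	model_enter_pingpong_mode(&m);
	assert(model_in_pingpong_mode(&m));
}
```

The saturation check matters because the counter is a u8 and the documented
range tops out at 255: without it, a wrapping counter could drop back below
the threshold.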




