On Fri, Sep 7, 2018 at 12:03 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:

> Problem is : we have platforms with more than 100 cpus, and
> sk_memory_allocated() cost will be too expensive,
> especially if the host is under memory pressure, since all cpus will
> touch their private counter.
>
> per cpu variables do not really scale, they were ok 10 years ago when
> no more than 16 cpus were the norm.
>
> I would prefer change TCP to not aggressively call
> __sk_mem_reduce_allocated() from tcp_write_timer()
>
> Ideally only tcp_retransmit_timer() should attempt to reduce forward
> allocations, after recurring timeout.
>
> Note that after 20c64d5cd5a2bdcdc8982a06cb05e5e1bd851a3d ("net: avoid
> sk_forward_alloc overflows")
> we have better control over sockets having huge forward allocations.
>
> Something like :

Or something less risky :

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 7fdf222a0bdfe9775970082f6b5dcdcc82b2ae1a..0aee80b6966cb2898e46350c761f9eb431ff1206 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -604,7 +604,8 @@ void tcp_write_timer_handler(struct sock *sk)
 	}
 
 out:
-	sk_mem_reclaim(sk);
+	if (tcp_under_memory_pressure(sk))
+		sk_mem_reclaim(sk);
 }
 
 static void tcp_write_timer(struct timer_list *t)
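
To make the intent of the one-line change easier to see, here is a rough user-space model of the gated reclaim. It is only a sketch: every model_* name, the MODEL_QUANTUM value and the shared_updates counter are hypothetical and exist purely for illustration; none of them are kernel APIs, and the real accounting in sk_mem_reclaim() has more to it than this.

```c
/* User-space model only; all model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QUANTUM 4096L	/* assumed accounting unit, in bytes */

struct model_proto {
	long allocated;			/* quanta charged, shared by all sockets */
	long pressure_limit;		/* above this we count as "under pressure" */
	unsigned long shared_updates;	/* writes made to the shared counter */
};

struct model_sock {
	struct model_proto *prot;
	long forward_alloc;		/* bytes reserved by the socket but unused */
};

static bool model_under_pressure(const struct model_proto *p)
{
	return p->allocated > p->pressure_limit;
}

/* Give whole unused quanta back to the shared pool. */
static void model_mem_reclaim(struct model_sock *sk)
{
	long quanta = sk->forward_alloc / MODEL_QUANTUM;

	if (quanta <= 0)
		return;
	sk->forward_alloc -= quanta * MODEL_QUANTUM;
	sk->prot->allocated -= quanta;
	sk->prot->shared_updates++;	/* this is the contended write */
}

/* Exit path of the timer handler, gated the same way as in the diff. */
static void model_write_timer_exit(struct model_sock *sk)
{
	assert(sk != NULL && sk->prot != NULL);
	if (model_under_pressure(sk->prot))
		model_mem_reclaim(sk);
}
```

In this model, a timer firing while the pool is below its limit leaves both forward_alloc and the shared counter untouched; the shared counter is only written once the limit is exceeded.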