On Tue, Nov 22, 2022 at 10:01 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> On Mon, Nov 21, 2022 at 4:53 PM Ivan Babrou <ivan@xxxxxxxxxxxxxx> wrote:
> >
> > Hello,
> >
> > We have observed negative TCP throughput behavior from the following commit:
> >
> > * 8e8ae645249b mm: memcontrol: hook up vmpressure to socket pressure
> >
> > It landed back in 2016 in v4.5, so it's not exactly a new issue.
> >
> > The crux of the issue is that in some cases with swap present the
> > workload can be unfairly throttled in terms of TCP throughput.
>
> I guess defining 'fairness' in such a scenario is nearly impossible.
>
> Have you tried changing /proc/sys/net/ipv4/tcp_rmem (and/or tcp_wmem)?
> Defaults are quite conservative.

Yes, our max sizes are much higher than the defaults. I don't see how
it matters, though. The issue is that the kernel clamps rcv_ssthresh
at 4 x advmss. No matter how much TCP memory you end up using, the
kernel will clamp based on responsiveness to memory reclaim, which in
turn depends on swap presence.

We're seeing it in production with tens of thousands of sockets and a
high max tcp_rmem, and I'm able to replicate the same issue in my VM
with the default sysctl values.

> If for your workload you want to ensure a minimum amount of memory
> per TCP socket, that might be good enough.

That's not my goal at all. We don't have a problem with TCP memory
consumption. Our issue is low throughput, because vmpressure() thinks
that the cgroup is memory constrained when it most definitely is not.
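
To make the mechanism concrete, here is roughly the path I'm talking
about, trimmed down and paraphrased from memory, so please read it as
a sketch of the interaction rather than verbatim code from the tree:

/* mm/vmpressure.c: reclaim that isn't making enough progress stamps
 * the memcg as being under socket pressure for the next second.
 */
if (level > VMPRESSURE_LOW)
	memcg->socket_pressure = jiffies + HZ;

/* include/net/tcp.h: every socket in that memcg now reports memory
 * pressure, even if the cgroup is nowhere near its memory limit.
 */
static inline bool tcp_under_memory_pressure(const struct sock *sk)
{
	if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
	    mem_cgroup_under_socket_pressure(sk->sk_memcg))
		return true;

	return READ_ONCE(tcp_memory_pressure);
}

/* net/ipv4/tcp_input.c: under that pressure the receive path stops
 * growing the window and clamps rcv_ssthresh, no matter how large
 * tcp_rmem[2] is.
 */
tp->rcv_ssthresh = min(tp->rcv_ssthresh, 4U * tp->advmss);

Once rcv_ssthresh is stuck at 4 x advmss, the advertised window can't
open up again, which is why raising tcp_rmem doesn't help here.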