On Fri, Sep 01, 2023 at 02:21:28PM +0800, Abel Wu wrote:
> A socket is pressure-aware when its protocol has pressure defined, that
> is sk_has_memory_pressure(sk) != NULL, e.g. TCP. These protocols might
> want to limit the usage of socket memory depending on both the state of
> global & memcg pressure through sk_under_memory_pressure(sk).
>
> While for allocation, memcg pressure will be simply ignored when usage
> is under global limit (sysctl_mem[0]). This behavior has different
> impacts on different cgroup modes. In cgroupv2 socket and other purposes
> share a same memory limit, thus allowing sockmem to burst under memcg
> reclaiming pressure could lead to longer stall, sometimes even OOM.
> While cgroupv1 has no such worries.
>
> As a cloud service provider, we encountered a problem in our production
> environment during the transition from cgroup v1 to v2 (partly due to
> the heavy taxes of accounting socket memory in v1). Say one workload
> behaves fine in cgroupv1 with memcg limit configured to 10GB memory and
> another 1GB tcpmem, but will suck (or even be OOM-killed) in v2 with
> 11GB memory due to burst memory usage on socket, since there is no
> specific limit for socket memory in cgroupv2 and relies largely on
> workloads doing traffic control themselves.
>
> It's rational for the workloads to build some traffic control to better
> utilize the resources they bought, but from kernel's point of view it's
> also reasonable to suppress the allocation of socket memory once there
> is a shortage of free memory, given that performance degradation is
> better than failure.
>
> As per the above, this patch aims to be more conservative on allocation
> for the pressure-aware sockets under global and/or memcg pressure. While
> OTOH throttling on incoming traffic could hurt latency badly possibly
> due to SACKed segs get dropped from the OFO queue. See a related commit
> 720ca52bcef22 ("net-memcg: avoid stalls when under memory pressure").
> This patch preserves this decision by throttling RX allocation only at
> critical pressure level when it hardly makes sense to continue receive
> data.
>
> No functional change intended for pressure-unaware protocols.
>
> Signed-off-by: Abel Wu <wuyun.abel@xxxxxxxxxxxxx>

...

> @@ -3087,8 +3100,20 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
>  	if (sk_has_memory_pressure(sk)) {
>  		u64 alloc;
>
> -		if (!sk_under_memory_pressure(sk))
> +		/* Be more conservative if the socket's memcg (or its
> +		 * parents) is under reclaim pressure, try to possibly
> +		 * avoid further memstall.
> +		 */
> +		if (under_memcg_pressure)
> +			goto suppress_allocation;
> +
> +		if (!sk_under_global_memory_pressure(sk))
>  			return 1;
> +
> +		/* Trying to be fair among all the sockets of same
> +		 * protocal under global memory pressure, by allowing

nit: checkpatch.pl --codespell says, protocal -> protocol

> +		 * the ones that under average usage to raise.
> +		 */
>  		alloc = sk_sockets_allocated_read_positive(sk);
>  		if (sk_prot_mem_limits(sk, 2) > alloc *
>  		    sk_mem_pages(sk->sk_wmem_queued +
> --
> 2.37.3
>
>
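For readers following the hunk above, the post-patch decision order for a pressure-aware socket can be modeled as a small userspace sketch. This is an illustration only, not kernel code: the function name raise_allocated, the verdict enum, and the boolean inputs are invented for the example, and the separate RX-at-critical-level carve-out described in the changelog is deliberately left out since that part of the diff is not shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the pressure checks in __sk_mem_raise_allocated()
 * after this patch. All names here are illustrative, not kernel API.
 */
enum verdict {
	ALLOW,			/* charge succeeds immediately */
	FAIR_SHARE_CHECK,	/* proceed to the per-protocol average-usage test */
	SUPPRESS,		/* goto suppress_allocation */
};

static enum verdict raise_allocated(bool pressure_aware,
				    bool memcg_pressure,
				    bool global_pressure)
{
	if (!pressure_aware)
		return ALLOW;		/* pressure-unaware protocols: unchanged */

	if (memcg_pressure)
		return SUPPRESS;	/* memcg (or ancestor) under reclaim:
					 * be conservative, avoid memstall */

	if (!global_pressure)
		return ALLOW;		/* usage below sysctl_mem[0]: no throttling */

	return FAIR_SHARE_CHECK;	/* global pressure: only sockets below the
					 * protocol's average usage may raise */
}
```

The key ordering change the patch makes is visible here: memcg pressure is now consulted before the global check, so a socket in a memcg under reclaim is suppressed even when the protocol's global usage is still below sysctl_mem[0].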