On Fri, Jun 9, 2023 at 2:07 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> On Fri, Jun 9, 2023 at 10:28 AM Abel Wu <wuyun.abel@xxxxxxxxxxxxx> wrote:
> >
> > This is just a PoC patch intended to resume the discussion about
> > tcpmem isolation opened by Google in LPC'22 [1].
> >
> > We are facing the same problem that the global shared threshold can
> > cause isolation issues. Low priority jobs can hog TCP memory and
> > adversely impact higher priority jobs. What's worse is that these
> > low priority jobs usually have smaller cpu weights leading to poor
> > ability to consume rx data.
> >
> > To tackle this problem, an interface for non-root cgroup memory
> > controller named 'socket.urgent' is proposed. It determines whether
> > the sockets of this cgroup and its descendants can escape from the
> > constraints or not under global socket memory pressure.
> >
> > The 'urgent' semantics will not take effect under memcg pressure in
> > order to protect against worse memstalls, thus will be the same as
> > before without this patch.
> >
> > This proposal doesn't remove protocol's threshold as we found it
> > useful in restraining memory defragment. As aforementioned the low
> > priority jobs can hog lots of memory, which is unreclaimable and
> > unmovable, for some time due to small cpu weight.
> >
> > So in practice we allow high priority jobs with net-memcg accounting
> > enabled to escape the global constraints if the net-memcg itself is
> > not under pressure. While for lower priority jobs, the budget will
> > be tightened as the memory usage of 'urgent' jobs increases. In this
> > way we can finally achieve:
> >
> >  - Important jobs won't be priority inversed by the background
> >    jobs in terms of socket memory pressure/limit.
> >
> >  - Global constraints are still effective, but only on non-urgent
> >    jobs, useful for admins on policy decision on defrag.
> >
> > Comments/Ideas are welcomed, thanks!
> >
>
> This seems to go in a complete opposite direction than memcg promises.
>
> Can we fix memcg, so that:
>
> Each group can use the memory it was provisioned (this includes TCP buffers)
>
> Global tcp_memory can disappear (set tcp_mem to infinity)

I agree with Eric and this is exactly how we at Google overcome the
isolation issue. We have set tcp_mem to unlimited and enabled memcg
accounting of network memory (by surgically incorporating v2 semantics
of network memory accounting in our v1 environment).

I do have one question though:

> This proposal doesn't remove protocol's threshold as we found it
> useful in restraining memory defragment.

Can you explain how you find the global tcp limit useful? What does
memory defragment mean?
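
For anyone who wants to compare the two accounting views on their own
machine, here is a minimal sketch (not part of the patch or of our
setup). It reads the global thresholds from
/proc/sys/net/ipv4/tcp_mem, which are expressed in pages, and the
"sock" counter from a cgroup's memory.stat, which is in bytes. It
assumes a cgroup v2 hierarchy mounted at /sys/fs/cgroup; the helper
names and the command-line argument are made up for illustration.

```python
import os
import sys
from pathlib import Path

TCP_MEM = Path("/proc/sys/net/ipv4/tcp_mem")


def parse_tcp_mem(text):
    """Split tcp_mem into its three thresholds, expressed in pages."""
    low, pressure, high = (int(field) for field in text.split())
    return {"min": low, "pressure": pressure, "max": high}


def parse_sock_bytes(memory_stat_text):
    """Return the 'sock' counter (bytes) from memory.stat, or None if absent."""
    for line in memory_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "sock":
            return int(value)
    return None


if __name__ == "__main__":
    # Assumption: argv[1] is a cgroup v2 directory (hypothetical usage).
    cgroup_dir = Path(sys.argv[1] if len(sys.argv) > 1 else "/sys/fs/cgroup")
    page_size = os.sysconf("SC_PAGE_SIZE")
    limits = parse_tcp_mem(TCP_MEM.read_text())
    sock = parse_sock_bytes((cgroup_dir / "memory.stat").read_text())
    print("global tcp_mem max: %d bytes" % (limits["max"] * page_size))
    print("cgroup sock usage:  %s bytes" % sock)
```

The parsing is kept separate from the file reads so it can be checked
against sample text without touching /proc or the cgroup filesystem.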