On Fri, Apr 17, 2020 at 10:36 AM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Fri, Apr 17, 2020 at 10:18:15AM -0700, Shakeel Butt wrote:
> > > There currently are issues with anonymous memory management which makes them
> > > different / worse than page cache but I don't follow why swapping
> > > necessarily means that isolation is broken. Page refaults don't indicate
> > > that memory isolation is broken after all.
> >
> > Sorry, I meant the performance isolation. Direct reclaim does not
> > really differentiate who to stall and whose CPU to use.
>
> Can you please elaborate concrete scenarios? I'm having a hard time seeing
> differences from page cache.
>

Oh, I was talking about the global reclaim here. In global reclaim, any
task can be throttled (throttle_direct_reclaim()). Memory freed by
using the CPU of high priority low latency jobs can be stolen by low
priority batch jobs.

> > > > memcg limit reclaim and memcg limits are overcommitted. Shouldn't
> > > > running out of swap trigger the OOM earlier, which should be
> > > > better than impacting the whole system?
> > >
> > > The primary scenario which was being considered was undercommitted
> > > protections but I don't think that makes any relevant differences.
> > >
> >
> > What are undercommitted protections? Does it mean there is still swap
> > available on the system but the memcg is hitting its swap limit?
>
> Hahaha, I assumed you were talking about memory.high/max and was saying that
> the primary scenario that was being considered was usage of memory.low
> interacting with swap. Again, can you please give a concrete example so
> that we don't misunderstand each other?
>
> > > This is exactly similar to delay injection for memory.high. What's desired
> > > is slowing down the workload as the available resource is depleted so that
> > > the resource shortage presents as gradual degradation of performance and
> > > matching increase in resource PSI. This allows the situation to be detected
> > > and handled from userland while avoiding sudden and unpredictable behavior
> > > changes.
> > >
> >
> > Let me try to understand this with an example. Memcg 'A' has
>
> Ah, you already went there. Great.
>
> > memory.high = 100 MiB, memory.max = 150 MiB and memory.swap.max = 50
> > MiB. When A's usage goes over 100 MiB, it will reclaim the anon, file
> > and kmem. The anon will go to swap and increase its swap usage until
> > it hits the limit. Now the 'A' reclaim_high has fewer things (file &
> > kmem) to reclaim but the mem_cgroup_handle_over_high() will keep A's
> > increase in usage in check.
> >
> > So, my question is: should the slowdown by memory.high depend on the
> > reclaimable memory? If there is no reclaimable memory and the job hits
> > memory.high, should the kernel slow it down to a crawl until the PSI
> > monitor comes and decides what to do? If I understand correctly, the
> > problem is the kernel slowdown is not successful when reclaimable
> > memory is very low. Please correct me if I am wrong.
>
> In combination with memory.high, swap slowdown may not be necessary because
> memory.high's slow down mechanism is already there to handle "can't swap"
> scenario whether that's because swap is disabled wholesale, limited or
> depleted. However, please consider the following scenario.
>
> cgroup A has memory.low protection and no other restrictions. cgroup B has
> no protection and has access to swap. When B's memory starts bloating and
> gets the system under memory contention, it'll start consuming swap until it
> can't. When swap becomes depleted for B, there's nothing holding it back and
> B will start eating into A's protection.
>

In this example, does 'B' have memory.high and memory.max set? And by
'A' having no other restrictions, I am assuming you meant unlimited
high and max for A? Can 'A' use memory.min?

> The proposed mechanism just plugs another vector for the same condition
> where anonymous memory management breaks down because they can no longer be
> reclaimed due to swap unavailability.
>
> Thanks.
>
> --
> tejun
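
To make the memcg 'A' example above concrete, here is a minimal sketch of
how those three limits could be written through the cgroup v2 interface
files. The file names memory.high, memory.max and memory.swap.max and the
values are the ones from the example; the helper names and the cgroup
directory passed in are hypothetical, and the values are written as plain
byte counts.

```python
from pathlib import Path

MIB = 1024 * 1024

# Limits for memcg 'A' as given in the example in this thread.
LIMITS_A = {
    "memory.high": 100 * MIB,
    "memory.max": 150 * MIB,
    "memory.swap.max": 50 * MIB,
}


def apply_limits(cgroup_dir, limits):
    """Write each limit (in bytes) to its interface file under cgroup_dir.

    apply_limits is a hypothetical helper for illustration, not a kernel
    or library API.
    """
    cgroup_dir = Path(cgroup_dir)
    for name, value in limits.items():
        (cgroup_dir / name).write_text(f"{value}\n")


def read_limits(cgroup_dir, names):
    """Read the listed interface files back as stripped strings."""
    cgroup_dir = Path(cgroup_dir)
    return {name: (cgroup_dir / name).read_text().strip() for name in names}
```

On a real system this would be called as something like
apply_limits("/sys/fs/cgroup/A", LIMITS_A), assuming cgroup v2 is mounted
at /sys/fs/cgroup, a cgroup named A already exists there, the memory
controller is enabled for it, and the caller has permission to write the
files.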