On Fri, Feb 22, 2019 at 08:58:25PM +0300, Andrey Ryabinin wrote:
> In the presence of more than one memory cgroup in the system, our
> reclaim logic just sucks. When we hit a memory limit (global or a
> limit on a cgroup with subgroups), we reclaim some memory from all
> cgroups. This sucks because the cgroup that allocates more often
> always wins. E.g. a job that allocates a lot of clean, rarely used
> page cache will push other jobs, with active but relatively small
> all-in-memory working sets, out of memory.
>
> To prevent such situations we have memcg controls like low/max,
> etc., which are supposed to protect jobs or limit them so they do
> not hurt others. But memory cgroups are very hard to configure
> right, because that requires precise knowledge of the workload,
> which may vary during execution. E.g. setting a memory limit means
> the job won't be able to use all of the system's memory for page
> cache even if the rest of the system is idle. Basically, our
> current scheme requires configuring every single cgroup in the
> system.
>
> I think we can do better. The idea proposed by this patch is to
> reclaim only inactive pages, and only from cgroups that have a big
> (!inactive_is_low()) inactive list, going back to shrinking active
> lists only when all inactive lists are low.

Hi Andrey!

It's definitely an interesting idea! However, let me bring up some
concerns:

1) What's considered active and inactive depends on the memory
pressure inside a cgroup. Active pages in one cgroup (e.g. one that
was just deleted) can actually be colder than inactive pages in
another (e.g. a memory-hungry cgroup with a tight memory.max).

Also, a workload inside a cgroup can to some extent control what
goes to the active LRU, so this opens a way to get more memory
unfairly by artificially promoting more pages to the active LRU. In
other words, a cgroup can gain an unfair advantage over other
cgroups. (A toy model of the proposed policy is sketched below.)

Generally speaking, we now have a way to measure the memory pressure
inside a cgroup. So, in theory, it should be possible to balance
scanning effort based on memory pressure (see the second sketch
below).

Thanks!
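
To make the quoted proposal concrete, here is a minimal user-space
model of the two-phase policy. This is an illustrative sketch only,
not the patch itself: inactive_is_low() below is a deliberate
simplification of the kernel's inactive-list ratio heuristic, and
struct cgroup_model is made up for the example.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a memcg's per-node LRU state. */
struct cgroup_model {
	const char *name;
	long active;   /* pages on the active LRU */
	long inactive; /* pages on the inactive LRU */
};

/*
 * Simplified heuristic: inactive is "low" when it is smaller than
 * active. The kernel scales this ratio with memory size.
 */
static bool inactive_is_low(const struct cgroup_model *cg)
{
	return cg->inactive < cg->active;
}

static void pick_reclaim_targets(const struct cgroup_model *cgs, int n)
{
	bool any_inactive_big = false;

	for (int i = 0; i < n; i++)
		if (!inactive_is_low(&cgs[i]))
			any_inactive_big = true;

	for (int i = 0; i < n; i++) {
		if (any_inactive_big) {
			/* Phase 1: scan only big inactive lists. */
			if (!inactive_is_low(&cgs[i]))
				printf("%s: scan inactive\n", cgs[i].name);
		} else {
			/*
			 * Phase 2: all inactive lists are low, so
			 * shrink active lists everywhere.
			 */
			printf("%s: scan active\n", cgs[i].name);
		}
	}
}

int main(void)
{
	struct cgroup_model cgs[] = {
		{ "streaming-job", 100, 900 }, /* cold page cache */
		{ "db-job",        800, 100 }, /* hot working set */
	};

	pick_reclaim_targets(cgs, 2);
	return 0;
}

Note how the gaming concern shows up even in this toy: "db-job" only
has to keep promoting pages to its active list to stay out of phase 1
entirely.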
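
And here is a rough sketch of the pressure-based direction, assuming
a cgroup v2 hierarchy with PSI enabled (memory.pressure). The cgroup
path and the weighting formula are made up for illustration; only
the memory.pressure file format is real.

#include <stdio.h>

/* Returns the "some avg10" stall percentage, or -1.0 on failure. */
static double memory_pressure_avg10(const char *cgroup_path)
{
	char path[256];
	double avg10 = -1.0;
	FILE *f;

	snprintf(path, sizeof(path), "%s/memory.pressure", cgroup_path);
	f = fopen(path, "r");
	if (!f)
		return -1.0;
	/* First line: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0" */
	if (fscanf(f, "some avg10=%lf", &avg10) != 1)
		avg10 = -1.0;
	fclose(f);
	return avg10;
}

int main(void)
{
	const char *cg = "/sys/fs/cgroup/example"; /* hypothetical path */
	double p = memory_pressure_avg10(cg);
	double scan_weight;

	if (p < 0.0) {
		fprintf(stderr, "could not read memory.pressure\n");
		return 1;
	}
	/*
	 * Hypothetical policy: the more a cgroup already stalls on
	 * memory, the less additional scan pressure it receives.
	 */
	scan_weight = 1.0 / (1.0 + p);
	printf("pressure=%.2f%% -> scan weight %.2f\n", p, scan_weight);
	return 0;
}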