On Wed, Sep 04, 2019 at 04:53:08PM +0300, Konstantin Khlebnikov wrote:
> Currently mlock keeps pages in cgroups where they were accounted.
> This way one container could affect another if they share file cache.
> Typical case is writing (downloading) file in one container and then
> locking in another. After that first container cannot get rid of cache.

Yeah, it's a valid problem, and it's not about mlocked pages only; the
same thing is true for generic pagecache. The only difference is that in
theory memory pressure should fix everything. But in reality pagecache
used by the second container can be very hot, so the first one can't
really get rid of it.

In other words, there is no way to pass a pagecache page between cgroups
without evicting it and re-reading it from storage, which is sub-optimal
in many cases.

We thought about a new madvise(), which would uncharge pagecache but set
a new page flag meaning something like "whoever first starts using the
page should be charged for it". But it never materialized in a patchset.

> Also removed cgroup stays pinned by these mlocked pages.

Tbh, I don't think it's a big issue here. It would matter only with a
huge number of 1-page sized mlock areas, but that seems unlikely.

>
> This patchset implements recharging pages to cgroup of mlock user.
>
> There are three cases:
> * recharging at first mlock
> * recharging at munlock to any remaining mlock
> * recharging at 'culling' in reclaimer to any existing mlock
>
> To keep things simple recharging ignores memory limit. After that memory
> usage temporary could be higher than limit but cgroup will reclaim memory
> later or trigger oom, which is valid outcome when somebody mlock too much.

OOM is a concern here. If quitting an application causes an immediate
OOM in another cgroup, that's not so good. Ideally it should work like
memory.high, forcing all threads in the second cgroup into direct reclaim.

Thanks!
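
To make the accounting concrete, here is a toy userspace model of the
three recharge cases and of the over-limit concern. It is only a sketch:
it is not kernel code and not taken from the patchset, and every name in
it (Cgroup, Page, recharge, cull, demo) is hypothetical. It also assumes
that munlock recharges only when the unlocking cgroup currently holds the
charge, and that the first remaining mlock user is picked as the new owner.

```python
# Toy model of page charge ownership. Not kernel code; all names are
# hypothetical and exist only for illustration.

class Cgroup:
    def __init__(self, name, limit):
        self.name = name
        self.limit = limit            # limit, in pages
        self.usage = 0                # pages currently charged here
        self.over_limit_events = 0    # recharges that landed above the limit


class Page:
    def __init__(self, owner):
        self.owner = owner            # cgroup the page is charged to
        self.mlockers = []            # cgroups that have the page mlocked
        owner.usage += 1


def recharge(page, target):
    """Move the charge to `target`, ignoring its limit as the proposal does."""
    if page.owner is target:
        return
    page.owner.usage -= 1
    target.usage += 1
    page.owner = target
    if target.usage > target.limit:
        # A real kernel would have to reclaim, throttle or OOM here.
        # The model only counts how often this happens.
        target.over_limit_events += 1


def mlock(page, cg):
    page.mlockers.append(cg)
    if len(page.mlockers) == 1:       # case 1: recharge at first mlock
        recharge(page, cg)


def munlock(page, cg):
    page.mlockers.remove(cg)
    # Case 2 (assumed rule): if the unlocking cgroup holds the charge and
    # somebody else still has the page mlocked, hand the charge over.
    if page.owner is cg and page.mlockers:
        recharge(page, page.mlockers[0])


def cull(page):
    # Case 3: reclaim finds an mlocked page charged to a non-mlocker.
    if page.mlockers and page.owner not in page.mlockers:
        recharge(page, page.mlockers[0])


def demo():
    a = Cgroup("A", limit=10)
    b = Cgroup("B", limit=2)
    pages = [Page(a) for _ in range(4)]   # A populates the cache
    for p in pages:
        mlock(p, a)                       # A already owns the pages
        mlock(p, b)                       # B locks the same pages
    for p in pages:
        munlock(p, a)                     # A quits; charges move to B
    return a.usage, b.usage, b.over_limit_events


if __name__ == "__main__":
    print(demo())
```

In the demo, A going away leaves B at twice its limit without B having
done anything new, which is the situation where a memory.high-style
throttle would be gentler than an immediate OOM.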