On Fri, Nov 02, 2018 at 09:03:55AM +0100, Michal Hocko wrote:
> On Fri 02-11-18 02:45:42, Dexuan Cui wrote:
> [...]
> > I totally agree. I'm now just wondering if there is any temporary workaround,
> > even if that means we have to run the kernel with some features disabled or
> > with a suboptimal performance?
>
> One way would be to disable kmem accounting (cgroup.memory=nokmem kernel
> option). That would reduce the memory isolation because quite a lot of
> memory will not be accounted for but the primary source of in-flight and
> hard to reclaim memory will be gone.

In my experience, disabling kmem accounting doesn't really solve the issue
(without patches), but it can lower the rate of the leak.

>
> Another workaround could be to use force_empty knob we have in v1 and
> use it when removing a cgroup. We do not have it in cgroup v2 though.
> The file hasn't been added to v2 because we didn't really have any
> proper usecase. Working around a bug doesn't sound like a _proper_
> usecase but I can imagine workloads that bring a lot of metadata objects
> that are not really interesting for later use so something like a
> targeted drop_caches...

This can help a bit too, but unfortunately even the system-wide drop_caches
knob doesn't return all the memory.

Thanks!