Sorry, resending once more, now with fsdevel@ in Cc, as requested by Dave.

--

Recent reverts of memcg leak fixes [1, 2] reintroduced the problem of accumulating dying memory cgroups. This is a serious problem: on most of our machines we've seen thousands of dying cgroups, and the corresponding memory footprint was measured in hundreds of megabytes. The problem was also independently discovered by other companies.

The fixes were reverted due to an xfs regression investigated by Dave Chinner. At the same time, we saw a very small (0.18%) CPU regression on some hosts, which prompted Rik van Riel to propose a patch [3] aimed at fixing it. The idea is to accumulate small memory pressure and apply it periodically, so that we don't overscan small shrinker lists. According to Jan Kara's data [4], Rik's patch fixed the regression only partially.

The path forward isn't entirely clear now, and the status quo isn't acceptable because of the memcg leak.

Dave and Michal's position is to focus on the dying memory cgroup case and apply some artificial memory pressure to the corresponding slabs (probably during the cgroup deletion process). In theory, this approach can be less harmful to the subtle scanning balance and avoid causing any regressions.

In my opinion, that's not necessarily true. Slab objects can be shared between cgroups, and often can't be reclaimed on cgroup removal without an impact on the rest of the system. Applying constant artificial memory pressure precisely to the objects accounted to dying cgroups is challenging and will likely cause quite significant overhead. Also, by "forgetting" some slab objects under light or even moderate memory pressure, we're wasting memory that could be used for something useful. Dying cgroups just make this problem more obvious because of their size.

So, applying "natural" memory pressure in such a way that all slab objects are scanned periodically seems to me to be the best solution.
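The accumulation idea behind [3] is one way to get light "natural" pressure onto small lists. Below is a minimal C sketch under stated assumptions. It is not Rik's actual patch, and the names (struct pressure_acc, scan_count) are hypothetical, not kernel API. It assumes the per-pass scan target is modelled on the shrinker formula (freeable >> priority) * 4 / seeks, which rounds down to zero for any list holding fewer than 2^priority objects. Instead of dropping that fraction, the sketch carries it over to later passes.

```c
#include <assert.h>

/*
 * Hypothetical per-list state for this sketch only; these are not the
 * kernel's struct shrinker fields.
 */
struct pressure_acc {
	unsigned long seeks;     /* relative cost to recreate an object (DEFAULT_SEEKS is 2) */
	unsigned long remainder; /* scaled pressure not yet turned into scanning */
};

/*
 * Objects to scan on this reclaim pass. The plain target
 * (freeable >> priority) * 4 / seeks is 0 for lists smaller than
 * 2^priority objects, so such lists are never scanned under light
 * pressure. Here the scaled pressure is accumulated, and whatever is
 * below one whole object is kept for later passes instead of discarded.
 */
static unsigned long scan_count(struct pressure_acc *acc,
				unsigned long freeable, int priority)
{
	unsigned long nr;

	assert(acc->seeks > 0 && priority >= 0 && priority < 32);

	acc->remainder += freeable * 4 / acc->seeks;
	nr = acc->remainder >> priority;       /* whole objects owed so far */
	acc->remainder -= nr << priority;      /* keep only the fraction */

	/* never ask for more objects than the list currently holds */
	return nr < freeable ? nr : freeable;
}
```

For example, with 100 freeable objects, seeks = 2 and priority 12, the plain formula always yields 0, while this sketch scans one object on the 21st pass and 200 objects in total over 4096 passes. For large lists (say 2^20 objects) it returns the same 512 objects per pass as the plain formula, so only small lists behave differently.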
The devil is in the details, and how to do it without causing any regressions is an open question. Completely reparenting slabs to the parent cgroup (not only the shrinker lists) is another option to consider.

It would be nice to discuss the problem at LSF/MM, agree on the general path, and draw up a list of benchmarks that could be used to validate a solution.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a9a238e83fbb0df31c3b9b67003f8f9d1d1b6c96
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=69056ee6a8a3d576ed31e38b3b14c70d6c74edcc
[3] https://lkml.org/lkml/2019/1/28/1865
[4] https://lkml.org/lkml/2019/2/8/336