On Tue, Sep 04, 2018 at 10:52:46AM -0700, Roman Gushchin wrote:
> Reparenting of all pages is definitely an option to consider,

Reparenting pages would be great indeed, but I'm not sure we could do
that, because we'd have to walk over the page lists of semi-active kmem
caches and do it consistently while some pages may be freed as we go.
Kmem caches are so optimized for performance that implementing such a
procedure without impacting any hot paths would be nearly impossible
IMHO. And there are two implementations (SLAB/SLUB), both of which we'd
have to support.

> but it's not free in any case, so if there is no problem,
> why should we? Let's keep it as a last measure. In my case,
> the proposed patch works perfectly: the number of dying cgroups
> jumps around 100, where it grew steadily to 2k and more before.
>
> I believe that reparenting of LRU lists is required to minimize
> the number of LRU lists to scan, but I'm not sure.

AFAIR the sole purpose of LRU reparenting is releasing kmemcg_id as soon
as a cgroup directory is deleted. If we didn't do that, dead cgroups
would occupy slots in per memcg arrays (list_lru, kmem_cache), so if we
had, say, 10K dead cgroups, we'd have to allocate 80 KB arrays to store
per memcg data for each kmem_cache and list_lru. Back when kmem
accounting was introduced, we used kmalloc() for allocating those
arrays, so growing the size up to 80 KB would result in getting ENOMEM
too often when trying to create a cgroup. Now we fall back on vmalloc(),
so maybe it wouldn't be a problem...

Alternatively, I guess we could "reparent" those dangling LRU objects
not to the parent cgroup's list_lru_memcg, but instead to a special
list_lru_memcg which wouldn't be assigned to any cgroup and which would
be reclaimed ASAP on either global or memcg pressure.
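
To make the 80 KB figure concrete, here is a rough userspace sketch of
the arithmetic, not kernel code. The helper names memcg_array_bytes()
and alloc_order() are made up for illustration, and the 8-byte slot
(one pointer on a 64-bit machine) and 4 KiB page size are assumptions:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes needed for one per-memcg array that keeps one slot per
 * kmemcg_id. slot_size is an assumption: 8 bytes, i.e. one pointer
 * on a 64-bit machine.
 */
size_t memcg_array_bytes(size_t nr_ids, size_t slot_size)
{
	assert(slot_size > 0);
	return nr_ids * slot_size;
}

/*
 * Smallest order such that (page_size << order) >= bytes, i.e. the
 * size of the physically contiguous chunk a kmalloc()-style
 * allocation of that many bytes would need.
 */
unsigned int alloc_order(size_t bytes, size_t page_size)
{
	unsigned int order = 0;

	assert(page_size > 0);
	while ((page_size << order) < bytes)
		order++;
	return order;
}
```

With 10000 IDs and 8-byte slots, memcg_array_bytes() gives 80000 bytes
per array, and alloc_order(80000, 4096) gives 5, i.e. 32 contiguous
pages. High-order contiguous allocations like that are the ones that
tend to fail on a fragmented system, while a vmalloc() fallback does not
need physically contiguous pages.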