On Thu, Mar 22, 2018 at 07:52:37PM +0300, Kirill Tkhai wrote: > Here is the problem I'm solving: https://lkml.org/lkml/2018/3/21/365. Oh, finally you tell me what the problem is that you're trying to solve. I *asked this several times* and got no response. Thank you for wasting so much of my time. > Current shrinker is not scalable. Then there are many memcg and mounts, > the time of iteration shrink_slab() in case of global reclaim can > take much time. There is times of shrink_slab() by the link. A node > with 200 containers may waste 4 seconds on global reclaim just to > iterate over all shrinkers for all cgroups, call shrinker::count_objects() > and receive 0 zero objects. So, your problem is the way the memcgs were tacked onto the side of the list_lru infrastructure and are iterated, which has basically nothing to do with the way the low level XFS inode shrinker behaves. /me looks at the patches /me shudders at the thought of external "cache has freeable items" control for the shrinking of vfs caches. Biggest problem I see with this is the scope for coherency bugs ini the "memcg shrinker has freeable items" tracking. If that happens, there's no way of getting that memcg to run it's shrinker ever again. That seems very, very fragile and likely to be an endless source of OOM bugs. The whole point of the shrinker polling infrastructure is that it is not susceptible to this sort of bug. Really, the problem is that there's no separate list of memcg aware shrinkers, so every registered shrinker has to be iterated just to find the one shrinker that is memcg aware. So why not just do the simple thing which is create a separate "memcg-aware" shrinker list (i.e. create shrinker_list_memcg alongside shrinker_list) and only iterate the shrinker_list_memcg when a memcg is passed to shrink_slab()? That means we'll only run 2 shrinkers per memcg at most (sueprblock and working set) per memcg reclaim call. That's a simple 10-20 line change, not a whole mess of new code and issues... > Can't we call shrink of shared objects only for top memcg? Something like > this: What's a "shared object", and how is it any different to a normal slab cache object? CHeers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html