On Tue 23-01-24 11:48:19, Johannes Weiner wrote: > The revert isn't a straight-forward solution. > > The patch you're reverting fixed conventional reclaim and broke > MGLRU. Your revert fixes MGLRU and breaks conventional reclaim. > > On Tue, Jan 23, 2024 at 05:58:05AM -0800, T.J. Mercier wrote: > > They both are able to make progress. The main difference is that a > > single iteration of try_to_free_mem_cgroup_pages with MGLRU ends soon > > after it reclaims nr_to_reclaim, and before it touches all memcgs. So > > a single iteration really will reclaim only about SWAP_CLUSTER_MAX-ish > > pages with MGLRU. WIthout MGLRU the memcg walk is not aborted > > immediately after nr_to_reclaim is reached, so a single call to > > try_to_free_mem_cgroup_pages can actually reclaim thousands of pages > > even when sc->nr_to_reclaim is 32. (I.E. MGLRU overreclaims less.) > > https://lore.kernel.org/lkml/20221201223923.873696-1-yuzhao@xxxxxxxxxx/ > > Is that a feature or a bug? > > * 1. Memcg LRU only applies to global reclaim, and the round-robin incrementing > * of their max_seq counters ensures the eventual fairness to all eligible > * memcgs. For memcg reclaim, it still relies on mem_cgroup_iter(). > > If it bails out exactly after nr_to_reclaim, it'll overreclaim > less. But with steady reclaim in a complex subtree, it will always hit > the first cgroup returned by mem_cgroup_iter() and then bail. This > seems like a fairness issue. Agreed. We would need to re-introduce something like we used to have before 1ba6fc9af35bf. > We should figure out what the right method for balancing fairness with > overreclaim is, regardless of reclaim implementation. Because having > two different approaches and reverting dependent things back and forth > doesn't make sense. Absolutely agreed! > Using an LRU to rotate through memcgs over multiple reclaim cycles > seems like a good idea. Why is this specific to MGLRU? Shouldn't this > be a generic piece of memcg infrastructure? > > Then there is the question of why there is an LRU for global reclaim, > but not for subtree reclaim. Reclaiming a container with multiple > subtrees would benefit from the fairness provided by a container-level > LRU order just as much; having fairness for root but not for subtrees > would produce different reclaim and pressure behavior, and can cause > regressions when moving a service from bare-metal into a container. > > Figuring out these differences and converging on a method for cgroup > fairness would be the better way of fixing this. Because of the > regression risk to the default reclaim implementation, I'm inclined to > NAK this revert. I do agree that a simple revert doesn't seem to be the way to go. -- Michal Hocko SUSE Labs