The revert isn't a straightforward solution. The patch you're
reverting fixed conventional reclaim and broke MGLRU. Your revert
fixes MGLRU and breaks conventional reclaim.

On Tue, Jan 23, 2024 at 05:58:05AM -0800, T.J. Mercier wrote:
> They both are able to make progress. The main difference is that a
> single iteration of try_to_free_mem_cgroup_pages with MGLRU ends soon
> after it reclaims nr_to_reclaim, and before it touches all memcgs. So
> a single iteration really will reclaim only about SWAP_CLUSTER_MAX-ish
> pages with MGLRU. Without MGLRU the memcg walk is not aborted
> immediately after nr_to_reclaim is reached, so a single call to
> try_to_free_mem_cgroup_pages can actually reclaim thousands of pages
> even when sc->nr_to_reclaim is 32. (I.E. MGLRU overreclaims less.)
>
> https://lore.kernel.org/lkml/20221201223923.873696-1-yuzhao@xxxxxxxxxx/

Is that a feature or a bug?

 * 1. Memcg LRU only applies to global reclaim, and the round-robin
 *    incrementing of their max_seq counters ensures the eventual
 *    fairness to all eligible memcgs. For memcg reclaim, it still
 *    relies on mem_cgroup_iter().

If it bails out exactly after nr_to_reclaim, it'll overreclaim less.
But with steady reclaim in a complex subtree, it will always hit the
first cgroup returned by mem_cgroup_iter() and then bail. This seems
like a fairness issue. (A toy model of both behaviors is at the end
of this mail.)

We should figure out the right method for balancing fairness against
overreclaim, regardless of the reclaim implementation. Having two
different approaches, and reverting dependent things back and forth,
doesn't make sense.

Using an LRU to rotate through memcgs over multiple reclaim cycles
seems like a good idea. Why is this specific to MGLRU? Shouldn't this
be a generic piece of memcg infrastructure?

Then there is the question of why there is an LRU for global reclaim,
but not for subtree reclaim. Reclaiming a container with multiple
subtrees would benefit from the fairness provided by a container-level
LRU order just as much. Having fairness for root but not for subtrees
produces different reclaim and pressure behavior, and can cause
regressions when moving a service from bare metal into a container.

Figuring out these differences and converging on a method for cgroup
fairness would be the better way of fixing this.

Because of the regression risk to the default reclaim implementation,
I'm inclined to NAK this revert.
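
To make the fairness/overreclaim tradeoff concrete, here is a toy
userspace model of the two behaviors. This is not kernel code; the
memcg count, the per-visit yield, and the assumption that every pass
restarts at the first child are all made up for illustration:

/*
 * Toy model, not kernel code.  "bail" mimics a reclaim pass that
 * stops as soon as the request is met and restarts from the first
 * memcg on every call.  "walk" mimics a pass that touches every
 * memcg in the subtree before returning.
 */
#include <stdio.h>

#define NMEMCG        8
#define NR_TO_RECLAIM 32   /* SWAP_CLUSTER_MAX-ish request */
#define CYCLES        100

static void run(int bail_early)
{
	long reclaimed[NMEMCG] = { 0 };
	long total = 0;

	for (int c = 0; c < CYCLES; c++) {
		long nr = 0;

		/* iterator stand-in: always starts at memcg 0 here */
		for (int m = 0; m < NMEMCG; m++) {
			/* pretend each memcg yields 32 pages per visit */
			reclaimed[m] += 32;
			nr += 32;
			if (bail_early && nr >= NR_TO_RECLAIM)
				break;
		}
		total += nr;
	}

	printf("%s: total=%ld per-memcg:", bail_early ? "bail" : "walk",
	       total);
	for (int m = 0; m < NMEMCG; m++)
		printf(" %ld", reclaimed[m]);
	printf("\n");
}

int main(void)
{
	run(1);	/* bail: exact amount, but all pressure on memcg 0 */
	run(0);	/* walk: fair, but 8x the requested amount */
	return 0;
}

The "bail" run takes exactly what was asked for but puts all of it on
the first memcg; the "walk" run spreads the pressure evenly but
reclaims eight times the request. That's the tradeoff in miniature.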
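
And a sketch of what generic rotation could buy: the same toy setup,
but candidates sit on a list that rotates head-to-tail after each
scan, so a bail-early pass spreads pressure across cycles instead of
hammering the first child every time. Again purely illustrative, not
a proposed implementation:

/*
 * Toy model of the rotation idea: scan from the head of the list,
 * bail once the request is met, and rotate each scanned memcg to
 * the tail so the next pass starts somewhere else.
 */
#include <stdio.h>

#define NMEMCG        8
#define NR_TO_RECLAIM 32
#define CYCLES        100

int main(void)
{
	int order[NMEMCG];
	long reclaimed[NMEMCG] = { 0 };

	for (int m = 0; m < NMEMCG; m++)
		order[m] = m;

	for (int c = 0; c < CYCLES; c++) {
		long nr = 0;
		int scanned = 0;

		while (nr < NR_TO_RECLAIM && scanned < NMEMCG) {
			int m = order[0];

			reclaimed[m] += 32;	/* pretend reclaim */
			nr += 32;
			scanned++;

			/* rotate the scanned memcg to the tail */
			for (int i = 0; i < NMEMCG - 1; i++)
				order[i] = order[i + 1];
			order[NMEMCG - 1] = m;
		}
	}

	printf("rotating:");
	for (int m = 0; m < NMEMCG; m++)
		printf(" %ld", reclaimed[m]);
	printf("\n");	/* even spread, exact amount reclaimed */
	return 0;
}

Over many cycles this converges on an even per-memcg spread while
still reclaiming only the requested amount per call, which is why
the rotation looks like it belongs in common memcg infrastructure
rather than in one reclaim implementation.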