Hi Zhaoyang, On Fri, Jul 27, 2018 at 05:19:25PM +0800, Zhaoyang Huang wrote: > This patch try to let the direct reclaim finish earlier than it used > to be. The problem comes from We observing that the direct reclaim > took a long time to finish when memcg is enabled. By debugging, we > find that the reason is the softlimit is too low to meet the loop > end criteria. So we add two barriers to judge if it has reclaimed > enough memory as same criteria as it is in shrink_lruvec: > 1. for each memcg softlimit reclaim. > 2. before starting the global reclaim in shrink_zone. Yes, the soft limit reclaim cycle is fairly aggressive and can introduce quite some allocation latency into the system. Let me say right up front, though, that we've spend hours in conference sessions and phone calls trying to fix this and could never agree on anything. You might have better luck trying cgroup2 which implements memory.low in a more scalable manner. (Due to the default value of 0 instead of infinitity, it can use a smoother 2-pass reclaim cycle.) On your patch specifically: should_continue_reclaim() is for compacting higher order pages. It assumes you have already made a full reclaim cycle and returns false for most allocations without checking any sort of reclaim progress. You may end up in a situation where soft limit reclaim finds nothing, and you still abort without trying a regular reclaim cycle. That can trigger the OOM killer while there is still plenty of reclaimable memory in other groups. So if you want to fix this, you'd have to look for a different threshold for soft limit reclaim and. Maybe something like this already works: diff --git a/mm/vmscan.c b/mm/vmscan.c index ee91e8cbeb5a..5b2388fa6bc4 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2786,7 +2786,8 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc) &nr_soft_scanned); sc->nr_reclaimed += nr_soft_reclaimed; sc->nr_scanned += nr_soft_scanned; - /* need some check for avoid more shrink_zone() */ + if (nr_soft_reclaimed) + continue; } /* See comment about same check for global reclaim above */