On Thu, Apr 12, 2012 at 09:45:47AM -0700, Ying Han wrote:
> On Thu, Apr 12, 2012 at 7:24 AM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > On Wed, Apr 11, 2012 at 09:06:27PM -0700, Ying Han wrote:
> >> On Wed, Apr 11, 2012 at 4:56 PM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> >> > On Wed, Apr 11, 2012 at 03:00:27PM -0700, Ying Han wrote:
> >> >> Under global background reclaim, the sc->nr_to_reclaim is set to
> >> >> ULONG_MAX. Now we are iterating all memcgs under the zone and we
> >> >> shouldn't pass the pressure from kswapd for each memcg.
> >> >>
> >> >> After all, the balance_pgdat() breaks after reclaiming SWAP_CLUSTER_MAX
> >> >> pages to prevent building up reclaim priorities.
> >> >
> >> > shrink_mem_cgroup_zone() bails out of a zone, balance_pgdat() bails
> >> > out of a priority loop, there is quite a difference.
> >> >
> >> > After this patch, kswapd no longer puts equal pressure on all zones in
> >> > the zonelist, which was a key reason why we could justify bailing
> >> > early out of individual zones in direct reclaim: kswapd will restore
> >> > fairness.
> >>
> >> Guess I see your point here.
> >>
> >> My intention is to prevent over-reclaim memcgs per-zone by having
> >> nr_to_reclaim to ULONG_MAX. Now, we scan each memcg based on
> >> get_scan_count() without bailout, do you see a problem w/o this patch?
> >
> > The fact that we iterate over each memcg does not make a difference,
> > because the target that get_scan_count() returns for each zone-memcg
> > is in sum what it would have returned for the whole zone, so the scan
> > aggressiveness did not increase. It just distributes the zone's scan
> > target over the set of memcgs proportional to their share of pages in
> > that zone.
> >
> > So I have trouble deciding what's right.
> >
> > On the one hand, I don't see why you bother with this patch, because
> > you don't increase the risk of overreclaim.
> > Michal's concern for overreclaim came from the fact that I had
> > kswapd do soft limit reclaim at priority 0 without ever bailing from
> > individual zones. But your soft limit implementation is purely about
> > selecting memcgs to reclaim, you never increase the pressure put on
> > a memcg anywhere.
>
> I agree w/ you here.
>
> >
> > On the other hand, I don't even agree with that aspect of your series;
> > that you no longer prioritize explicitly soft-limited groups in
> > excess over unconfigured groups, as I mentioned in the other mail.
> > But if you did, you would likely need a patch like this, I think.
>
> Prioritize between memcg w/ default softlimit (0) and memcg exceeds
> non-default softlimit (x) ?

Yup:

	A ( soft = default, usage = 10 )
	B ( soft = 8, usage = 10 )

This is the "memory-nice this one workload" I was referring to in the
other mail. It would have reclaimed B more aggressively than A in the
past. After your patch, they will both be reclaimed equally, because
you change the default from "below softlimit" to "above soft limit".

> Are you referring to the balance the reclaim between eligible memcgs
> based on different factors like softlimit_exceed, recent_scanned,
> recent_reclaimed....? If so, I am planning to make that as second step
> after this patch series.

Well, humm. You potentially break existing setups. It would be good
not to do that, even just temporarily.