On Tue 10-01-12 16:02:52, Johannes Weiner wrote:
> Right now, memcg soft limits are implemented by having a sorted tree
> of memcgs that are in excess of their limits.  Under global memory
> pressure, kswapd first reclaims from the biggest excessor and then
> proceeds to do regular global reclaim.  The result of this is that
> pages are reclaimed from all memcgs, but more scanning happens against
> those above their soft limit.
>
> With global reclaim doing memcg-aware hierarchical reclaim by default,
> this is a lot easier to implement: everytime a memcg is reclaimed
> from, scan more aggressively (per tradition with a priority of 0) if
> it's above its soft limit.  With the same end result of scanning
> everybody, but soft limit excessors a bit more.
>
> Advantages:
>
>   o smoother reclaim: soft limit reclaim is a separate stage before
>     global reclaim, whose result is not communicated down the line and
>     so overreclaim of the groups in excess is very likely.  After this
>     patch, soft limit reclaim is fully integrated into regular reclaim
>     and each memcg is considered exactly once per cycle.
>
>   o true hierarchy support: soft limits are only considered when
>     kswapd does global reclaim, but after this patch, targetted
>     reclaim of a memcg will mind the soft limit settings of its child
>     groups.

Yes, it makes sense. At first I thought that the soft limit should be
considered only under global memory pressure (at least the documentation
says so), but now it makes sense.
We can push harder on groups that are over their soft limit because they
told us they could sacrifice something... Anyway, the documentation needs
an update as well.

But we have to be a little bit careful here. I am still quite confused
about how we should handle hierarchies vs. subtrees. See below.

>
>   o code size: soft limit reclaim requires a lot of code to maintain
>     the per-node per-zone rb-trees to quickly find the biggest
>     offender, dedicated paths for soft limit reclaim etc.
>     while this new implementation gets away without all that.

On my i386 PAE setup (with the swap extension enabled):

Before
   text	   data	    bss	    dec	    hex	filename
 310086	  29970	  35372	 375428	  5ba84	mm/built-in.o

After
size mm/built-in.o
   text	   data	    bss	    dec	    hex	filename
 309048	  30030	  35372	 374450	  5b6b2	mm/built-in.o

I would have expected a bigger difference, but it is still good.

> Test:

Will look into results later.

[...]
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> ---
>  include/linux/memcontrol.h |   18 +--
>  mm/memcontrol.c            |  412 ++++--------------------------------------
>  mm/vmscan.c                |   80 +--------
>  3 files changed, 48 insertions(+), 462 deletions(-)

Really nice to see.

[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 170dff4..d4f7ae5 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
[...]
> @@ -1318,6 +1123,36 @@ static unsigned long mem_cgroup_margin(struct mem_cgroup *memcg)
>  	return margin >> PAGE_SHIFT;
>  }
>
> +/**
> + * mem_cgroup_over_softlimit
> + * @root: hierarchy root
> + * @memcg: child of @root to test
> + *
> + * Returns %true if @memcg exceeds its own soft limit or contributes
> + * to the soft limit excess of one of its parents up to and including
> + * @root.
> + */
> +bool mem_cgroup_over_softlimit(struct mem_cgroup *root,
> +			       struct mem_cgroup *memcg)
> +{
> +	if (mem_cgroup_disabled())
> +		return false;
> +
> +	if (!root)
> +		root = root_mem_cgroup;
> +
> +	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
> +		/* root_mem_cgroup does not have a soft limit */
> +		if (memcg == root_mem_cgroup)
> +			break;
> +		if (res_counter_soft_limit_excess(&memcg->res))
> +			return true;
> +		if (memcg == root)
> +			break;
> +	}
> +	return false;
> +}

Well, this might be a little bit tricky. We do not check whether memcg and
root are in a hierarchical relation (in terms of use_hierarchy).

If we are under global reclaim, then we iterate over all memcgs, so
there is no guarantee that there is a hierarchical relation between the
given memcg and its parent.
On the other hand, if we are doing memcg reclaim, then we do have this
guarantee.

Why should we punish a group (subtree) which is perfectly under its soft
limit just because some other subtree contributes to the common parent's
usage and pushes it over its limit?
Should we check memcg->use_hierarchy here?

Does it even make sense to set up a soft limit on a parent group without
hierarchies?
Well, I have to admit that hierarchies give me a headache.

> +
>  int mem_cgroup_swappiness(struct mem_cgroup *memcg)
>  {
>  	struct cgroup *cgrp = memcg->css.cgroup;
[...]
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index e3fd8a7..4279549 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2121,8 +2121,16 @@ static void shrink_zone(int priority, struct zone *zone,
>  			.mem_cgroup = memcg,
>  			.zone = zone,
>  		};
> +		int epriority = priority;
> +		/*
> +		 * Put more pressure on hierarchies that exceed their
> +		 * soft limit, to push them back harder than their
> +		 * well-behaving siblings.
> +		 */
> +		if (mem_cgroup_over_softlimit(root, memcg))
> +			epriority = 0;

This sounds too aggressive to me. Shouldn't we just double the pressure
or something like that?
Previously we always had nr_to_reclaim == SWAP_CLUSTER_MAX when we did
memcg reclaim, but this is not the case now. For kswapd we have
nr_to_reclaim == ULONG_MAX, so we will not break out of the reclaim early
and we have to scan a lot.
Direct reclaim (shrink or hard limit) shouldn't be affected here.

>
> -		shrink_mem_cgroup_zone(priority, &mz, sc);
> +		shrink_mem_cgroup_zone(epriority, &mz, sc);
>
>  		mem_cgroup_account_reclaim(root, memcg,
>  					   sc->nr_reclaimed - nr_reclaimed,

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
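
The two open questions in this review (whether an ancestor's soft limit excess should count when use_hierarchy is not set, and whether priority 0 could be replaced by merely doubling the pressure) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: struct toy_memcg and every toy_* helper are hypothetical names, the use_hierarchy gate is one possible answer rather than what the patch does, and the scan target is assumed to be the LRU size shifted right by the priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct mem_cgroup; not kernel code. */
struct toy_memcg {
    struct toy_memcg *parent;  /* NULL for the top-level root */
    unsigned long usage;       /* pages charged to this group */
    unsigned long soft_limit;  /* soft limit in pages */
    bool use_hierarchy;        /* assumed: usage includes the children */
};

/* True if the group itself is above its soft limit. */
static bool toy_soft_limit_excess(const struct toy_memcg *memcg)
{
    return memcg->usage > memcg->soft_limit;
}

/*
 * Walk from @memcg up to @root, like the loop in the quoted patch, but
 * stop at the first ancestor that does not account its children.
 * A NULL @root means "walk up to the top-level root".
 */
static bool toy_over_softlimit(const struct toy_memcg *root,
                               const struct toy_memcg *memcg)
{
    const struct toy_memcg *pos;

    for (pos = memcg; pos; pos = pos->parent) {
        /* The top-level root has no soft limit. */
        if (!pos->parent)
            break;
        /* An ancestor without use_hierarchy cannot blame its children. */
        if (pos != memcg && !pos->use_hierarchy)
            break;
        if (toy_soft_limit_excess(pos))
            return true;
        if (pos == root)
            break;
    }
    return false;
}

/* "Double the pressure": go one priority step lower instead of to 0. */
static int toy_effective_priority(int priority, bool over_softlimit)
{
    return (over_softlimit && priority > 0) ? priority - 1 : priority;
}

/* Assumed scan target: LRU size shifted right by the priority. */
static unsigned long toy_scan_target(unsigned long lru_pages, int priority)
{
    return lru_pages >> priority;
}

/* Small usage example with one greedy and one well-behaved sibling. */
static void toy_selftest(void)
{
    struct toy_memcg top = { NULL, 0, 0, true };
    struct toy_memcg parent = { &top, 900, 500, false };
    struct toy_memcg quiet = { &parent, 100, 400, false };
    struct toy_memcg greedy = { &parent, 800, 400, false };

    assert(toy_over_softlimit(NULL, &greedy));  /* own excess */
    assert(!toy_over_softlimit(NULL, &quiet));  /* parent is not hierarchical */
    parent.use_hierarchy = true;
    assert(toy_over_softlimit(NULL, &quiet));   /* parent's excess now counts */

    /* One step lower doubles the scan target; priority 0 scans it all. */
    assert(toy_scan_target(4096, toy_effective_priority(12, true)) == 2);
    assert(toy_scan_target(4096, 0) == 4096);
}
```

In this model the well-behaved sibling is only pushed harder once its parent actually accounts hierarchically, and a single priority step doubles its scan target instead of jumping to a full scan of the LRU.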