ping On Thu 24-04-14 16:27:04, Michal Hocko wrote: > On Thu 24-04-14 08:19:17, Johannes Weiner wrote: > > On Fri, Apr 18, 2014 at 01:36:11PM +0200, Michal Hocko wrote: > > > On Wed 16-04-14 17:13:18, Johannes Weiner wrote: > > > > Per-memcg swappiness and oom killing can currently not be tweaked on a > > > > memcg that is part of a hierarchy, but not the root of that hierarchy. > > > > Users have complained that they can't configure this when they turned > > > > on hierarchy mode. In fact, with hierarchy mode becoming the default, > > > > this restriction disables the tunables entirely. > > > > > > Except when we would handle the first level under root differently, > > > which is ugly. > > > > > > > But there is no good reason for this restriction. > > > > > > I had a patch for this somewhere on the think_more pile. I wasn't > > > particularly happy about the semantic so I haven't posted it. > > > > > > > The settings for > > > > swappiness and OOM killing are taken from whatever memcg whose limit > > > > triggered reclaim and OOM invocation, regardless of its position in > > > > the hierarchy tree. > > > > > > This is OK for the OOM knob because the memory pressure cannot be > > > handled at that level in hierarchy and that is where the OOM happens. > > > > > > I am not so sure about the swappiness though. The swappiness tells us > > > how to proportionally scan anon vs. file LRUs and those are per-memcg, > > > not per-hierarchy (unlike the charge) so it makes sense to use it > > > per-memcg IMO. > > > > > > Besides that using the reclaim target value might be quite confusing. > > > Say, somebody wants to prevent from swapping in a certain group and > > > yet the pages find their way to swap depending on where the reclaim is > > > triggered from. > > > Another thing would be that setting swappiness on an unlimited group has > > > no effect although I would argue it makes some sense in configuration > > > when parent is controlled by somebody else. I would like to tell how > > > to reclaim me when I cannot say how much memory I can have. > > > > > > It is true that we have a different behavior for the global reclaim > > > already but I am not entirely happy about that. Having a different > > > behavior for the global vs. limit reclaims just calls for troubles and > > > should be avoided as much as possible. > > > > > > So let's think what is the best semantic before we merge this. I would > > > be more inclined for using per-memcg swappiness all the time (root using > > > the global knob) for all reclaims. > > > > Yeah, we've always used the triggering group's swappiness value but at > > the same time forced the whole hierarchy to have the same setting as > > the root. > > > > I don't really feel strongly about this. If you prefer the per-memcg > > swappiness I can send a followup patch - or you can. > > OK, I originally thought this would be in the same patch but now that I > think about it some more it would be better to have it separate in case > it turns out this will cause some issues (at least > global_reclaim-always-use-global-vm_swappiness is a behavior change). > So what do you think about this? > --- > From 3a865b7b53aed96d93bbcf865028e63fd6f582ab Mon Sep 17 00:00:00 2001 > From: Michal Hocko <mhocko@xxxxxxx> > Date: Thu, 24 Apr 2014 15:28:05 +0200 > Subject: [RFC PATCH] vmscan: memcg: Always use swappiness of the reclaimed memcg > > The memory reclaim always uses swappiness of the reclaim target memcg > (origin of the memory pressure) or vm_swappiness for the global memory > reclaim. This behavior was consistent (except for difference between > global and hard limit reclaim) because swappiness was enforced to be > consistent within each memcg hierarchy. > > After "mm: memcontrol: remove hierarchy restrictions for swappiness > and oom_control" each memcg can have its own swappiness independent on > hierarchical parents, though, so the consistency guarantee is gone. > This can lead to an unexpected behavior. Say that a group is explicitly > configured to not swapout by memory.swappiness=0 but its memory gets > swapped out anyway when the memory pressure comes from its parent with a > different swapping policy. > It is also unexpected that the knob is meaningless without setting the > hard limit which would trigger the reclaim and enforce the swappiness. > There are setups where the hard limit is configured higher in the > hierarchy by an administrator and children groups are under control of > somebody else who is interested in the swapout behavior but not > necessarily about the memory limit. > > From a semantic point of view swappiness is an attribute defining > anon vs. file proportional scanning of LRU which is memcg specific > (unlike charges which are propagated up the hierarchy) so it should be > applied to the particular memcg's LRU regardless where the memory > pressure comes from. > > This patch removes vmscan_swappiness() and stores the swappiness into > the scan_control structure. mem_cgroup_swappiness is then used to > provide the correct value before shrink_lruvec is called. The global > vm_swappiness is used for the root memcg. > > Signed-off-by: Michal Hocko <mhocko@xxxxxxx> > --- > Documentation/cgroups/memory.txt | 15 +++++++-------- > mm/vmscan.c | 18 ++++++++---------- > 2 files changed, 15 insertions(+), 18 deletions(-) > > diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt > index 4937e6fff9b4..b3429aec444c 100644 > --- a/Documentation/cgroups/memory.txt > +++ b/Documentation/cgroups/memory.txt > @@ -540,14 +540,13 @@ Note: > > 5.3 swappiness > > -Similar to /proc/sys/vm/swappiness, but only affecting reclaim that is > -triggered by this cgroup's hard limit. The tunable in the root cgroup > -corresponds to the global swappiness setting. > - > -Please note that unlike the global swappiness, memcg knob set to 0 > -really prevents from any swapping even if there is a swap storage > -available. This might lead to memcg OOM killer if there are no file > -pages to reclaim. > +Overrides /proc/sys/vm/swappiness for the particular group. The tunable > +in the root cgroup corresponds to the global swappiness setting. > + > +Please note that unlike during the global reclaim, limit reclaim > +enforces that 0 swappiness really prevents from any swapping even if > +there is a swap storage available. This might lead to memcg OOM killer > +if there are no file pages to reclaim. > > 5.4 failcnt > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 310e1f67625e..7d2f8226cbd0 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -86,6 +86,9 @@ struct scan_control { > /* Scan (total_size >> priority) pages at once */ > int priority; > > + /* anon vs. file LRUs scanning "ratio" */ > + int swappiness; > + > /* > * The memory cgroup that hit its limit and as a result is the > * primary target of this reclaim invocation. > @@ -1833,13 +1836,6 @@ static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan, > return shrink_inactive_list(nr_to_scan, lruvec, sc, lru); > } > > -static int vmscan_swappiness(struct scan_control *sc) > -{ > - if (global_reclaim(sc)) > - return vm_swappiness; > - return mem_cgroup_swappiness(sc->target_mem_cgroup); > -} > - > enum scan_balance { > SCAN_EQUAL, > SCAN_FRACT, > @@ -1900,7 +1896,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, > * using the memory controller's swap limit feature would be > * too expensive. > */ > - if (!global_reclaim(sc) && !vmscan_swappiness(sc)) { > + if (!global_reclaim(sc) && !sc->swappiness) { > scan_balance = SCAN_FILE; > goto out; > } > @@ -1910,7 +1906,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, > * system is close to OOM, scan both anon and file equally > * (unless the swappiness setting disagrees with swapping). > */ > - if (!sc->priority && vmscan_swappiness(sc)) { > + if (!sc->priority && sc->swappiness) { > scan_balance = SCAN_EQUAL; > goto out; > } > @@ -1935,7 +1931,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, > * With swappiness at 100, anonymous and file have the same priority. > * This scanning priority is essentially the inverse of IO cost. > */ > - anon_prio = vmscan_swappiness(sc); > + anon_prio = sc->swappiness; > file_prio = 200 - anon_prio; > > /* > @@ -2221,6 +2217,7 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc) > > lruvec = mem_cgroup_zone_lruvec(zone, memcg); > > + sc->swappiness = mem_cgroup_swappiness(memcg); > shrink_lruvec(lruvec, sc); > > /* > @@ -2678,6 +2675,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *memcg, > .may_swap = !noswap, > .order = 0, > .priority = 0, > + .swappiness = mem_cgroup_swappiness(memcg), > .target_mem_cgroup = memcg, > }; > struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg); > -- > 1.9.2 > > -- > Michal Hocko > SUSE Labs > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@xxxxxxxxx. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>