Re: [PATCH 1/4] memcg, mm: introduce lowlimit reclaim

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri 02-05-14 18:00:56, Johannes Weiner wrote:
> On Fri, May 02, 2014 at 06:49:30PM +0200, Michal Hocko wrote:
> > On Fri 02-05-14 11:58:05, Johannes Weiner wrote:
> > > On Fri, May 02, 2014 at 11:36:28AM +0200, Michal Hocko wrote:
> > > > On Wed 30-04-14 18:55:50, Johannes Weiner wrote:
> > > > > On Mon, Apr 28, 2014 at 02:26:42PM +0200, Michal Hocko wrote:
[...]
> > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > > index c1cd99a5074b..0f428158254e 100644
> > > > > > --- a/mm/vmscan.c
> > > > > > +++ b/mm/vmscan.c
> > > > [...]
> > > > > > +static void shrink_zone(struct zone *zone, struct scan_control *sc)
> > > > > > +{
> > > > > > +	if (!__shrink_zone(zone, sc, true)) {
> > > > > > +		/*
> > > > > > +		 * First round of reclaim didn't find anything to reclaim
> > > > > > +		 * because of low limit protection so try again and ignore
> > > > > > +		 * the low limit this time.
> > > > > > +		 */
> > > > > > +		__shrink_zone(zone, sc, false);
> > > > > > +	}
> > > 
> > > So I don't think this can work as it is, because we are not actually
> > > changing priority levels yet. 
> > 
> > __shrink_zone returns with 0 only if the whole hierarchy is is under low
> > limit. This means that they are over-committed and it doesn't make much
> > sense to play with priority. Low limit reclaimability is independent on
> > the priority.
> > 
> > > It will give up on the guarantees of bigger groups way before smaller
> > > groups are even seriously looked at.
> > 
> > How would that happen? Those (smaller) groups would get reclaimed and we
> > wouldn't fallback. Or am I missing your point?
> 
> Lol, I hadn't updated my brain to a394cb8ee632 ("memcg,vmscan: do not
> break out targeted reclaim without reclaimed pages") yet...  Yes, you
> are right.

You made me think about this more and you are right ;).
The code as is doesn't cope with many racing reclaimers when some
threads can fallback to ignore the lowlimit although there are groups to
scan in the hierarchy but they were visited by other reclaimers.
The patch bellow should help with that. What do you think?
I am also thinking we want to add a fallback counter in memory.stat?
---
>From e997b8b4ac724aa29bdeff998d2186ee3c0a97d8 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@xxxxxxx>
Date: Mon, 5 May 2014 15:12:18 +0200
Subject: [PATCH] vmscan: memcg: check whether the low limit should be ignored

Low-limit (aka guarantee) is ignored when there is no group scanned
during the first round of __shink_zone. This approach doesn't work when
multiple reclaimers race and reclaim the same hierarchy (e.g. kswapd
vs. direct reclaim or multiple tasks hitting the hard limit) because
memcg iterator makes sure that multiple reclaimers are interleaved
in the hierarchy. This means that some reclaimers can see 0 scanned
groups although there are groups which are above the low-limit and they
were reclaimed on behalf of other reclaimers. This leads to a premature
low-limit break.

This patch adds mem_cgroup_all_within_guarantee() which will check
whether all the groups in the reclaimed hierarchy are within their low
limit and shrink_zone will allow the fallback reclaim only when that is
true. This alone is still not sufficient however because it would lead
to another problem. If a reclaimer constantly fails to scan anything
because it sees only groups within their guarantees while others do the
reclaim then the reclaim priority would drop down very quickly.
shrink_zone has to be careful to preserve scan at least one group
semantic so __shrink_zone has to be retried until at least one group
is scanned.

Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
---
 include/linux/memcontrol.h |  5 +++++
 mm/memcontrol.c            | 13 +++++++++++++
 mm/vmscan.c                | 17 ++++++++++++-----
 3 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index c00ccc5f70b9..077a777bd9ff 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -94,6 +94,7 @@ bool task_in_mem_cgroup(struct task_struct *task,
 
 extern bool mem_cgroup_within_guarantee(struct mem_cgroup *memcg,
 		struct mem_cgroup *root);
+extern bool mem_cgroup_all_within_guarantee(struct mem_cgroup *root);
 
 extern struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page);
 extern struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
@@ -296,6 +297,10 @@ static inline bool mem_cgroup_within_guarantee(struct mem_cgroup *memcg,
 {
 	return false;
 }
+static inline bool mem_cgroup_all_within_guarantee(struct mem_cgroup *root)
+{
+	return false;
+}
 
 static inline struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 58982d18f6ea..4fd4784d1548 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2833,6 +2833,19 @@ bool mem_cgroup_within_guarantee(struct mem_cgroup *memcg,
 	return false;
 }
 
+bool mem_cgroup_all_within_guarantee(struct mem_cgroup *root)
+{
+	struct mem_cgroup *iter;
+
+	for_each_mem_cgroup_tree(iter, root)
+		if (!mem_cgroup_within_guarantee(iter, root)) {
+			mem_cgroup_iter_break(root, iter);
+			return false;
+		}
+
+	return true;
+}
+
 struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
 {
 	struct mem_cgroup *memcg = NULL;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5f923999bb79..2686e47f04cc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2293,13 +2293,20 @@ static unsigned __shrink_zone(struct zone *zone, struct scan_control *sc,
 
 static void shrink_zone(struct zone *zone, struct scan_control *sc)
 {
-	if (!__shrink_zone(zone, sc, true)) {
+	bool honor_guarantee = true;
+
+	while (!__shrink_zone(zone, sc, honor_guarantee)) {
 		/*
-		 * First round of reclaim didn't find anything to reclaim
-		 * because of the memory guantees for all memcgs in the
-		 * reclaim target so try again and ignore guarantees this time.
+		 * The previous round of reclaim didn't find anything to scan
+		 * because
+		 * a) the whole reclaimed hierarchy is within guarantee so
+		 *    we fallback to ignore the guarantee because other option
+		 *    would be the OOM
+		 * b) multiple reclaimers are racing and so the first round
+		 *    should be retried
 		 */
-		__shrink_zone(zone, sc, false);
+		if (mem_cgroup_all_within_guarantee(sc->target_mem_cgroup))
+			honor_guarantee = false;
 	}
 }
 
-- 
2.0.0.rc0

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]